## Adaptive Statistical Language Modelling (1994)

### BibTeX

@MISC{Zue94adaptivestatistical,

author = {Victor Zue and F. R. Morgenthaler and Raymond Lau},

title = {Adaptive Statistical Language Modelling},

year = {1994}

}

### Abstract

The trigram statistical language model is remarkably successful in applications such as speech recognition. However, the trigram model is static in that it considers only the previous two words when predicting the next word. The work presented here attempts to improve upon the trigram model by considering additional contextual and longer-distance information. This is frequently referred to in the literature as adaptive statistical language modelling, because the model is thought of as adapting to the longer-term information.

This work considers the creation of topic-specific models; statistical evidence from the presence or absence of triggers (related words) in the document history (document triggers) and in the current sentence (in-sentence triggers); and the incorporation of the document cache, which predicts the probability of a word from its frequency in the document history. An important result of this work is that the presence of self-triggers, that is, whether or not the word itself occurred in the document history, is an extremely important piece of information.

A maximum entropy (ME) approach is used in many instances to incorporate information from different sources. Maximum entropy selects the model that maximizes entropy while satisfying the constraints imposed by the information we wish to incorporate. The generalized iterative scaling (GIS) algorithm can be used to compute the maximum entropy solution. This work also considers various methods of smoothing the information in a maximum entropy model. Another important result is that smoothing noticeably improves performance and that Good-Turing discounting is an effective method of smoothing.

Thesis Supervisor: Victor Zue

Title: Principal Research Scientist, Departme...
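The abstract describes a document cache that predicts a word from its frequency in the document history. The sketch below shows how such a cache can supplement a trigram. It assumes simple linear interpolation between an unsmoothed maximum-likelihood trigram estimate and the cache. That is a common baseline, not necessarily how the thesis combines them, since the abstract says maximum entropy is used in many instances. The class name `CacheTrigramLM` and the weight `lam` are hypothetical.

```python
from collections import Counter


class CacheTrigramLM:
    """Trigram model linearly interpolated with a document cache (toy sketch).

    Hypothetical names: `CacheTrigramLM`, `lam`. The trigram part is an
    unsmoothed maximum-likelihood estimate, so unseen trigrams get zero mass.
    """

    def __init__(self, training_tokens, lam=0.8):
        self.lam = lam  # weight on the static trigram component
        self.tri = Counter()  # counts of (u, v, w)
        self.ctx = Counter()  # counts of (u, v) used as a trigram context
        for u, v, w in zip(training_tokens, training_tokens[1:], training_tokens[2:]):
            self.tri[(u, v, w)] += 1
            self.ctx[(u, v)] += 1

    def p_trigram(self, u, v, w):
        # P(w | u, v) from relative frequencies; 0.0 if the context is unseen.
        n = self.ctx[(u, v)]
        return self.tri[(u, v, w)] / n if n else 0.0

    def p_cache(self, w, history):
        # Relative frequency of w among the words seen so far in the document.
        return history.count(w) / len(history) if history else 0.0

    def prob(self, w, history):
        # history must hold at least two words; its last two form the trigram context.
        u, v = history[-2], history[-1]
        return self.lam * self.p_trigram(u, v, w) + (1 - self.lam) * self.p_cache(w, history)


train = "the cat sat on the mat the cat ate".split()
lm = CacheTrigramLM(train, lam=0.8)
history = "a cat saw the cat".split()
print(lm.prob("sat", history))  # trigram evidence only
print(lm.prob("cat", history))  # cache evidence only: "cat" already occurred twice
```

The trigram `the cat cat` never appears in training. Even so, `cat` receives nonzero probability because it already occurs in the history. This is roughly the kind of self-trigger evidence the abstract identifies as important.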
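According to the abstract, maximum entropy picks the model with the highest entropy that still satisfies the constraints, and generalized iterative scaling (GIS) can compute that solution. The sketch below shows the standard GIS update, `lam_i += log(target_i / E_p[f_i]) / C`. It assumes a small finite event space instead of the conditional word-prediction models in the thesis. It also adds a slack feature so that every event's feature values sum to the same constant `C`, which GIS requires. The function name `gis` and its arguments are hypothetical.

```python
import math


def gis(features, targets, n_iter=1000):
    """Generalized iterative scaling over a small finite event space (sketch).

    features[x] lists the non-negative feature values of event x, and
    targets[i] is the required expectation of feature i. A slack feature
    is appended so every event's features sum to the same constant C,
    which GIS requires. Returns the fitted distribution over events.
    """
    C = max(sum(f) for f in features)
    feats = [list(f) + [C - sum(f)] for f in features]
    tgt = list(targets) + [C - sum(targets)]
    lam = [0.0] * len(tgt)

    def distribution():
        # p(x) is proportional to exp(sum_i lam_i * f_i(x)).
        scores = [math.exp(sum(l * v for l, v in zip(lam, f))) for f in feats]
        z = sum(scores)
        return [s / z for s in scores]

    for _ in range(n_iter):
        p = distribution()  # all expectations in one iteration use the same p
        for i in range(len(tgt)):
            expected = sum(px * f[i] for px, f in zip(p, feats))
            # Skip features whose target or expectation is zero (log undefined).
            if tgt[i] > 0 and expected > 0:
                lam[i] += math.log(tgt[i] / expected) / C
    return distribution()


# Four events; feature 1 fires on events 0 and 1, feature 2 on events 0 and 2.
p = gis([[1, 1], [1, 0], [0, 1], [0, 0]], targets=[0.6, 0.7])
print([round(v, 3) for v in p])
```

The example constrains only the two marginals, so the maximum entropy solution treats the features as independent. The result should therefore approach `[0.42, 0.18, 0.28, 0.12]`, which is 0.6 × 0.7, 0.6 × 0.3, 0.4 × 0.7, and 0.4 × 0.3.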
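The abstract reports that Good-Turing discounting is an effective way to smooth the information in a maximum entropy model. The basic Good-Turing estimate replaces a raw count `r` with `r* = (r + 1) * N_{r+1} / N_r`. Here `N_r` is the number of distinct events seen exactly `r` times, and a total probability of `N_1 / N` is reserved for unseen events. The sketch below applies only this formula to raw counts and does not show how the thesis applies it to ME constraints. The function name `good_turing` is hypothetical. The rule of keeping `r` unchanged whenever `N_{r+1}` is zero is an assumed simplification, so the adjusted counts are not exactly renormalized.

```python
from collections import Counter


def good_turing(counts):
    """Basic Good-Turing adjusted counts r* = (r + 1) * N_{r+1} / N_r (sketch).

    When N_{r+1} is zero the raw count r is kept unchanged (an assumed
    simplification; practical variants smooth the N_r values instead).
    Returns the adjusted counts and the probability mass for unseen events.
    """
    n_r = Counter(counts.values())  # N_r: number of types seen exactly r times
    total = sum(counts.values())  # N: total number of observed tokens
    adjusted = {}
    for word, r in counts.items():
        if n_r[r + 1] > 0:
            adjusted[word] = (r + 1) * n_r[r + 1] / n_r[r]
        else:
            adjusted[word] = float(r)
    p_unseen = n_r[1] / total  # N_1 / N
    return adjusted, p_unseen


adjusted, p_unseen = good_turing({"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 4})
print(adjusted, p_unseen)
```

In this example each singleton's count is discounted from 1 to 0.5, and the freed mass becomes the 0.4 reserved for unseen events.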