## Putting Language Into Language Modeling (1999)

Venue: Proc. of Eurospeech-99

Citations: 17 (0 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Jelinek99puttinglanguage,
  author    = {Frederick Jelinek and Ciprian Chelba},
  title     = {Putting Language Into Language Modeling},
  booktitle = {Proc. of Eurospeech-99},
  year      = {1999},
  pages     = {1--6}
}
```

### Abstract

In this paper we describe the statistical Structured Language Model (SLM), which uses grammatical analysis of the hypothesized sentence segment (prefix) to predict the next word. We first describe the operation of a basic, completely lexicalized SLM that builds up partial parses as it proceeds left to right. We then develop a chart parsing algorithm and, with its help, a method to compute the prediction probabilities P(w_{i+1} | W_i). We suggest useful computational shortcuts, followed by a method of training SLM parameters from text data. Finally, we introduce a more detailed parametrization that involves non-terminal labeling and considerably improves smoothing of SLM statistical parameters. We conclude by presenting certain recognition and perplexity results achieved on standard corpora. 1. INTRODUCTION In the accepted statistical formulation of the speech recognition problem [1] the recognizer seeks to find the word string Ŵ := arg max_W P(A|W) P(W), where A denotes the observable speech signal ...
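The noisy-channel formulation quoted in the abstract, Ŵ := arg max_W P(A|W) P(W), can be illustrated with a minimal sketch. All candidate strings and scores below are invented toy numbers for illustration, not values from the paper:

```python
# Hedged sketch of the recognizer's decision rule from the abstract:
# pick the word string W maximizing P(A|W) * P(W), where P(A|W) is the
# acoustic score and P(W) the language-model score.

def decode(acoustic_scores, lm_scores):
    """Return the candidate word string maximizing P(A|W) * P(W)."""
    return max(acoustic_scores, key=lambda w: acoustic_scores[w] * lm_scores[w])

# Hypothetical candidate strings with made-up acoustic and LM probabilities.
acoustic = {"recognize speech": 0.6, "wreck a nice beach": 0.7}
lm = {"recognize speech": 0.05, "wreck a nice beach": 0.001}

print(decode(acoustic, lm))  # "recognize speech": 0.6*0.05 > 0.7*0.001
```

The example shows why the language model matters: the acoustically better-scoring string loses once the prior P(W) is factored in.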

### Citations

2184 | Building a large annotated corpus of English: The Penn Treebank
- Marcus, Santorini, et al.
- 1993
Citation Context: ...parses T̂(i) that correspond to the sentences W(i), i = 1, 2, ..., K making up the training corpus. Of course, initial statistics would be derived from parses present in some convenient treebank [10], [11]. For the sake of brevity we now state without proof the basic recursion of the Viterbi algorithm. Let β_{xy}[i, j] denote the probability, given that x is the last exposed headword and w_i is generat...

768 | Statistical Methods for Speech Recognition
- Jelinek
- 1997
Citation Context: ...parameters. We conclude by presenting certain recognition and perplexity results achieved on standard corpora. 1. INTRODUCTION In the accepted statistical formulation of the speech recognition problem [1] the recognizer seeks to find the word string Ŵ := arg max_W P(A|W) P(W), where A denotes the observable speech signal, P(A|W) is the probability that when the word string W is spoken, the signal ...

349 | Self-organized language modeling for speech recognition
- Jelinek
- 1990
Citation Context: ...left-to-right requirement for a language model, (b) inadequate parametrization, and (c) sparseness of data. Fortunately, we have had some initial success with the Structured Language Model (SLM) [4], [5], [12] that both reduces entropy and the error rate. In this presentation we give a description of operation of a basic SLM (Section 3), discuss its training, provide a new parsing algorithm, generalize the...

273 | Trainable grammars for speech recognition
- Baker
- 1979
Citation Context: ...rs by an appropriate maximum likelihood procedure applied to data. In principle, it would be possible to proceed analogously to the inside-outside algorithm for probabilistic context-free grammars [9]. The recursion (10) of Section 5 already corresponds to the inside algorithm and we could develop an outside analogue as well. However, such a re-estimation would be extremely costly. The simplest wa...

125 | Exploiting syntactic structure for language modeling
- Chelba, Jelinek
- 1998
Citation Context: ...the left-to-right requirement for a language model, (b) inadequate parametrization, and (c) sparseness of data. Fortunately, we have had some initial success with the Structured Language Model (SLM) [4], [5], [12] that both reduces entropy and the error rate. In this presentation we give a description of operation of a basic SLM (Section 3), discuss its training, provide a new parsing algorithm, gene...

91 | An efficient recognition and syntax algorithm for context-free languages
- Kasami
- 1965
Citation Context: ...the SLM can be used to compute the language model probabilities (2) P(w_i | Φ(W_{i−1})) = Σ_{T_{i−1}} P(w_i, T_{i−1} | W_{i−1}). To do so, we will first develop a chart parsing algorithm [6], [7], [8]. In a previous paper [4] we have shown how to approximate the summation in (2) with the help of stacks that hold as entries the dominant terms P(w_i, T_{i−1} | W_{i−1}) of that sum. The chart...
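The summation in the excerpt above, which the SLM approximates by keeping only the dominant terms on stacks, can be sketched in a few lines. The parse hypotheses and probabilities below are invented purely for illustration:

```python
# Toy sketch: the SLM's next-word probability sums the joint probabilities
# P(w_i, T_{i-1} | W_{i-1}) over the partial-parse hypotheses T_{i-1} that
# survive on the stacks (an approximation of the full sum over all parses).

def predict(word, hypotheses):
    """Sum the joint probabilities of stack entries predicting `word`."""
    return sum(p for w, p in hypotheses if w == word)

# Hypothetical stack entries: (predicted next word, joint probability of that
# word together with its partial parse).
stack = [("bank", 0.25), ("bank", 0.125), ("river", 0.0625)]

print(predict("bank", stack))  # 0.25 + 0.125 = 0.375
```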

27 | Running a Grammar Factory: The Production of Syntactically Analysed Corpora or 'Treebanks'
- Leech, Garside
- 1991
Citation Context: ...parses T̂(i) that correspond to the sentences W(i), i = 1, 2, ..., K making up the training corpus. Of course, initial statistics would be derived from parses present in some convenient treebank [10], [11]. For the sake of brevity we now state without proof the basic recursion of the Viterbi algorithm. Let β_{xy}[i, j] denote the probability, given that x is the last exposed headword and w_i is ge...

23 | Recognition performance of a structured language model
- Chelba, Jelinek
- 1999
Citation Context: ...left-to-right requirement for a language model, (b) inadequate parametrization, and (c) sparseness of data. Fortunately, we have had some initial success with the Structured Language Model (SLM) [4], [5], [12] that both reduces entropy and the error rate. In this presentation we give a description of operation of a basic SLM (Section 3), discuss its training, provide a new parsing algorithm, generaliz...

16 | Combining nonlocal, syntactic and n-gram dependencies in language modeling
- Wu, Khudanpur
- 1999
Citation Context: ...: The information extracted from T_i might be made even more comprehensive if we took advantage of the maximum entropy estimation paradigm [2]. We have had some success with such an approach already [13]. 10. PRELIMINARY RESULTS We have tested the SLM on the Wall Street Journal and Switchboard tasks [5], [12]. Compared to the state-of-the-art trigram language model, the SLM has a lower perplexity by 1...

15 | A latent semantic analysis framework for large-span language modeling
- Bellegarda
- 1997
Citation Context: ...improve on it in the last 20 years have failed. The one interesting enhancement, facilitated by maximum entropy estimation methodology, has been the use of triggers [2] or singular value decomposition [3] (either of which dynamically identifies the topic of discourse) in combination with N-gram models. 2. GRAMMATICAL ANALYSIS OF THE HISTORY It has always seemed desirable to base the language mode...

14 | Recognition and parsing of context-free languages
- Younger
- 1967
Citation Context: ...the SLM can be used to compute the language model probabilities (2) P(w_i | Φ(W_{i−1})) = Σ_{T_{i−1}} P(w_i, T_{i−1} | W_{i−1}). To do so, we will first develop a chart parsing algorithm [6], [7], [8]. In a previous paper [4] we have shown how to approximate the summation in (2) with the help of stacks that hold as entries the dominant terms P(w_i, T_{i−1} | W_{i−1}) of that sum. The c...

1 | A Maximum Entropy Approach to Statistical Language Modeling
- Rosenfeld
- 1996
Citation Context: ...and, essentially, all attempts to improve on it in the last 20 years have failed. The one interesting enhancement, facilitated by maximum entropy estimation methodology, has been the use of triggers [2] or singular value decomposition [3] (either of which dynamically identifies the topic of discourse) in combination with N-gram models. 2. GRAMMATICAL ANALYSIS OF THE HISTORY It has always seemed...