## A Maximum Entropy Language Model Integrating N-Grams And Topic Dependencies For Conversational Speech Recognition (1999)

Venue: Proceedings of ICASSP'99

Citations: 24 (7 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Khudanpur99amaximum,
  author    = {Sanjeev Khudanpur and Jun Wu},
  title     = {A Maximum Entropy Language Model Integrating N-Grams And Topic Dependencies For Conversational Speech Recognition},
  booktitle = {Proceedings of ICASSP'99},
  year      = {1999},
  pages     = {553--556}
}
```

### Abstract

A compact language model which incorporates local dependencies in the form of N-grams and long-distance dependencies through dynamic topic-conditional constraints is presented. These constraints are integrated using the maximum entropy principle. Issues in assigning a topic to a test utterance are investigated. Recognition results on the Switchboard corpus are presented, showing that with a very small increase in the number of model parameters, reductions in word error rate and language model perplexity are achieved over trigram models. Some analysis follows, demonstrating that the gains are even larger on content-bearing words. The results are compared with those obtained by interpolating topic-independent and topic-specific N-gram models. The framework presented here extends easily to incorporate other forms of statistical dependencies, such as syntactic word-pair relationships or hierarchical topic constraints.
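Concretely, the combination the abstract describes can be sketched as a conditional exponential model with N-gram and topic-unigram features. The equation below is a reconstruction from that description (and from the "topic-unigram parameter" mentioned in the citation contexts further down), not a formula quoted from the paper:

```latex
P(w_i \mid w_{i-1}, w_{i-2}, t) \;=\;
  \frac{\exp\!\big(\lambda_{w_i} + \lambda_{w_{i-1},w_i}
        + \lambda_{w_{i-2},w_{i-1},w_i} + \lambda_{t,w_i}\big)}
       {Z(w_{i-1}, w_{i-2}, t)}
```

Here the first three parameters encode unigram, bigram, and trigram constraints, the last encodes the dynamic topic-unigram constraint, and \(Z\) normalizes over the vocabulary; because topic information enters as a single extra feature per word, the parameter count grows only slightly over a plain trigram model.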

### Citations

431 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972

Citation context: ...while the fourth one is a topic-unigram parameter determined by term-frequencies in a particular topic. 2.2. Computational Issues in ME Model Estimation: The generalized iterative scaling (GIS) algorithm [6] is used to compute the ME model parameters. Several challenges, predominantly associated with the computational and storage needs of the parameter estimation procedure, must be overcome in order to ...
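As a concrete illustration of the GIS update this context refers to, here is a toy fit of a three-outcome log-linear model. The outcomes, feature values, and target expectations are invented for the example (not data from the paper), and a slack feature pads every outcome's feature total to the same constant C, as GIS requires:

```python
import math

outcomes = ["a", "b", "c"]
# Third feature is the GIS "slack" feature, so each outcome's counts sum to C = 2.
feats = {"a": [1.0, 0.0, 1.0], "b": [0.0, 1.0, 1.0], "c": [1.0, 1.0, 0.0]}
target = [0.6, 0.7, 0.7]   # desired feature expectations E[f_i]
C = 2.0

def model_expectations(lam):
    # p(y) proportional to exp(sum_i lam[i] * f_i(y))
    w = {y: math.exp(sum(l * f for l, f in zip(lam, feats[y]))) for y in outcomes}
    Z = sum(w.values())
    return [sum(w[y] / Z * feats[y][i] for y in outcomes) for i in range(len(target))]

lam = [0.0, 0.0, 0.0]
for _ in range(500):
    exp_f = model_expectations(lam)
    # GIS update: lambda_i += (1/C) * log(target_i / E_model[f_i])
    lam = [l + (1.0 / C) * math.log(t / e) for l, t, e in zip(lam, target, exp_f)]

print([round(e, 3) for e in model_expectations(lam)])
```

After the loop, the model's feature expectations match the targets to within rounding, which is the constraint-satisfaction property the ME principle demands; the paper's challenge, as the context notes, is doing this at the scale of a full vocabulary.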

179 | Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems
- Csiszár
- 1991

Citation context: ...corpus, just as overall N-gram frequencies are topic-independent salient features. An admissible model is then required to satisfy constraints that reflect both sets of features. The ME principle [5] is used to select a statistical model which meets all these constraints. This method has the advantage that only constraints on those term-frequencies which vary significantly across topics are made ...

123 | Exploiting syntactic structure for language modeling
- Chelba, Jelinek
- 1998

Citation context: ...extends easily to combining other dependencies; our current efforts are in the direction of exploiting syntactic structure obtained from a left-to-right partial parse of the utterance as described in [2]. The syntactic constraints will provide information which complements both N-grams and topic dependencies. Additional constraints such as word-class frequencies based on parts of speech, hierarchical...

82 | Language model adaptation using mixtures and an exponentially decaying cache
- Clarkson, Robinson
- 1997

Citation context: ...dependencies with N-grams in a statistically sound manner in the maximum entropy (ME) framework. Several models which combine topic-related information with N-gram models have been studied, e.g., in [1, 4, 3, 8, 9, 10]. The essential idea comes from the information retrieval (IR) literature, where extensive use is made of weighted term-frequencies to discern the topic or genre of a document. Most schemes [4, 8, 10] ...

16 | Exploiting both local and global constraints for multispan statistical language modeling
- Bellegarda
- 1998

Citation context: same passage as above (cited within [1, 4, 3, 8, 9, 10]).

9 | Modeling long range dependencies in languages
- Iyer, Ostendorf
- 1996

Citation context: same passage as above (cited within [1, 4, 3, 8, 9, 10]).

2 | Topic Adaptation for Language Modeling Using Unnormalized Exponential Models
- Chen et al.
- 1998

Citation context: not available.

2 | Exploiting nonlocal and syntactic word relationships in language models for conversational speech recognition
- Yarowsky
- 1997

Citation context: ...the closest matching topic in the results presented here, though the formalism extends easily to soft topic decisions. We employ a standard cosine similarity measure commonly used in the IR community [1, 7] to assign a topic to test sentences. The null topic, which defaults to a topic-independent baseline model, is available as one of the choices to the topic classifier. ...
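The topic-assignment step quoted above is straightforward to sketch: build a term-frequency vector for the utterance, score each topic's centroid by cosine similarity, and fall back to a null topic (i.e., the topic-independent baseline) when no similarity clears a threshold. The topic centroids, counts, and threshold below are invented for illustration, not taken from the paper:

```python
import math
from collections import Counter

# Hypothetical topic centroids as term-frequency vectors.
topics = {
    "sports": Counter({"game": 5, "team": 4, "score": 3}),
    "finance": Counter({"stock": 5, "market": 4, "price": 3}),
}

def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_topic(words, threshold=0.1):
    tf = Counter(words)
    best, sim = max(((t, cosine(tf, c)) for t, c in topics.items()),
                    key=lambda x: x[1])
    # Below the threshold, back off to the topic-independent null topic.
    return best if sim >= threshold else "null"

print(assign_topic("the team won the game".split()))  # -> sports
```

A real system would build the centroids from topic-clustered training transcripts and tune the threshold on held-out data; the hard argmax here corresponds to the paper's hard topic decision, with soft decisions being the noted extension.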

2 | Language Model Adaptation Using Dynamic Marginals
- Kneser et al.

Citation context: not available.

1 | Adaptive Topic-Dependent Language Modeling Using Word-Based Varigrams
- Martin et al.

Citation context: not available.