A Maximum Entropy Language Model to Integrate N-Grams and Topic Dependencies for Conversational Speech Recognition (1999)

by S Khudanpur, J Wu
Venue: Proceedings of ICASSP'99

Two decades of statistical language modeling: Where do we go from here

by Ronald Rosenfeld - Proceedings of the IEEE, 2000
Cited by 119 (1 self)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.
1. OUTLINE
Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents. Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rule-based approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4]. SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.

Maximum Entropy Techniques for Exploiting Syntactic, Semantic and Collocational Dependencies in Language Modeling

by Sanjeev Khudanpur, Jun Wu
Cited by 33 (7 self)
A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods ind...

Combining Nonlocal, Syntactic And N-Gram Dependencies In Language Modeling

by Jun Wu, Sanjeev Khudanpur - Proceedings of Eurospeech'99, 1999
Cited by 14 (4 self)
A new language model is presented which incorporates local N-gram dependencies with two important sources of long-range dependencies: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy method. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways and the overall improvement from their combination is nearly additive.
1. INTRODUCTION
N-gram models have been widely used as statistical models ...

Structural Event Detection for Rich Transcription of Speech

by Yang Liu, 2004
Cited by 12 (5 self)
Abstract not found

Using Cross-Language Cues For Story-Specific Language Modeling

by Sanjeev Khudanpur, Woosung Kim - In Proc. ICSLP, 2002
Cited by 11 (7 self)
We propose methods to exploit contemporary news articles in a resource-rich language, together with cross-language information retrieval and machine translation, to sharpen language models for a news story in a language with fewer linguistic resources. We report experimental results on story-specific Chinese language models that use cues from a parallel corpus of English news stories. We demonstrate that even with fairly crude cross-language information retrieval, level-1 machine translation and simple linear interpolation, a significant (18%) reduction in perplexity may be obtained over a Chinese trigram model. We also demonstrate that this method of sharpening the Chinese language model is complementary to other techniques like topic-dependent modeling, and the two in combination result in an even greater reduction in perplexity (28%).

Efficient Training Methods For Maximum Entropy Language Modeling

by Jun Wu, Sanjeev Khudanpur - In Proceedings of the 6th International Conference on Spoken Language Technologies (ICSLP-00), 2000
Cited by 7 (1 self)
Maximum entropy language modeling techniques combine different sources of statistical dependence, such as syntactic relationships, topic cohesiveness and collocation frequency, in a unified and effective language model. These techniques however are also computationally very intensive, particularly during model estimation, compared to the more prevalent alternative of interpolating several simple models, each capturing one type of dependency. In this paper we present ways which significantly reduce this complexity by reorganizing the required computations. We show that in case of a model with N-gram constraints, each iteration of the parameter estimation algorithm requires the same amount of computation as estimating a comparable back-off N-gram model. In general, the computational cost of each iteration in model estimation is linear in the number of distinct "histories" seen in the training corpus, times a model-class dependent factor. The reorganization focuses mainly on reducing this...

Topic-Based Mixture Language Modelling

by Yoshihiko Gotoh, Steve Renals, 2000
Cited by 3 (0 self)
This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (...

Maximum Entropy Language Modeling with Non-Local Dependencies -- Dissertation Proposal

by Jun Wu, 2000
Cited by 3 (0 self)
Abstract not found

Building A Topic-Dependent Maximum Entropy Model For Very Large Corpora

by Jun Wu, Sanjeev Khudanpur - In Proceedings of ICASSP 2002
Cited by 3 (0 self)
Maximum entropy (ME) techniques have been successfully used to combine different sources of linguistically meaningful constraints in language models. However, most of the current ME models can only be used for small corpora, since the computational load in training ME models for large corpora is unbearable. This problem is especially severe when non-local dependencies are considered. In this paper, we show how to train and use topic-dependent ME models efficiently for a very large corpus, Broadcast News (BN). The training time is greatly reduced by hierarchical training and divide-and-conquer approaches. The computation in using the model is also simplified by pre-normalizing the denominators of the ME model. We report new speech recognition results showing improvement with the topic model relative to the standard N-gram model for the Broadcast News task.

Language model switching based on topic detection for dialog speech processing

by Ian R. Lane, Tatsuya Kawahara, Tomoko Matsui - In Proceedings of the ICASSP, Hong Kong, 2003
Cited by 2 (1 self)
An efficient, scalable speech recognition architecture is proposed for multidomain dialog systems by combining topic detection and topic-dependent language modeling. The inferred domain is automatically detected from the user’s utterance, and speech recognition is then performed with an appropriate domain-dependent language model. The architecture improves accuracy and efficiency over current approaches and is scalable to a large number of domains. In this paper, a novel framework using a multilayer hierarchy of language models is introduced in order to improve robustness against topic detection errors. The proposed system provides a relative reduction in WER of 10.5% over a single language model system. Furthermore, it achieves an accuracy that is comparable to using multiple language models in parallel while using only a fraction of the computational cost.