Two decades of statistical language modeling: Where do we go from here
Proceedings of the IEEE, 2000
Cited by 119 (1 self)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.

1. OUTLINE

Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents. Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rule-based approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4].

SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.
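
The estimation problem described above can be made concrete with the simplest n-gram case. The sketch below is a minimal illustration, not a model from the paper: it assumes whitespace tokenization, hypothetical sentence-boundary markers `<s>` and `</s>`, and add-one (Laplace) smoothing, which is far weaker than the smoothing methods used in practice.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()  # words that can be predicted (every token except <s>)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(word | prev); sums to 1 over vocab."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))


if __name__ == "__main__":
    h, b, v = train_bigram(["the cat sat", "the dog sat"])
    print(bigram_prob("the", "cat", h, b, v))  # 2/7, about 0.286
```

Even this toy model shows the parameter problem the abstract mentions: the bigram table grows with the square of the vocabulary, while most of its cells are never observed in training text.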

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
2004
Cited by 67 (3 self)
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear.

Statistical language model adaptation: review and perspectives
Speech Communication, 2004
Cited by 35 (0 self)
Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the discourse in the training and recognition tasks differ. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for this mismatch. More generally, an adaptive language model seeks to maintain an adequate representation of the current task domain under changing conditions involving potential variations in vocabulary, syntax, content, and style. This paper presents an overview of the major approaches proposed to address this issue, and offers some perspectives regarding their comparative merits and associated tradeoffs.
© 2003 Elsevier B.V. All rights reserved.
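
One widely used way to exploit limited in-domain data is linear interpolation of a background model with a model estimated on the adaptation data. The sketch below illustrates that general technique, not a specific method from the review: it assumes both models are plain unigram dictionaries, and the mixing weight `lam` and the helper names are hypothetical. The weight is re-estimated with EM on held-out text.

```python
def interpolate(p_background, p_adapt, lam):
    """Linear interpolation: lam * P_adapt(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_background) | set(p_adapt)
    return {w: lam * p_adapt.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0) for w in vocab}


def estimate_lambda(held_out, p_background, p_adapt, iters=20, lam=0.5):
    """EM re-estimation of the interpolation weight on held-out tokens."""
    for _ in range(iters):
        total_posterior = 0.0
        used = 0
        for w in held_out:
            a = lam * p_adapt.get(w, 0.0)
            b = (1 - lam) * p_background.get(w, 0.0)
            if a + b == 0.0:
                continue  # token unseen by both models carries no information about lam
            total_posterior += a / (a + b)  # E-step: posterior that the adapted model produced w
            used += 1
        if used == 0:
            break
        lam = total_posterior / used  # M-step: average posterior becomes the new weight
    return lam


if __name__ == "__main__":
    p_bg = {"a": 0.5, "b": 0.5}
    p_ad = {"a": 0.2, "c": 0.8}
    lam = estimate_lambda(["a", "c", "c"], p_bg, p_ad)
    print(lam, interpolate(p_bg, p_ad, lam))
```

Tuning the weight on held-out data, rather than fixing it by hand, is what lets such a mixture lean on the adaptation model only as far as the held-out text supports it.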

Maximum Entropy Techniques for Exploiting Syntactic, Semantic and Collocational Dependencies in Language Modeling
Cited by 33 (7 self)
A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies, on the other hand, are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods ind...
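
Integrating constraints with the maximum entropy technique yields a conditional log-linear model, P(w | h) = exp(Σ_i λ_i f_i(h, w)) / Z(h), where Z(h) normalizes over the vocabulary. The sketch below only evaluates that distribution; the two feature functions and their weights are hypothetical stand-ins for an N-gram-style constraint and a topic constraint, and training the weights (for example with generalized iterative scaling) is omitted.

```python
import math


def maxent_prob(history, vocab, features, weights):
    """Conditional MaxEnt distribution P(w | history) over vocab."""
    scores = {w: sum(lam * f(history, w) for f, lam in zip(features, weights)) for w in vocab}
    max_score = max(scores.values())  # subtract the max before exponentiating, for numerical stability
    unnormalized = {w: math.exp(s - max_score) for w, s in scores.items()}
    z = sum(unnormalized.values())  # Z(history), up to the shared exp(max_score) factor
    return {w: u / z for w, u in unnormalized.items()}


# Hypothetical features: an N-gram-style constraint and a topic constraint.
def bigram_feature(history, word):
    return 1.0 if history["prev"] == "the" and word == "market" else 0.0


def topic_feature(history, word):
    return 1.0 if history["topic"] == "finance" and word in ("stock", "market") else 0.0


if __name__ == "__main__":
    probs = maxent_prob({"prev": "the", "topic": "finance"}, ["stock", "market", "game"],
                        [bigram_feature, topic_feature], [1.0, 0.5])
    print(probs)
```

Because every active feature adds its weight to the log-score, a word supported by both constraints ("market" here) gains from each, which is how the model combines local and long-range evidence in one distribution.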

Topic modeling in fringe word prediction for AAC
In IUI, 2006
Cited by 11 (6 self)
Word prediction can be used for enhancing the communication ability of persons with speech and language impairments. In this work, we explore two methods of adapting a language model to the topic of conversation, and apply these methods to the prediction of fringe words.
Keywords: Word prediction, keystroke savings, alternative and augmentative communication (AAC), topic modeling, language modeling
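
Keystroke savings, listed among the keywords above, is commonly reported as the percentage of keystrokes saved relative to typing every character. The sketch below assumes a hypothetical setup: without prediction each word costs its letters plus a space, the predictor shows the `k` most frequent matching words from a unigram count table, and selecting a prediction costs one keystroke that also inserts the space. A topic-adapted language model, as explored in the paper, would supply the ranking in place of the frequency table.

```python
def predict(prefix, unigram_counts, k):
    """Hypothetical predictor: the k most frequent words starting with prefix."""
    candidates = [w for w in unigram_counts if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-unigram_counts[w], w))  # frequency, then alphabetical
    return candidates[:k]


def keystroke_savings(words, unigram_counts, k=3):
    """Percentage of keystrokes saved by word prediction over letter-by-letter typing."""
    baseline = sum(len(w) + 1 for w in words)  # letters plus a space per word
    with_prediction = 0
    for w in words:
        cost = len(w) + 1  # fallback: type the whole word and a space
        for i in range(len(w) + 1):  # i = number of letters typed so far
            if w in predict(w[:i], unigram_counts, k):
                cost = i + 1  # i letters, then one keystroke to select the prediction
                break
        with_prediction += cost
    return 100.0 * (baseline - with_prediction) / baseline


if __name__ == "__main__":
    counts = {"the": 10, "then": 5, "there": 3, "cat": 1}
    print(keystroke_savings(["there"], counts, k=1))
    print(keystroke_savings(["there"], counts, k=3))
```

The metric rewards a model that ranks the intended word highly after few letters, which is why boosting topical (fringe) words can raise it even when overall perplexity changes little.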

Nonlinear Interpolation of Topic Models for Language Model Adaptation
In Proceedings of ICSLP-98, 1998
Cited by 9 (1 self)
Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. To adapt this model for a new document, the topic (or topics) of the new document are first identified. Then the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve ...
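
To show what a class-dependent (rather than single-weight linear) combination can look like, here is a sketch built on the three-way vocabulary division described above. The per-class rules are hypothetical illustrations of "boost" and "suppress", not the paper's formula: on-topic words take the topic model's probability, off-topic words have the general probability scaled down by an assumed `off_topic_scale`, general words keep the general model's probability, and the result is renormalized.

```python
def combine_by_class(p_general, p_topic, word_class, off_topic_scale=0.1):
    """Combine general and topic-specific word predictions by word class.

    word_class maps a word to "general", "on-topic" or "off-topic";
    unlisted words are treated as general. The rules are hypothetical.
    """
    raw = {}
    for word, p_gen in p_general.items():
        cls = word_class.get(word, "general")
        if cls == "on-topic":
            raw[word] = p_topic.get(word, p_gen)  # take the topic model's prediction
        elif cls == "off-topic":
            raw[word] = p_gen * off_topic_scale  # suppress words unlikely for the topic
        else:
            raw[word] = p_gen  # general words keep the general model's prediction
    total = sum(raw.values())
    return {word: p / total for word, p in raw.items()}  # renormalize to a distribution


if __name__ == "__main__":
    p_general = {"the": 0.5, "goal": 0.25, "stock": 0.25}
    p_topic = {"the": 0.4, "goal": 0.5, "stock": 0.1}
    print(combine_by_class(p_general, p_topic, {"goal": "on-topic", "stock": "off-topic"}))
```

Because each class follows a different rule, the mapping from the two input models to the output is piecewise rather than a fixed weighted sum, which is the sense in which such a combination is nonlinear.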

A comparative study on language model adaptation techniques using new evaluation metrics
Proc. HLT/EMNLP, 2005
Cited by 5 (2 self)
This paper presents comparative experimental results for four language model adaptation techniques, a maximum a posteriori (MAP) method and three discriminative training methods (the boosting algorithm, the average perceptron, and the minimum sample risk method), on the task of Japanese Kana-Kanji conversion. We evaluate these techniques beyond simply using the character error rate (CER): the CER results are interpreted using a metric of domain similarity between background and adaptation domains, and are further evaluated by correlating them with a novel metric for measuring the side effects of adapted models. Using these metrics, we show that the discriminative methods are superior to the MAP-based method: they achieve larger CER reductions, are more robust to the degree of similarity between the background and adaptation domains, and achieve those reductions with fewer side effects.
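
The abstract does not give the MAP formulation used, so the sketch below shows one standard MAP-style estimate as an assumption: a unigram distribution with a Dirichlet prior centred on the background model, which amounts to adding `tau` pseudo-counts distributed according to the background probabilities. The function name and `tau` are hypothetical.

```python
def map_adapt(background_probs, adapt_counts, tau):
    """MAP-style unigram estimate with a Dirichlet prior centred on the background model.

    P(w) = (c_adapt(w) + tau * P_background(w)) / (N_adapt + tau)
    Larger tau trusts the background more; tau -> 0 gives the adaptation-data ML estimate.
    """
    n_adapt = sum(adapt_counts.values())
    vocab = set(background_probs) | set(adapt_counts)
    return {
        w: (adapt_counts.get(w, 0) + tau * background_probs.get(w, 0.0)) / (n_adapt + tau)
        for w in vocab
    }


if __name__ == "__main__":
    print(map_adapt({"a": 0.5, "b": 0.5}, {"a": 3}, tau=2.0))
```

A single prior strength `tau` governs how far the estimate moves toward the adaptation data, which gives one intuition for why such an estimate can be sensitive to how similar the background and adaptation domains are.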

Data Augmentation and Language Model Adaptation
University of Avignon, 84911 Avignon Cedex 9, 1998
Cited by 3 (0 self)
A method is presented for augmenting word n-gram counts in a matrix which represents a 2-gram Language Model (LM).

Improvements in Japanese Broadcast News Transcription
Proc. DARPA Broadcast News Workshop, 1999
Cited by 3 (1 self)
This paper reports on recent improvements in Japanese broadcast news transcription and topic extraction. We constructed a language model that depends on the readings of words in order to prevent recognition errors caused by context-dependent readings of Japanese characters. We also introduced interjection modeling into the language model. To improve the model's performance on a series of sentences spoken by one speaker, we applied on-line incremental speaker adaptation. We investigated a method, based on a significance measure, for extracting topic words from the speech recognition results. This paper also proposes a new formulation for speech recognition/understanding systems, in which the a posteriori probability of the message the speaker intends to convey, given an observed acoustic sequence, is maximized. We applied this formulation to rescoring the recognition hypotheses.

1. INTRODUCTION

We have been developing a large-vocabulary continuous-speech recognition (LVC...
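
The abstract does not define its significance measure, so the sketch below substitutes an assumed, commonly used style of score: a word's count in the recognized text times the log ratio of its in-document probability to a background probability, which favors words that are both frequent here and rare in general. The function name, the background table, and the `floor` for unseen words are all hypothetical.

```python
import math
from collections import Counter


def topic_words(doc_tokens, background_probs, n=2, floor=1e-6):
    """Rank words in recognized text by a hypothetical significance score."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    scores = {}
    for w, c in counts.items():
        p_doc = c / total
        p_bg = background_probs.get(w, floor)  # assumed floor for words unseen in background
        scores[w] = c * math.log(p_doc / p_bg)  # frequent here and rare in general scores high
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


if __name__ == "__main__":
    background = {"the": 0.05, "election": 0.0001, "vote": 0.0002}
    print(topic_words("the election the vote election".split(), background))
```

Function words such as "the" occur often in any document but also have high background probability, so the log ratio keeps them from dominating the extracted topic words.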

Maximum Entropy Based Generic Filter for Language Model Adaptation
Cited by 2 (2 self)
Language Model (LM) adaptation has been shown to be very important for reducing the Word Error Rate (WER) in task-specific speech recognition systems. Adaptation data collected in the real world, however, usually contain a large amount of non-dictated text, such as email headers, long URLs, code fragments, included replies, and signatures, that the user will never dictate. Adapting with these data may corrupt the LM. In this paper, we propose a Maximum Entropy (MaxEnt) based filter to remove a variety of non-dictated words from the adaptation data and improve the effectiveness of LM adaptation. We argue that this generic filter is language independent and efficient. We describe the design of the filter and show that using it yields a 10% relative WER reduction over LM adaptation without filtering, and a 22% relative WER reduction over the unadapted LM, on an English email dictation task.
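
With two classes, a conditional MaxEnt classifier has the same form as logistic regression, so a filter of the kind described can be sketched as a per-token binary classifier. Everything below is an assumption for illustration, not the paper's design: the surface features (e-mail sign, URL prefix, digits, length, code punctuation), the tiny hypothetical training set, and the stochastic gradient training loop.

```python
import math

# Hypothetical labelled tokens: 1 = non-dictated (drop), 0 = dictated (keep).
TRAINING_EXAMPLES = [
    ("please", 0), ("send", 0), ("the", 0), ("report", 0), ("tomorrow", 0),
    ("john@example.com", 1), ("http://example.com/a/very/long/path", 1),
    ("x=foo(bar);", 1), ("www.example.org", 1), ("2003-11-04", 1),
]


def token_features(token):
    """Hypothetical binary features for spotting non-dictated tokens."""
    return [
        1.0,  # bias
        1.0 if "@" in token else 0.0,
        1.0 if "://" in token or token.startswith("www.") else 0.0,
        1.0 if any(c.isdigit() for c in token) else 0.0,
        1.0 if len(token) > 15 else 0.0,
        1.0 if any(c in "{};()=<>" for c in token) else 0.0,
    ]


def p_non_dictated(token, weights):
    """P(non-dictated | token) under the two-class MaxEnt (logistic) model."""
    z = sum(w * x for w, x in zip(weights, token_features(token)))
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(examples, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = [0.0] * len(token_features(""))
    for _ in range(epochs):
        for token, label in examples:
            error = label - p_non_dictated(token, weights)
            for i, x in enumerate(token_features(token)):
                weights[i] += lr * error * x
    return weights


def filter_tokens(tokens, weights, threshold=0.5):
    """Keep only tokens the model considers dictated."""
    return [t for t in tokens if p_non_dictated(t, weights) < threshold]


if __name__ == "__main__":
    w = train_maxent(TRAINING_EXAMPLES)
    print(filter_tokens(["please", "call", "someone@example.com", "today"], w))
```

Filtering at the token level, before counts are collected, means the adapted LM never sees the e-mail addresses or code fragments at all, which matches the abstract's motivation that such text would otherwise corrupt the LM.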

