Results 1 - 5 of 5
Two decades of statistical language modeling: Where do we go from here
- Proceedings of the IEEE
, 2000
Cited by 119 (1 self)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.
1. OUTLINE
Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents.
Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rule-based approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4].
SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.
Whole-Sentence Exponential Language Models: A Vehicle for Linguistic-Statistical Integration
- Computer Speech and Language
, 2001
Cited by 12 (0 self)
We introduce an exponential language model which models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a "bag of features", where features are arbitrary computable properties of the sentence. The new model is computationally more efficient, and more naturally suited to modeling global sentential phenomena, than the conditional exponential (e.g. Maximum Entropy) models proposed to date. Using the model is straightforward. Training the model requires sampling from an exponential distribution. We describe the challenge of applying Monte Carlo Markov Chain (MCMC) and other sampling techniques to natural language, and discuss smoothing and step-size selection. We then present a novel procedure for feature selection, which exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analyzing competitive models in the Switchboard domain, incorporating lexical and syntact...
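The "bag of features" idea in the abstract above can be illustrated with a minimal sketch. It assumes the common exponential form in which a sentence s is scored as p0(s) * exp(sum_i lambda_i * f_i(s)) up to a normalizing constant; the baseline `toy_baseline_logprob`, the two features, and their weights are all hypothetical and chosen only for illustration, not taken from the paper. Because the normalizer is the same for every sentence, comparing candidate sentences needs only the unnormalized score.

```python
import math
from typing import Callable, Dict, List

Sentence = List[str]


def toy_baseline_logprob(sentence: Sentence) -> float:
    """Hypothetical baseline p0: uniform over an assumed 1000-word vocabulary."""
    return -len(sentence) * math.log(1000.0)


# Hypothetical whole-sentence features: each maps a full sentence to a number.
FEATURES: Dict[str, Callable[[Sentence], float]] = {
    "has_repeat": lambda s: 1.0 if len(set(s)) < len(s) else 0.0,
    "is_short": lambda s: 1.0 if len(s) <= 3 else 0.0,
}

# Hypothetical feature weights (the lambdas); a real model would learn these.
WEIGHTS: Dict[str, float] = {"has_repeat": -0.5, "is_short": 0.2}


def unnormalized_log_score(sentence: Sentence) -> float:
    """log p0(s) + sum_i lambda_i * f_i(s); the normalizer is omitted."""
    feature_sum = sum(WEIGHTS[name] * f(sentence) for name, f in FEATURES.items())
    return toy_baseline_logprob(sentence) + feature_sum


def rank(candidates: List[Sentence]) -> List[Sentence]:
    """Order candidate sentences from highest to lowest unnormalized score."""
    return sorted(candidates, key=unnormalized_log_score, reverse=True)
```

Calling `rank` on a list of tokenized hypotheses returns them best first. Training the weights, which the abstract says requires sampling, is not covered by this sketch.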
Interactive Feature Induction And Logistic Regression For Whole Sentence Exponential Language Models
- In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding
, 1999
Cited by 5 (4 self)
Whole sentence exponential language models directly model the probability of an entire sentence using arbitrary computable properties of that sentence. We present an interactive methodology for feature induction, and demonstrate it in the simple but common case of a trigram baseline, focusing on features that capture the linguistic notion of semantic coherence. We then show how parametric regression can be used in this setup to efficiently estimate the model's parameters, whereas non-parametric regression can be used to construct more powerful exponential models from the raw features.
Incorporating Linguistic Structure into Statistical Language Models
- In Philosophical Transactions of the Royal Society of London A
, 2000
Using Wordnet to Supplement Corpus Statistics
Data-driven techniques, although commonly used for many natural language processing tasks, require large amounts of data to perform well. Even with significant amounts of data there is always a long tail of infrequent linguistic events, which results in poor statistical estimation. To lessen the effect of these unreliable estimates, we propose augmenting corpus statistics with linguistic knowledge encoded in existing resources. This paper evaluates the usefulness of the information encoded in WordNet for two tasks: improving perplexity of a bigram language model trained on very little data, and finding longer-distance correlations in text. Word similarities derived from WordNet are evaluated by comparing them to association statistics derived from large amounts of data. Although we see the trends we were hoping for, the overall effect is small. We have found that WordNet does not currently have the breadth or quantity of relations necessary to make substantial improvements over purely data-driven approaches for these two tasks.
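For readers unfamiliar with the evaluation measure named in this abstract, the following is a minimal sketch of bigram perplexity. It uses add-one (Laplace) smoothing as a stand-in for handling unseen events; this is an assumption for illustration only, not the WordNet-based method the paper evaluates, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def train_bigram(tokens: List[str]) -> Tuple[Counter, Counter, int]:
    """Count bigram histories, bigrams, and the vocabulary size of the training text."""
    history_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, len(set(tokens))


def bigram_prob(prev: str, word: str, history_counts: Counter,
                bigram_counts: Counter, vocab_size: int) -> float:
    """Add-one smoothed estimate of P(word | prev); an illustrative assumption."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + vocab_size)


def perplexity(tokens: List[str], history_counts: Counter,
               bigram_counts: Counter, vocab_size: int) -> float:
    """exp of the negative mean log-probability over the bigrams in tokens."""
    pairs = list(zip(tokens[:-1], tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to form a bigram")
    log_sum = sum(
        math.log(bigram_prob(p, w, history_counts, bigram_counts, vocab_size))
        for p, w in pairs
    )
    return math.exp(-log_sum / len(pairs))
```

Lower perplexity means the model assigns higher probability to the held-out text. With very little training data, most bigram counts are zero, which is the sparsity problem the abstract describes.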

