Results 1 - 8 of 8
A Bit of Progress in Language Modeling
, 2001
Cited by 70 (1 self)
Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction (Church, 1988; Brown et al., 1990; Hull, 1992; Kernighan et al., 1990; Srihari and Baltus, 1992). The most commonly used language models are very simple (e.g. a Katz-smoothed trigram model). There are many improvements over this simple model, however, including caching, clustering, higher-order n-grams, skipping models, and sentence-mixture models, all of which we will describe below. Unfortunately, these more complicated techniques have rarely been examined in combination. It is entirely possible that two techniques that work well separately will not work well together, and, as we will show, even possible that some techniques will work better together than either one does by itself. In this...
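To make "the probability of a sequence of words" concrete, here is a minimal sketch of a trigram model that scores a sentence with the chain rule. It is not the paper's method: add-k smoothing is swapped in for Katz smoothing to keep the example short, and the function names, the padding symbols `<s>` and `</s>`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word histories (hypothetical helper)."""
    tri, hist, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        vocab.update(padded[2:])  # predictable symbols, including </s>
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            hist[tuple(padded[i - 2:i])] += 1
    return tri, hist, vocab

def sentence_logprob(words, tri, hist, vocab, k=1.0):
    """Chain rule: sum of log P(w_i | w_{i-2}, w_{i-1}) with add-k smoothing.

    Add-k is an assumption here, standing in for Katz smoothing.
    """
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    logp = 0.0
    for i in range(2, len(padded)):
        h = tuple(padded[i - 2:i])
        numerator = tri[h + (padded[i],)] + k
        denominator = hist[h] + k * len(vocab)
        logp += math.log(numerator / denominator)
    return logp

# Toy corpus, invented for this sketch.
corpus = [["a", "b"], ["a", "c"]]
tri, hist, vocab = train_trigram(corpus)
seen = sentence_logprob(["a", "b"], tri, hist, vocab)
unseen = sentence_logprob(["b", "a"], tri, hist, vocab)
```

On this toy corpus the observed sentence receives a higher log probability than the unobserved word order, which is the behaviour a language model is meant to provide.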
Putting It All Together: Language Model Combination
- IN PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING
Cited by 11 (2 self)
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, modified Kneser-Ney smoothing, and clustering. While all of these techniques have been studied separately, they have rarely been studied in combination. We find some significant interactions, especially with smoothing techniques. The combination of all techniques leads to up to a 45% perplexity reduction over a Katz-smoothed trigram model with no count cutoffs, the highest such perplexity reduction reported.
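As a reminder of what a perplexity reduction measures, the sketch below computes perplexity from per-word log probabilities and the relative reduction between two models. The function names are assumptions, and the perplexity values 100 and 55 are invented to illustrate a 45% reduction; they are not results from the paper.

```python
import math

def perplexity(log_probs):
    """Perplexity from natural-log probabilities, one per predicted word."""
    return math.exp(-sum(log_probs) / len(log_probs))

def relative_reduction(baseline_ppl, new_ppl):
    """Fraction by which new_ppl improves on baseline_ppl."""
    return (baseline_ppl - new_ppl) / baseline_ppl

# A model that assigns probability 1/4 to every word has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 4)

# Hypothetical numbers: a baseline at 100 and a combined model at 55.
reduction = relative_reduction(100.0, 55.0)
```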
On the Use of Grammar Based Language Models for Statistical Machine Translation
- 6th Int. Workshop on Parsing Technologies
, 1999
Cited by 6 (5 self)
In this paper, we describe some concepts of language models beyond the commonly used standard trigram and prove the need for such language models in statistical machine translation. In statistical machine translation, the language model is the system's a priori knowledge source about the target language. The most important demands on the language model in statistical machine translation are to ensure the correct word order, given a certain choice of words, and to score the selection of translations made by the translation model Pr(f_1^J | e_1^I) in view of the syntactic context. Besides the investigation of standard m-grams with long histories, we examined the use of Part-of-Speech based models as well as linguistically motivated grammars with stochastic parsing as a special type of language model. Translation results are given on the Verbmobil task, where translation is performed from German to English, with vocabulary sizes of 6500 and 4000 words respectively.
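The division of labour between the language model and the translation model described here is usually written as the Bayes decision rule of statistical machine translation. The formula below is the standard textbook form, not a quotation from the paper; f_1^J denotes the source sentence and e_1^I the target sentence, following the abstract's notation.

```latex
\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \; \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)
```

The first factor is the language model, which scores word order in the target language, and the second is the translation model.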
Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach
- In Proc. of ACL-EACL
, 2001
Cited by 3 (1 self)
Typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing a correct word-sense disambiguation. One way to deal with this problem within the statistical framework is to use maximum entropy methods. In this paper, we present how to use this information within a statistical machine translation system. We show that it is possible to significantly decrease training and test corpus perplexity of the translation models. In addition, we perform a rescoring of N-best lists using our maximum entropy model and thereby yield an improvement in translation quality. Experimental results are presented on the so-called "Verbmobil task".
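Rescoring an N-best list can be sketched in a few lines: each hypothesis keeps its baseline score, an additional model contributes a weighted score, and the list is re-sorted. The function name, the weight, and the toy scoring function are assumptions for illustration and do not reproduce the paper's maximum entropy model.

```python
def rescore_nbest(nbest, extra_score, weight=1.0):
    """Re-rank (hypothesis, base_log_score) pairs using an extra model.

    extra_score is any callable returning a log-domain score for a
    hypothesis; here it stands in for a maximum entropy model.
    """
    rescored = [(hyp, base + weight * extra_score(hyp)) for hyp, base in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Invented two-best list: the baseline prefers the wrong word order.
nbest = [("the house", -2.0), ("house the", -1.5)]

def toy_model(hyp):
    """Hypothetical scorer that penalises hypotheses not starting with 'the'."""
    return 0.0 if hyp.startswith("the") else -1.0

best_hypothesis, best_score = rescore_nbest(nbest, toy_model)[0]
```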
Log-Linear Interpolation of Language Models
, 2000
Cited by 1 (1 self)
Building probabilistic models of language is a central task in natural language and speech processing, allowing the integration of the syntactic and/or semantic (and recently pragmatic) constraints of the language into the systems. Probabilistic language models are an attractive alternative to the more traditional rule-based systems, such as context-free grammars, because of the recent availability of massive amounts of text corpora which can be used to efficiently train the models, and because, instead of the binary grammaticality judgement offered by rule-based systems, the likelihood of any sequence of lexical units can be obtained, which is a crucial factor in such tasks as speech recognition. Probabilistic language models also find their application in part-of-speech tagging, machine translation, semantic disambiguation and numerous other fields.
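The abstract does not spell out the technique named in the title, so the following is only a sketch of log-linear interpolation as it is generally defined: component probabilities are raised to weights, multiplied, and renormalised over the vocabulary. The function name, the callable interface for component models, and the toy distributions are assumptions; component probabilities are assumed to be strictly positive.

```python
import math

def log_linear_interpolate(models, weights, vocab, history):
    """Return P(w | history) proportional to prod_i P_i(w | history) ** lambda_i."""
    scores = {
        w: sum(lam * math.log(m(w, history)) for m, lam in zip(models, weights))
        for w in vocab
    }
    # Normalise so the combined distribution sums to one.
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {w: math.exp(s - log_z) for w, s in scores.items()}

# Two toy component models that ignore the history (invented numbers).
table_1 = {"a": 0.5, "b": 0.5}
table_2 = {"a": 0.8, "b": 0.2}
models = [lambda w, h: table_1[w], lambda w, h: table_2[w]]

combined = log_linear_interpolate(models, [0.5, 0.5], ["a", "b"], history=())
```

Unlike linear interpolation, the normalisation term depends on the history and must be recomputed for each context, which is the main practical cost of this combination.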
An Empirical Comparison of the Performance of PPM Variants on a Prediction Task with Monophonic Music
, 2003
N-gram models have been employed for a number of musical tasks including the development of practical applications providing computational support for creative individuals as well as theoretical studies of creative processes. Our goal in this research is to evaluate, in an application-independent manner, some recent techniques for improving the performance on monophonic music of a subclass of such models based on the Prediction by Partial Match (PPM) algorithm. These techniques include the use of escape method C, interpolated smoothing and unbounded orders. We have applied these techniques incrementally to eight melodic datasets using cross entropy computed by 10-fold cross-validation on each dataset as our performance metric. The results
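For readers unfamiliar with escape method C, the sketch below shows how it splits probability mass within a single context: with n total observations and t distinct symbols, each seen symbol s gets count(s) / (n + t) and the escape to a shorter context gets t / (n + t). Only this one step is shown; backing off, interpolated smoothing and unbounded orders are omitted, and the function name and the note counts are invented for illustration.

```python
from collections import Counter

def ppm_c_distribution(context_counts):
    """Symbol and escape probabilities for one context under escape method C."""
    n = sum(context_counts.values())  # total symbol occurrences in this context
    t = len(context_counts)           # number of distinct symbols seen
    if n == 0:
        # Nothing observed yet: all mass goes to the escape.
        return {}, 1.0
    probs = {s: c / (n + t) for s, c in context_counts.items()}
    escape = t / (n + t)
    return probs, escape

# Invented counts of the notes that followed some melodic context.
counts = Counter({"C": 3, "D": 1})
probs, escape = ppm_c_distribution(counts)
```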
Structured Language Models for . . .
, 2009
The language model plays an important role in statistical machine translation systems. It is the key knowledge source for determining the right word order of the translation. A standard n-gram language model predicts the next word based on the n − 1 words of immediate left context. Increasing the order n and the size of the training data improves the performance of the LM, as shown by the suffix array language model and distributed language model systems. However, such improvements diminish quickly after n reaches 6. To improve the n-gram language model, we also developed dynamic n-gram language model adaptation and a discriminative language model to tackle issues with standard n-gram language models, and observed improvements in translation quality. The fact is that human beings do not reuse long n-grams to create new sentences. Rather, we reuse the structure (grammar) and replace constituents to construct new sentences. A structured language model tries to model the structural information in natural language, especially long-distance dependencies, in a probabilistic framework. However, exploring and using structural information is computationally expensive, as the number of possible structures for a sentence is very large even with the constraint of a grammar. It is difficult to apply parsers to data that differs from the training data of the treebank, and parsers are usually hard to scale up. In this

