Results 1 - 7 of 7
An Empirical Study of Smoothing Techniques for Language Modeling
1998
Cited by 631 (19 self)
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
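Jelinek-Mercer smoothing, one of the methods compared above, interpolates a higher-order maximum-likelihood estimate with a lower-order distribution, and the comparison is scored by test-set cross-entropy. The following is a minimal bigram sketch under stated assumptions: one fixed weight per order (the original formulation lets the weights depend on the history and tunes them on held-out data), a uniform distribution as the bottom level, and illustrative weights `lam2=0.7`, `lam1=0.9`. The function names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def train(tokens):
    """Count unigrams, bigram histories, and bigrams in a list of training tokens."""
    unigrams = Counter(tokens)
    histories = Counter(tokens[:-1])  # how often each word appears as the first word of a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, histories, bigrams, len(tokens)


def p_unigram(word, model, vocab_size, lam1):
    """lam1 * P_ML(word) + (1 - lam1) / |V|: ML unigram interpolated with a uniform distribution."""
    unigrams, _, _, total = model
    return lam1 * unigrams[word] / total + (1 - lam1) / vocab_size


def p_bigram(prev, word, model, vocab_size, lam2, lam1):
    """lam2 * P_ML(word | prev) + (1 - lam2) * p_unigram(word): Jelinek-Mercer interpolation."""
    _, histories, bigrams, _ = model
    lower = p_unigram(word, model, vocab_size, lam1)
    if histories[prev] == 0:
        return lower  # unseen history: P_ML(. | prev) is undefined, so use the lower order alone
    return lam2 * bigrams[(prev, word)] / histories[prev] + (1 - lam2) * lower


def cross_entropy(test_tokens, model, vocab_size, lam2=0.7, lam1=0.9):
    """H = -(1/N) * sum of log2 p(w_i | w_{i-1}) over the N predicted test tokens."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = sum(math.log2(p_bigram(a, b, model, vocab_size, lam2, lam1)) for a, b in pairs)
    return -log_sum / len(pairs)


train_tokens = "the cat sat on the mat the cat ran".split()
test_tokens = "the cat sat on the rug".split()
vocab = set(train_tokens) | set(test_tokens)  # "rug" gets mass only from the uniform term
model = train(train_tokens)
h = cross_entropy(test_tokens, model, len(vocab))
print(f"cross-entropy {h:.3f} bits/token, perplexity {2 ** h:.2f}")
```

Because each level mixes a normalized estimate with a normalized lower level, every conditional distribution sums to 1 over the vocabulary, so the cross-entropy (and perplexity, 2 raised to it) stays finite even for words never seen in training.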
Bayesian grammar induction for language modeling
In Proceedings of ACL, 1995
Cited by 49 (2 self)
We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of the tasks, the training data is generated by a probabilistic context-free grammar, and in both tasks our algorithm outperforms the other techniques. The third task involves naturally-occurring data, and in this task our algorithm does not perform as well as n-gram models but vastly outperforms the Inside-Outside algorithm.
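The Inside-Outside algorithm used as a post-pass above re-estimates PCFG rule probabilities from expected rule counts, and its inside pass computes the probability that each nonterminal derives each substring. The sketch below shows only that inside pass, assuming a grammar already in Chomsky normal form stored as plain dictionaries (a representation chosen for illustration). It is not the paper's Bayesian induction search, and the toy grammar is hypothetical.

```python
from collections import defaultdict


def inside_probability(words, binary_rules, lexical_rules, start="S"):
    """Return P(words) under a PCFG in Chomsky normal form via the inside algorithm.

    binary_rules:  {(A, B, C): P(A -> B C)}
    lexical_rules: {(A, w): P(A -> w)}
    """
    n = len(words)
    # beta[i][j][A] = probability that nonterminal A derives words[i:j]
    beta = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in lexical_rules.items():
            if word == w:
                beta[i][i + 1][a] += p
    for span in range(2, n + 1):  # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            cell = beta[i][j]
            for k in range(i + 1, j):  # split point: words[i:k] + words[k:j]
                left, right = beta[i][k], beta[k][j]
                for (a, b, c), p in binary_rules.items():
                    pb, pc = left.get(b, 0.0), right.get(c, 0.0)
                    if pb and pc:
                        cell[a] += p * pb * pc
    return beta[0][n][start]


# Hypothetical toy grammar; "she saw the dog" has a single parse.
binary = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 0.6, ("VP", "V", "NP"): 1.0}
lexical = {("NP", "she"): 0.4, ("Det", "the"): 1.0, ("N", "dog"): 0.5,
           ("N", "cat"): 0.5, ("V", "saw"): 1.0}
print(inside_probability("she saw the dog".split(), binary, lexical))  # about 0.4 * 0.6 * 0.5 = 0.12
```

The full Inside-Outside algorithm pairs these inside probabilities with outside probabilities to obtain expected rule counts, then renormalizes them into new rule probabilities; that re-estimation step is omitted here.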
Learning compatibility coefficients for relaxation labeling processes
IEEE Trans. Pattern Anal. Machine Intell., 1994
Cited by 33 (5 self)
Relaxation labeling processes have been widely used in many different domains, including image processing, pattern recognition, and artificial intelligence. They are iterative procedures that aim at reducing local ambiguities and achieving global consistency through a parallel exploitation of contextual information, which is quantitatively expressed in terms of a set of “compatibility coefficients.” The problem of determining compatibility coefficients has received considerable attention in the past, and many heuristic, statistics-based methods have been suggested. In this paper, we take a rather different viewpoint on this problem: we derive the coefficients by attempting to optimize the performance of the relaxation algorithm over a sample of training data. No statistical interpretation is given; compatibility coefficients are simply interpreted as real numbers for which performance is optimal. Experimental results over a novel application of relaxation are given, which demonstrate the effectiveness of the proposed approach. Index Terms: Compatibility coefficients, constraint satisfaction, gradient projection, learning, neural networks, nonlinear
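To show where compatibility coefficients enter, here is a sketch of one common relaxation labeling update, the Rosenfeld-Hummel-Zucker rule (1976): each object's support for a label is a compatibility-weighted sum of the other objects' current label probabilities, and each label probability is scaled by (1 + support) and renormalized. The sketch assumes coefficients in [-1, 1] and averages support over all other objects so that 1 + q stays nonnegative. It shows only the process the coefficients feed into, not the paper's gradient-projection learning procedure, and the example coefficients are hypothetical.

```python
def relaxation_step(p, r):
    """One parallel relaxation labeling update (Rosenfeld-Hummel-Zucker form).

    p[i][l]       : current probability that object i carries label l (rows sum to 1)
    r[i][j][l][m] : compatibility coefficient in [-1, 1] between label l on object i
                    and label m on object j
    Requires at least two objects.
    """
    n, labels = len(p), len(p[0])
    new_p = []
    for i in range(n):
        # support q_i(l), averaged over the other objects so that -1 <= q_i(l) <= 1
        q = [
            sum(r[i][j][l][m] * p[j][m] for j in range(n) if j != i for m in range(labels)) / (n - 1)
            for l in range(labels)
        ]
        unnorm = [p[i][l] * (1.0 + q[l]) for l in range(labels)]
        z = sum(unnorm)  # positive unless every supported label has q = -1
        new_p.append([u / z for u in unnorm])
    return new_p


# Hypothetical example: two objects, two labels, coefficients that reward agreeing labels.
r = [[[[1.0 if l == m else -1.0 for m in range(2)] for l in range(2)] for _ in range(2)] for _ in range(2)]
p = [[0.6, 0.4], [0.5, 0.5]]
for _ in range(10):
    p = relaxation_step(p, r)
print(p)  # both objects drift toward label 0
```

In this framing, learning the coefficients means choosing the entries of `r` so that the labeling reached after iteration matches the desired labels on training data.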
Fertility Models for Statistical Natural Language Understanding
In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 1997
Cited by 9 (0 self)
Several recent efforts in statistical natural language understanding (NLU) have focused on generating clumps of English words from semantic meaning concepts (Miller et al., 1995; Levin and Pieraccini, 1995; Epstein et al., 1996; Epstein, 1996). This paper extends the IBM Machine Translation Group's concept of fertility (Brown et al., 1993) to the generation of clumps for natural language understanding. The basic underlying intuition is that a single concept may be expressed in English as many disjoint clumps of words. We present two fertility models that attempt to capture this phenomenon. The first is a Poisson model, which offers appealing computational simplicity. The second is a general nonparametric fertility model. The general model's parameters are bootstrapped from the Poisson model and updated by the EM algorithm. These fertility models can be used to impose clump fertility structure on top of preexisting clump generation models. Here, we present results for adding fertility structure to unigram, bigram, and headword clump generation models on ARPA's Air Travel Information Service (ATIS) domain.
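A Poisson fertility model gives each concept c a rate lambda_c and assigns probability exp(-lambda_c) * lambda_c^k / k! to the concept being expressed as k clumps. The abstract does not say how the paper estimates lambda_c, so the sketch below assumes, for illustration only, that per-concept clump counts are directly observed, in which case the maximum-likelihood rate is the mean count. The concept labels `FLIGHT` and `ORIGIN` are hypothetical.

```python
import math
from collections import defaultdict


def fit_poisson_fertility(observations):
    """Maximum-likelihood Poisson rate per concept: lambda_c = mean observed clump count.

    observations: iterable of (concept, number_of_clumps) pairs (assumed directly observed)
    """
    totals, counts = defaultdict(int), defaultdict(int)
    for concept, k in observations:
        totals[concept] += k
        counts[concept] += 1
    return {c: totals[c] / counts[c] for c in counts}


def poisson_fertility_prob(k, lam):
    """P(fertility = k) = exp(-lam) * lam**k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical ATIS-style concept labels and clump counts.
rates = fit_poisson_fertility([("FLIGHT", 1), ("FLIGHT", 2), ("FLIGHT", 1), ("ORIGIN", 1)])
for k in range(4):
    print(k, round(poisson_fertility_prob(k, rates["FLIGHT"]), 4))
```

The single parameter per concept is what makes the Poisson model computationally simple; a nonparametric model instead stores a separate probability for each k, which is why it benefits from being initialized from the Poisson fit before EM.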
Language Models for Machine Translation: Original vs. Translated Texts
Cited by 1 (1 self)
We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
Combination Of N-Grams And Stochastic
This paper describes a hybrid proposal to combine n-grams and Stochastic Context-Free Grammars (SCFGs) for language modeling. A classical n-gram model is used to capture the local relations between words, while a stochastic grammatical model is considered to represent the long-term relations between syntactical structures. In order to define this grammatical model, which will be used on large-vocabulary complex tasks, a category-based SCFG and a probabilistic model of word distribution in the categories have been proposed. Methods for learning these stochastic models for complex tasks are described, and algorithms for computing the word transition probabilities are also presented. Finally, experiments using the Penn Treebank corpus improved the test set perplexity by 30% with regard to the classical n-gram models.
Modeling Morphologically Rich Languages Using Split Words and Unstructured Dependencies
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n − 1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only from the preceding n − 1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.

