Results 1 – 10 of 21
An Empirical Study of Smoothing Techniques for Language Modeling
, 1998
Abstract

Cited by 850 (20 self)
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
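The simple linear interpolation the abstract describes mixes a higher-order estimate with a lower-order one using a fixed weight. A minimal sketch of Jelinek-Mercer-style bigram interpolation (the function name, toy corpus, and weight lam=0.7 are illustrative assumptions, not taken from the paper):

```python
from collections import Counter

def interpolated_bigram_prob(w_prev, w, bigrams, unigrams, total, lam=0.7):
    """P(w | w_prev) as a weighted mix of bigram and unigram estimates."""
    uni = unigrams[w] / total if total else 0.0
    big = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * big + (1 - lam) * uni

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
p = interpolated_bigram_prob("the", "cat", bigrams, unigrams, len(corpus))
```

In practice the weight lam is tuned on held-out data rather than fixed, and the paper's variants condition it on context counts.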
Improving Text Classification by Shrinkage in a Hierarchy of Classes
, 1998
Abstract

Cited by 235 (5 self)
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples.
A hierarchical Bayesian language model based on Pitman–Yor processes
 In Coling/ACL
, 2006
Abstract

Cited by 78 (8 self)
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
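Interpolated Kneser-Ney, which the abstract shows is recovered by an approximation to the hierarchical Pitman-Yor model, subtracts a fixed discount from each bigram count and interpolates with a continuation probability based on type counts. A minimal bigram sketch (the toy corpus and discount d=0.75 are illustrative assumptions):

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, d=0.75):
    """Interpolated Kneser-Ney bigram probabilities with absolute discount d."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    ctx_count = Counter(tokens[:-1])             # c(u): times u appears as context
    followers = defaultdict(set)                 # distinct words seen after u
    preceders = defaultdict(set)                 # distinct contexts seen before w
    for u, w in bigrams:
        followers[u].add(w)
        preceders[w].add(u)
    n_bigram_types = len(bigrams)

    def prob(u, w):
        p_cont = len(preceders[w]) / n_bigram_types   # continuation probability
        if ctx_count[u] == 0:
            return p_cont
        lam = d * len(followers[u]) / ctx_count[u]    # mass freed by discounting
        return max(bigrams[(u, w)] - d, 0) / ctx_count[u] + lam * p_cont
    return prob

p = kneser_ney_bigram("the cat sat on the mat the cat ran".split())
```

The discount d plays the same role as the Pitman-Yor discount parameter; modified Kneser-Ney refines this with separate discounts for counts of 1, 2, and 3 or more.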
A Bayesian framework for word segmentation: Exploring the effects of context
 In 46th Annual Meeting of the ACL
, 2009
Abstract

Cited by 50 (11 self)
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
Ensemble learning for independent component analysis
 in Advances in Independent Component Analysis
, 2000
Abstract

Cited by 49 (2 self)
This thesis is concerned with the problem of Blind Source Separation. Specifically, we consider the Independent Component Analysis (ICA) model in which a set of observations is modelled by x_t = A s_t, where A is an unknown mixing matrix and s_t is a vector of hidden source components at time t. The ICA problem is to find the sources given only a set of observations. In chapter 1, the blind source separation problem is introduced. In chapter 2 the method of Ensemble Learning is explained. Chapter 3 applies Ensemble Learning to the ICA model and chapter 4 assesses the use of Ensemble Learning for model selection. Chapters 5–7 apply the Ensemble Learning ICA algorithm to data sets from physics (a medical imaging data set consisting of images of a tooth), biology (data sets from cDNA microarrays) and astrophysics (Planck image separation and galaxy spectra separation).
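The generative model x_t = A s_t described in the abstract can be simulated in a few lines. A minimal sketch of the forward (mixing) process (the two source signals and the mixing matrix are illustrative assumptions):

```python
import numpy as np

# Two independent, non-Gaussian sources mixed by an unknown matrix: x_t = A s_t.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
s = np.vstack([np.sign(np.sin(25 * t)),        # square-ish wave source
               rng.laplace(size=t.size)])      # heavy-tailed noise source
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                     # mixing matrix (unknown to the learner)
x = A @ s                                      # observations, one column per time step
```

An ICA algorithm (ensemble learning, FastICA, etc.) would then attempt to recover s from x alone, which is only possible up to permutation and scaling of the sources.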
The revisiting problem in mobile robot map building: A hierarchical Bayesian approach
 In Proc. of the Conference on Uncertainty in Artificial Intelligence (UAI
, 2003
Abstract

Cited by 26 (4 self)
We present an application of hierarchical Bayesian estimation to robot map building. The revisiting problem occurs when a robot has to decide whether it is seeing a previously built portion of a map, or is exploring new territory. This is a difficult decision problem, requiring the probability of being outside of the current known map. To estimate this probability, we model the structure of a "typical" environment as a hidden Markov model that generates sequences of views observed by a robot navigating through the environment. A Dirichlet prior over structural models is learned from previously explored environments. Whenever a robot explores a new environment, the posterior over the model is estimated by Dirichlet hyperparameters. Our approach is implemented and tested in the context of multi-robot map merging, a particularly difficult instance of the revisiting problem. Experiments with robot data show that the technique yields strong improvements over alternative methods.
Choice of Basis for Laplace Approximation
 Machine Learning
, 1998
Abstract

Cited by 23 (1 self)
Maximum a posteriori optimization of parameters and the Laplace approximation for the marginal likelihood are both basis-dependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the 'softmax' basis.
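The change of basis the note discusses maps points on the probability simplex to unconstrained coordinates via logarithms, with the softmax function as the inverse. A minimal sketch of the round trip (function names are illustrative):

```python
import numpy as np

def to_softmax_basis(p):
    """Map a probability vector to unconstrained softmax coordinates (log p, up to an additive constant)."""
    return np.log(p)

def from_softmax_basis(a):
    """Inverse map: softmax normalization back onto the probability simplex."""
    e = np.exp(a - a.max())      # subtract the max for numerical stability
    return e / e.sum()

p = np.array([0.7, 0.2, 0.1])
a = to_softmax_basis(p)
p_back = from_softmax_basis(a)
```

The round trip is exact because softmax normalizes away the additive constant that the log map leaves unspecified.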
A Bayesian interpretation of interpolated Kneser-Ney
, 2006
Abstract

Cited by 16 (2 self)
Interpolated Kneser-Ney is one of the best smoothing methods for n-gram language models. Previous explanations for its superiority have been based on intuitive and empirical justifications of specific properties of the method. We propose a novel interpretation of interpolated Kneser-Ney as approximate inference in a hierarchical Bayesian model consisting of Pitman-Yor processes. As opposed to past explanations, our interpretation can recover exactly the formulation of interpolated Kneser-Ney, and performs better than interpolated Kneser-Ney when a better inference procedure is used.
A Hierarchical Bayesian Approach to the Revisiting Problem in Mobile Robot Map Building
 In Proc. of the Int. Symposium of Robotics Research (ISRR
, 2003
Abstract

Cited by 7 (0 self)
We present an application of hierarchical Bayesian estimation to robot map building. The revisiting problem occurs when a robot has to decide whether it is seeing a previously built portion of a map, or is exploring new territory. This is a difficult decision problem, requiring the probability of being outside of the current known map. To estimate this probability, we model the structure of a "typical" environment as a hidden Markov model that generates sequences of views observed by a robot navigating through the environment. A Dirichlet prior over structural models is learned from previously explored environments. Whenever a robot explores a new environment, the posterior over the model is estimated using Dirichlet hyperparameters. Our approach is implemented and tested in the context of multi-robot map merging, a particularly difficult instance of the revisiting problem. Experiments with robot data show that the technique yields strong improvements over alternative methods.
Weakly supervised part-of-speech tagging for morphologically-rich, resource-scarce languages
 In Proceedings of EACL
, 2009
Abstract

Cited by 4 (0 self)
This paper examines unsupervised approaches to part-of-speech (POS) tagging for morphologically-rich, resource-scarce languages, with an emphasis on Goldwater and Griffiths’s (2007) fully-Bayesian approach originally developed for English POS tagging. We argue that existing unsupervised POS taggers unrealistically assume as input a perfect POS lexicon, and consequently, we propose a weakly supervised fully-Bayesian approach to POS tagging, which relaxes the unrealistic assumption by automatically acquiring the lexicon from a small amount of POS-tagged data. Since such relaxation comes at the expense of a drop in tagging accuracy, we propose two extensions to the Bayesian framework and demonstrate that they are effective in improving a fully-Bayesian POS tagger for Bengali, our representative morphologically-rich, resource-scarce language.