A fully Bayesian approach to unsupervised part-of-speech tagging
In ACL, 2007
Cited by 114 (0 self)
Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
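The contrast between a single point estimate and integrating over parameter values can be illustrated with a toy sketch. This is not the paper's model or inference procedure; it only assumes one multinomial distribution over tags with a symmetric Dirichlet prior, for which the integral has a closed form. The function names, the tag counts, and the value of `alpha` are hypothetical and chosen for illustration.

```python
from collections import Counter


def mle_estimate(counts, outcomes):
    """Maximum-likelihood point estimate: relative frequencies."""
    total = sum(counts.values())
    return {o: counts[o] / total for o in outcomes}


def bayes_predictive(counts, outcomes, alpha):
    """Predictive probability with the multinomial parameters integrated out
    under a symmetric Dirichlet(alpha) prior (closed form for this toy case).
    Values of alpha below 1 correspond to a prior that favors sparse
    distributions."""
    total = sum(counts.values())
    denom = total + alpha * len(outcomes)
    return {o: (counts[o] + alpha) / denom for o in outcomes}


# Hypothetical tag counts; "JJ" is a possible tag that was never observed.
tags = ["NN", "VB", "JJ"]
observed = Counter({"NN": 3, "VB": 1})

mle = mle_estimate(observed, tags)
bayes = bayes_predictive(observed, tags, alpha=0.1)
```

In this sketch the MLE assigns the unseen tag zero probability, while the integrated estimate leaves it a small nonzero mass and still concentrates most of the probability on the frequent tag.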
Representational bias in unsupervised learning of syllable structure
In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL), Ann Arbor, 2005
Cited by 3 (0 self)
Unsupervised learning algorithms based on Expectation Maximization (EM) are often straightforward to implement and provably converge on a local likelihood maximum. However, these algorithms often do not perform well in practice. Common wisdom holds that they yield poor results because they are overly sensitive to initial parameter values and easily get stuck in local (but not global) maxima. We present a series of experiments indicating that for the task of learning syllable structure, the initial parameter weights are not crucial. Rather, it is the choice of model class itself that makes the difference between successful and unsuccessful learning. We use a language-universal rule-based algorithm to find a good set of parameters, and then train the parameter weights using EM. We achieve word accuracy of 95.9% on German and 97.1% on English, as compared to 97.4% and 98.1% respectively for supervised training.