## Why Doesn’t EM Find Good HMM POS-Taggers? (2007)

Venue: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic. Association for Computational Linguistics.

Citations: 26 (2 self)

@INPROCEEDINGS{Johnson07whydoesn’t,

author = {Mark Johnson},

title = {Why Doesn’t EM Find Good HMM POS-Taggers?},

booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic: Association for Computational Linguistics},

year = {2007},

pages = {296--305}

}

This paper investigates why the HMMs estimated by Expectation-Maximization (EM) produce such poor results as Part-of-Speech (POS) taggers. We find that the HMMs estimated by EM generally assign a roughly equal number of word tokens to each hidden state, while the empirical distribution of tokens to POS tags is highly skewed. This motivates a Bayesian approach using a sparse prior to bias the estimator toward such a skewed distribution. We investigate Gibbs Sampling (GS) and Variational Bayes (VB) estimators and show that VB converges faster than GS for this task and that VB significantly improves 1-to-1 tagging accuracy over EM. We also show that EM does nearly as well as VB when the number of hidden HMM states is dramatically reduced. We also point out the high variance in all of these estimators, and that they require many more iterations to approach convergence than usually thought.
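The abstract does not spell out the estimators, so the following is only an illustrative sketch of how a sparse prior can push an estimate toward a skewed distribution. It contrasts a plain EM M-step (normalized expected counts) with the standard mean-field Variational Bayes update for a multinomial under a symmetric Dirichlet prior; whether this matches the paper's exact formulation is an assumption. The function names, the toy `expected_counts`, the value `alpha=0.1`, and the `digamma` approximation helper are all hypothetical and chosen for illustration.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0.

    Uses the recurrence psi(x) = psi(x + 1) - 1/x to shift x up to at
    least 6, then an asymptotic series. Illustrative helper only.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def em_m_step(counts):
    """EM M-step: maximum-likelihood estimate from expected counts."""
    total = sum(counts)
    return [c / total for c in counts]


def vb_m_step(counts, alpha):
    """Mean-field VB update with a symmetric Dirichlet(alpha) prior.

    Returns sub-normalized weights exp(psi(c_k + alpha)) divided by
    exp(psi(sum_k c_k + K * alpha)).
    """
    total = sum(counts) + alpha * len(counts)
    denom = math.exp(digamma(total))
    return [math.exp(digamma(c + alpha)) / denom for c in counts]


# Toy expected counts for three outcomes (assumed values, not from the paper).
expected_counts = [50.0, 5.0, 0.5]
em = em_m_step(expected_counts)
vb = vb_m_step(expected_counts, alpha=0.1)
print("EM:", em)
print("VB:", vb)
```

With a small `alpha`, the VB weights discount low-count outcomes more heavily than high-count ones, so the rare outcome's weight relative to the frequent one is smaller than under EM. This is one way a sparse prior can favor skewed distributions.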

