@MISC{Liang_type-basedmcmc, author = {Percy Liang and Michael I. Jordan and Dan Klein}, title = {Type-Based MCMC}, year = {} }
Abstract
Most existing algorithms for learning latent-variable models (such as EM and existing Gibbs samplers) are token-based, meaning that they update the variables associated with one sentence at a time. The incremental nature of these methods makes them susceptible to local optima and slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables, identified by a type, that spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.