@MISC{Andreas97hub4language, author = {Fuliang Weng Andreas}, title = {Hub4 Language Modeling Using Domain Interpolation and Data Clustering}, year = {1997} }
Abstract
In SRI's language modeling experiments for the Hub4 domain, three basic approaches were pursued: interpolating multiple models estimated from Hub4 and non-Hub4 training data, adapting the language model (LM) to the focus conditions, and adapting the LM to different topic types. In the first approach, we built separate LMs for the closely transcribed Hub4 material (acoustic training transcripts) and the loosely transcribed Hub4 material (LM training data), as well as the North American Business News (NABN) and Switchboard training data, projected onto the Hub4 vocabulary. By interpolating the probabilities obtained from these models, we obtained a 20% reduction in perplexity and a 1.8% reduction in word error rate, compared to a baseline Hub4-only language model. Two adaptation approaches are also described: adapting language models to the speech styles correlated with different focus conditions, and building cluster-specific LM mixtures. These two approaches give some reduction in per...
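
As an illustration of the first approach, the following minimal Python sketch combines per-word probabilities from several component LMs by linear interpolation and computes perplexity on the result. It assumes the interpolation is a weighted sum with weights that sum to one; the abstract does not say how the weights were chosen, and the function names, weights, and probabilities here are hypothetical toy values, not figures from the paper.

```python
import math


def interpolate(probs, weights):
    """Return the weighted sum of one word's probability under each component LM."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per component model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * p for w, p in zip(weights, probs))


def perplexity(word_probs):
    """Perplexity of a word sequence given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical probabilities for a four-word test sequence under three
# component models (for example, an in-domain LM and two out-of-domain LMs).
component_probs = [
    [0.10, 0.02, 0.05, 0.01],  # model A
    [0.05, 0.08, 0.01, 0.04],  # model B
    [0.01, 0.03, 0.02, 0.06],  # model C
]
weights = [0.6, 0.3, 0.1]  # assumed values; in practice tuned on held-out data

# Interpolate word by word across the component models.
mixed = [interpolate(per_word, weights) for per_word in zip(*component_probs)]

print("model A alone:", perplexity(component_probs[0]))
print("interpolated: ", perplexity(mixed))
```

With these toy numbers the mixture may or may not beat a single component; the improvement reported in the abstract comes from the authors' actual models and data.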