Rapid Unsupervised Topic . . . (2009)
BibTeX
@MISC{Tam09rapidunsupervised,
author = {Yik-cheung Tam},
title = {Rapid Unsupervised Topic . . . },
year = {2009}
}
Abstract
In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition, we rapidly adapt a language model of the source language. For statistical machine translation, we adapt a language model of the target language, a translation lexicon, and a phrase table using the source text. For monolingual adaptation, we propose latent Dirichlet-Tree allocation for Bayesian latent semantic analysis. Our model enables rapid incremental language model adaptation by caching the fractional topic counts of word hypotheses decoded from previous speech utterances. Latent Dirichlet-Tree allocation models topic correlation in a tree-based hierarchy and thus addresses the model initialization issue. To address the “bag-of-words” assumption in latent semantic analysis, we extend our approach to N-gram latent Dirichlet-Tree allocation. We investigate a fractional Kneser-Ney smoothing approach to handle







