N-gram weighting: Reducing training data mismatch in cross-domain language model estimation
In Proc. EMNLP, 2008
Abstract (cited by 2, 1 self)
In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the n-grams from such corpora may not be of equal relevance to the target domain, we propose an n-gram weighting technique to adjust the component n-gram probabilities based on features derived from readily available segmentation and metadata information for each corpus. Using a log-linear combination of such features, the resulting model achieves up to a 1.2% absolute word error rate reduction over a linearly interpolated baseline language model on a lecture transcription task.
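The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact formulation, so the renormalization step, the function names, the toy probabilities, and the single "off-topic" feature below are all assumptions made for illustration. Each component's n-gram probabilities for one fixed history are rescaled by a log-linear factor exp(theta · features), renormalized, and then linearly interpolated.

```python
import math

def ngram_weight(features, theta):
    """Log-linear weighting factor exp(theta . features) for one n-gram."""
    return math.exp(sum(t * f for t, f in zip(theta, features)))

def weighted_component(probs, feats, theta):
    """Rescale p(w | h) for a fixed history h and renormalize (assumed step)."""
    scaled = {w: p * ngram_weight(feats[w], theta) for w, p in probs.items()}
    z = sum(scaled.values())
    return {w: s / z for w, s in scaled.items()}

def interpolate(components, lambdas):
    """Linear interpolation of component distributions over a shared history."""
    vocab = set().union(*components)
    return {w: sum(l * c.get(w, 0.0) for l, c in zip(lambdas, components))
            for w in vocab}

# Hypothetical toy distributions p(w | h) from two partially matched corpora.
lecture = {"the": 0.5, "entropy": 0.3, "homework": 0.2}
textbook = {"the": 0.6, "entropy": 0.4}

# Hypothetical feature: 1.0 if the n-gram is judged off-topic for the target.
lecture_feats = {"the": [0.0], "entropy": [0.0], "homework": [1.0]}
theta = [-1.0]  # assumed parameter; a negative value down-weights off-topic n-grams

weighted_lecture = weighted_component(lecture, lecture_feats, theta)
mixed = interpolate([weighted_lecture, textbook], [0.5, 0.5])
```

With theta set to all zeros every weighting factor is 1, so the sketch reduces to the plain linearly interpolated baseline. In practice theta and the interpolation weights would be tuned on held-out target-domain data.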

