## Context dependent language model adaptation (2008)

Venue: | In Proc. InterSpeech-2008 |

Citations: | 4 - 1 self |

### BibTeX

@INPROCEEDINGS{Liu08contextdependent,

author = {X. Liu and M. J. F. Gales and P. C. Woodland},

title = {Context dependent language model adaptation},

booktitle = {Proc. InterSpeech 2008},

year = {2008}

}


### Abstract

Language models (LMs) are often constructed by building multiple component LMs that are combined using interpolation weights. By tuning these interpolation weights, using either perplexity or discriminative approaches, it is possible to adapt LMs to a particular task. In this work, improved LM adaptation is achieved by introducing context dependent interpolation weights. An important part of this new approach is obtaining robust estimates. Two schemes for this are described. The first is based on MAP estimation, where either global interpolation weights are used as priors, or context dependent interpolation priors are obtained from the training data. The second scheme uses class based contexts to determine the interpolation weights. Both schemes are evaluated using unsupervised LM adaptation on a Mandarin broadcast transcription task. Consistent gains in perplexity using context dependent, rather than global, weights are observed, as well as reductions in character error rate.
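The interpolation scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `context_of` mapping and the `"global"` fallback key are assumptions introduced here.

```python
# Sketch of context dependent linear interpolation:
#   P(w | h) = sum_m lambda_m(c) * P_m(w | h),
# where the weight vector lambda(c) is selected by a context class c
# derived from the history h. The context mapping and the "global"
# fallback are illustrative assumptions, not the paper's exact scheme.

def interpolate(word, history, components, weights_by_context, context_of):
    """Return the interpolated probability of `word` given `history`.

    components:         list of functions P_m(word, history) -> probability
    weights_by_context: dict mapping a context class to a weight vector;
                        must contain a "global" entry used as fallback
    context_of:         function mapping a history to a context class
    """
    ctx = context_of(history)
    lambdas = weights_by_context.get(ctx, weights_by_context["global"])
    return sum(lam * p(word, history) for lam, p in zip(lambdas, components))
```

With only a `"global"` entry this reduces to standard linear interpolation; adding per-context weight vectors gives the context dependent variant the paper studies.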

### Citations

727 | Class-based n-gram models of natural language
- Brown, deSouza, et al.
- 1992
Citation Context: ... φ(h_i^{n−2}) else if ∃ φ(h_i^{n−2}) (8), ..., φ(null) otherwise. Class Context Dependent Weights: Class-based n-gram models have been shown to be helpful in addressing the data sparsity problem [1]. Words are clustered into syntactically, semantically or statistically equivalent classes. The intuition is that even if a word n-gram does not occur in the training data, the corresponding class n-g...

685 | Estimation of probabilities from sparse data for the language model component of a speech recognizer
- Katz
- 1987
Citation Context: ...her than global, weights are observed as well as reductions in character error rate. 1. Introduction Back-off n-gram models remain the dominant language modeling approach for state-of-the-art ASR systems [4]. Training text corpora are often collected from different sources, having a range of topics and styles. One common way of using these multiple sources is to build an n-gram mixture model and tune the...

192 | Minimum phone error and I-smoothing for improved discriminative training
- Povey, Woodland

19 |
Discriminative n-gram language modeling (Computer Speech and Language, vol. 21, 2007)
- Roark, Saraclar, et al.
Citation Context: ...information [3]. Second, the correlation between perplexity and error rate is well known to be fairly weak for current ASR systems. Hence, it may be useful to use discriminative adaptation techniques [6, 8, 2]. To address these issues, this paper investigates the use of discriminatively trained context dependent interpolation weights for unsupervised LM adaptation. As this dramatically increases the number...

12 | The CU-HTK Mandarin Broadcast News Transcription System
- Sinha, Gales, et al.
Citation Context: ...ombining the ML weight statistics of the two classes before the merge, as given in (3). 4. Experiments and Results The CU-HTK Mandarin ASR system was used to evaluate various LM adaptation techniques [9]. It comprises an initial lattice generation stage using a 58k word list, interpolated 4-gram word based back-off LM, and adapted MPE acoustic models trained on 942 hours of broadcast speech data. A t...

9 | Generalized linear interpolation of language models
- Hsu
- 2007
Citation Context: ...to BBN Technologies. The paper does not necessarily reflect the position or the policy of the US Government and no official endorsement should be inferred. parameters by adding contextual information [3]. Second, the correlation between perplexity and error rate is well known to be fairly weak for current ASR systems. Hence, it may be useful to use discriminative adaptation techniques [6, 8, 2]. To a...

3 |
Improved Clustering Techniques for Class Based Statistical Language Modeling
- Kneser, Ney
- 1993
Citation Context: ...ey issue is how to derive a suitable word to class mapping. An efficient clustering scheme, referred to as the exchange algorithm, has been proposed and widely used for standard class based n-gram models [5]. However, this algorithm may not be appropriate for context dependent weights. Therefore, an alternative clustering algorithm is required. The method considered is a maximum likelihood based weight ...
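The maximum likelihood weight estimation referred to in this context can be sketched with the standard EM update for linear interpolation weights. This is the textbook procedure under that assumption; the paper's exact estimator and its clustering extension may differ.

```python
# Sketch: EM estimation of interpolation weights maximizing the likelihood
# of held-out text. component_probs[i][m] = P_m(w_i | h_i), the m-th
# component LM's probability at position i. This is the standard update;
# a context dependent scheme would apply it separately per context class.

def em_interpolation_weights(component_probs, iters=20):
    num_components = len(component_probs[0])
    lam = [1.0 / num_components] * num_components  # uniform initialization
    for _ in range(iters):
        counts = [0.0] * num_components
        for probs in component_probs:
            denom = sum(l * p for l, p in zip(lam, probs))
            for m in range(num_components):
                # posterior probability that component m generated word i
                counts[m] += lam[m] * probs[m] / denom
        total = sum(counts)
        lam = [c / total for c in counts]  # re-normalized weight update
    return lam
```

Each iteration provably does not decrease the held-out likelihood, so the weights converge to a (global, since the objective is concave in the weights) maximum.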

3 | Discriminative language model adaptation for Mandarin broadcast speech transcription and translation
- Liu, Byrne, et al.
- 2007
Citation Context: ...information [3]. Second, the correlation between perplexity and error rate is well known to be fairly weak for current ASR systems. Hence, it may be useful to use discriminative adaptation techniques [6, 8, 2]. To address these issues, this paper investigates the use of discriminatively trained context dependent interpolation weights for unsupervised LM adaptation. As this dramatically increases the number...