## Unsupervised language model adaptation using latent semantic marginals (2006)

Venue: Proc. of Interspeech

Citations: 13 (3 self)

### BibTeX

    @INPROCEEDINGS{Tam06unsupervisedlanguage,
      author    = {Yik-cheung Tam and Tanja Schultz},
      title     = {Unsupervised language model adaptation using latent semantic marginals},
      booktitle = {Proc. of Interspeech},
      year      = {2006}
    }


### Abstract

We integrated the Latent Dirichlet Allocation (LDA) approach, a latent semantic analysis model, into an unsupervised language model adaptation framework. We adapted a background language model by minimizing the Kullback-Leibler divergence between the adapted model and the background model, subject to the constraint that the marginalized unigram probability distribution of the adapted model equals the corresponding distribution estimated by the LDA model – the latent semantic marginals. We evaluated our approach on the RT04 Mandarin Broadcast News test set and experimented with different LM training settings. Results showed that our approach reduces both perplexity and character error rate under supervised and unsupervised adaptation.

### Citations

2577 | Latent Dirichlet allocation
- Blei, Ng, et al.
- 2003

Citation Context: ...smoothed out properly in estimating the dynamic marginals based on relative word frequency. In this paper, we revisit their approach but propose using the Latent Dirichlet Allocation (LDA) model [3], a Bayesian latent semantic analysis approach, to estimate the dynamic marginals based on automatic transcription. As a latent semantic model, the LDA model contains a set of unigram LMs, each of which...

856 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1999

Citation Context: ...$= E_q\left[\log \frac{f(\theta, w_1^n, z_1^n; \Lambda)}{q(\theta, z_1^n; \Gamma)}\right]$ (3), where $q(\theta, z_1^n)$ is an approximate posterior distribution over all the latent variables given an observed document. In Variational Bayes inference [10], the distribution is factorizable and parameterized by $\Gamma$: $q(\theta, z_1^n; \Gamma) = q(\theta; \{\gamma_k\}) \cdot \prod_{i=1}^{n} q(z_i)$ (4), where $q(\theta; \{\gamma_k\})$ is a Dirichlet distribution over topic mixture weights parameterized by the "...
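The factorized posterior in Eqs. (3)–(4) can be illustrated with a minimal variational E-step for a single document under LDA. This is a sketch, not the paper's implementation: the topic-word matrix `beta`, the symmetric prior `alpha`, and the small digamma helper are illustrative assumptions.

```python
import math

def digamma(x):
    # Digamma via recurrence plus an asymptotic expansion (adequate for x > 0).
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return r + math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def vb_e_step(doc, beta, alpha=0.5, iters=50):
    """Variational E-step for one document under LDA.

    doc  : list of word ids
    beta : beta[k][w] = p(w | topic k)
    Returns gamma, the variational Dirichlet parameters over topics.
    """
    K = len(beta)
    gamma = [alpha + len(doc) / K] * K  # uniform initialization
    for _ in range(iters):
        gamma_new = [alpha] * K
        for w in doc:
            # phi_wk ∝ beta[k][w] * exp(digamma(gamma[k])) -- the q(z_i) factors of Eq. (4)
            phi = [beta[k][w] * math.exp(digamma(gamma[k])) for k in range(K)]
            s = sum(phi)
            for k in range(K):
                gamma_new[k] += phi[k] / s
        gamma = gamma_new
    return gamma
```

After convergence, `gamma` normalized by its sum gives the expected topic proportions of the document, which is what a dynamic unigram marginal would be built from.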

847 | Probabilistic latent semantic indexing
- Hofmann
- 1999

Citation Context: ...the latent topics are consistent. Various LSA techniques have been proposed and applied across different research fields, such as SVD-based LSI [6] and its extension [7], pLSI using the EM algorithm [8], Latent Dirichlet Allocation [3] and its extension [9] to model the correlation among topics. The LDA model is a Bayesian extension of a mixture of unigram models where a vector of topic mixture weig...

801 | SRILM - An Extensible Language Modelling Toolkit
- Stolcke
- 2002

Citation Context: ...rates (CER) evaluated on the RT04 test set containing three episodes: CCTV, RFA and NTDTV. We trained the background 4-gram LM with the modified Kneser-Ney smoothing scheme using the SRI LM toolkit [13]. We trained the LDA model with 200 topics, which was found optimal from our previous experience. The LM adaptation procedure is to first perform first-pass decoding on the test audio to obtain the au...
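A setup like the one quoted above can be reproduced with SRILM along the following lines. The file names are placeholders, and the exact flag set the authors used is not given in the context; `-kndiscount -interpolate` is SRILM's modified, interpolated Kneser-Ney smoothing.

```shell
# Train a 4-gram background LM with modified Kneser-Ney smoothing
ngram-count -order 4 -text train.txt -kndiscount -interpolate -lm bg.4gram.lm

# Score a first-pass transcript (perplexity) against the background LM
ngram -order 4 -lm bg.4gram.lm -ppl firstpass.txt
```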

423 | Dynamic topic models
- Blei, Lafferty
- 2006

Citation Context: ...techniques have been proposed and applied across different research fields, such as SVD-based LSI [6] and its extension [7], pLSI using the EM algorithm [8], Latent Dirichlet Allocation [3] and its extension [9] to model the correlation among topics. The LDA model is a Bayesian extension of a mixture of unigram models where a vector of topic mixture weights $\theta$ is drawn from a prior Dirichlet distribution: $f(\theta$...
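The generative story sketched in the context above (draw a topic mixture θ from a Dirichlet prior, then a topic and a word per position) can be written out as a minimal sampler. The dimensions, `beta`, and the prior values are illustrative assumptions, not the paper's settings.

```python
import random

def sample_dirichlet(alphas, rng):
    # Draw theta ~ Dirichlet(alphas) via normalized Gamma draws.
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

def generate_document(n_words, beta, alpha, rng):
    """Generate one document under the LDA generative story.

    beta  : beta[k][w] = p(w | topic k)
    alpha : Dirichlet prior over the K topic mixture weights theta
    """
    theta = sample_dirichlet(alpha, rng)
    K, V = len(beta), len(beta[0])
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(K), weights=theta)[0]    # topic for this position
        w = rng.choices(range(V), weights=beta[z])[0]  # word given the topic
        doc.append(w)
    return doc
```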

183 | Adaptive Statistical Language Modeling: A Maximum Entropy Approach
- Rosenfeld
- 1994

Citation Context: ...to the marginalization constraints for each word $w$ in the vocabulary: $\sum_h \Pr_a(w|h) \cdot \Pr_a(h) = \Pr_{lda}(w) \;\; \forall w$ (11). The constrained optimization problem has a close connection to the maximum entropy approach [11]. It turns out that the adapted model is a rescaled version of the background LM: $\Pr_a(w|h) = \frac{\alpha(w) \cdot \Pr_{bg}(w|h)}{Z(h)}$ (12), where $Z(h)$ is a normalization term to guarantee that the probability...
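The rescaled form in Eq. (12) can be sketched in a few lines. The power-law scaling factor $\alpha(w) = (\Pr_{target}(w)/\Pr_{bg}(w))^\beta$ is the common choice in this line of work rather than something the context states explicitly, and the toy bigram LM and $\beta$ value below are illustrative assumptions.

```python
def adapt_lm(bg, bg_unigram, target_unigram, beta=0.5):
    """Rescale a background conditional LM toward target unigram marginals.

    bg             : bg[h][w] = Pr_bg(w | h), a dict of dicts
    bg_unigram     : Pr_bg(w)
    target_unigram : Pr_target(w), e.g. the latent semantic marginals
    Returns adapted[h][w] = alpha(w) * Pr_bg(w|h) / Z(h).
    """
    alpha = {w: (target_unigram[w] / bg_unigram[w]) ** beta for w in bg_unigram}
    adapted = {}
    for h, dist in bg.items():
        scaled = {w: alpha[w] * p for w, p in dist.items()}
        z = sum(scaled.values())  # Z(h) renormalizes each history
        adapted[h] = {w: p / z for w, p in scaled.items()}
    return adapted
```

Words whose target marginal exceeds their background marginal get boosted under every history, while each conditional distribution still sums to one.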

78 | Exploiting latent semantic information in statistical language modeling
- Bellegarda
- 2000

Citation Context: ...ment usually refers to a piece of news story within which the latent topics are consistent. Various LSA techniques have been proposed and applied across different research fields, such as SVD-based LSI [6] and its extension [7], pLSI using the EM algorithm [8], Latent Dirichlet Allocation [3] and its extension [9] to model the correlation among topics. The LDA model is a Bayesian extension of a mixture...

56 | Language model adaptation using dynamic marginals
- Kneser, Peters, et al.
- 1997

Citation Context: ...important since it is undesirable to reinforce the errors back into the background LM after adaptation. Different LM adaptation techniques have been proposed in the literature. One technique, proposed in [1], attempts to adapt the background LM by minimizing the Kullback-Leibler divergence between the adapted LM and the background LM subject to a constraint that the marginalized unigram distribution of th...

42 | Adaptive language modeling using minimum discriminant estimation
- Pietra, Pietra, et al.
- 1992

Citation Context: ...ution of the adapted LM is equal to some unigram distribution estimated from in-domain text data. They called the latter the "dynamic marginals". A similar idea was also proposed earlier in [2]. The approach was shown to successfully reduce perplexity and recognition errors when in-domain supervised text data were available for LM adaptation. However, they [1] reported degradation o...

16 | Language model adaptation using variational Bayes inference
- Tam, Schultz
- 2005

Citation Context: ...c marginals based on automatic transcription. As a latent semantic model, the LDA model contains a set of unigram LMs, each of which describes the word distribution of a latent topic. In our earlier work [4], we successfully applied the LDA model to unsupervised LM adaptation by incrementally interpolating the background LM with the dynamic unigram LM estimated by LDA. In this paper, we propose usi...
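The interpolation scheme of the earlier work [4] quoted above can be sketched as a simple linear mixture of the background conditional probability and an LDA-derived dynamic unigram. The interpolation weight and toy distributions are illustrative, not values from the paper.

```python
def interpolate(bg_prob, lda_unigram, lam=0.3):
    """Pr(w|h) = lam * Pr_lda(w) + (1 - lam) * Pr_bg(w|h).

    bg_prob     : Pr_bg(w|h) for each word w, under one fixed history h
    lda_unigram : dynamic unigram Pr_lda(w) estimated by the LDA model
    lam         : interpolation weight on the LDA unigram
    """
    return {w: lam * lda_unigram[w] + (1.0 - lam) * p for w, p in bg_prob.items()}
```

Because both inputs are proper distributions over the same vocabulary, the mixture is again a proper distribution for each history.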

4 | Language model adaptation through topic decomposition and MDI estimation
- Federico
- 2002

Citation Context: ...e weights instead of directly boosting the probability of misrecognized words in the automatic transcription. A similar approach has been explored using probabilistic Latent Semantic Analysis (pLSA) in [5], but in a supervised setting where short text descriptions of the test audio were utilized. We employed the LDA model, which provides regularization over the pLSA model due to the Bayesian nature of ...

3 | Latent semantic mapping: dimensionality reduction via globally opti
- Bellegarda

Citation Context: ...a piece of news story within which the latent topics are consistent. Various LSA techniques have been proposed and applied across different research fields, such as SVD-based LSI [6] and its extension [7], pLSI using the EM algorithm [8], Latent Dirichlet Allocation [3] and its extension [9] to model the correlation among topics. The LDA model is a Bayesian extension of a mixture of unigram models whe...