## Modeling topic and role information in meetings using the hierarchical Dirichlet process (2008)

### Download Links

- [www.cstr.inf.ed.ac.uk]
- [www.cstr.ed.ac.uk]
- DBLP

### Other Repositories/Bibliography

Venue: Proc. of Machine Learning for Multimodal Interaction (MLMI'08)

Citations: 2 (2 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Huang08modelingtopic,
  author    = {Songfang Huang and Steve Renals},
  title     = {Modeling topic and role information in meetings using the hierarchical Dirichlet process},
  booktitle = {Proc. of Machine Learning for Multimodal Interaction (MLMI'08)},
  year      = {2008}
}
```

### Abstract

In this paper, we address the modeling of topic and role information in multiparty meetings, via a nonparametric Bayesian model called the hierarchical Dirichlet process. This model provides a powerful solution to topic modeling and a flexible framework for the incorporation of other cues such as speaker role information. We present our modeling framework for topic and role on the AMI Meeting Corpus, and illustrate the effectiveness of the approach in the context of adapting a baseline language model in a large-vocabulary automatic speech recognition system for multiparty meetings. The adapted LM produces significant improvements in terms of both perplexity and word error rate.

### Citations

2612 | Latent Dirichlet allocation - Blei, Ng, et al. - 2003

776 | A Bayesian analysis of some nonparametric problems - Ferguson - 1973

Citation Context: ...t processes in the HDP as priors for topic proportions; second, the priors are arranged in a tree structure. Dirichlet Process. The Dirichlet process (DP) is a stochastic process, first formalised in [17] for general Bayesian modeling, which has become an important prior for nonparametric models. Nonparametric models are characterised by allowing the number of model parameters to grow with the amount ...
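The quoted passage notes that nonparametric models let the number of parameters grow with the data. A minimal sketch of this behaviour is the Chinese restaurant process view of the DP, in which the number of occupied "tables" (components) grows roughly as α·log(n). This is an illustrative simulation, not code from the paper; the function name and parameters are assumptions.

```python
import random

def crp(n_customers, alpha, seed=0):
    """Simulate a Chinese restaurant process with concentration alpha.

    Each customer joins an existing table with probability proportional
    to its size, or opens a new table with probability alpha / (n + alpha).
    Returns the list of table sizes; the number of tables grows with n,
    illustrating the nonparametric growth the DP prior provides.
    """
    rng = random.Random(seed)
    tables = []  # customer count per table
    for _ in range(n_customers):
        weights = tables + [alpha]  # existing tables, plus the new-table mass
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)        # open a new table (new component)
        else:
            tables[k] += 1          # join an existing table
    return tables

sizes = crp(500, alpha=2.0)
```

With α = 2 and 500 customers, the expected number of tables is only about 2·ln(500) ≈ 12, far fewer than the number of observations.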

587 | Hierarchical Dirichlet processes - Teh, Jordan, et al. - 2005

Citation Context: ...en what are those cues, and how can we incorporate them into an n-gram LM? To address this question, we here focus on the modeling of topic and role information using a hierarchical Dirichlet process [9]. Consider an augmented n-gram model for ASR, with its context enriched by the inclusion of two cues from meetings: the topic and the speaker role. Unlike role, which could be seen as deterministic in...

342 | A constructive definition of Dirichlet priors - Sethuraman - 1994

Citation Context: ...riori. Draws from a DP are composed as a weighted sum of point masses located at the previous draws θ1,...,θn. This leads to a constructive definition of the DP called the stick-breaking construction [18]: β_k ∼ Beta(1, α), π_k = β_k ∏_{l=1}^{k−1} (1 − β_l), θ*_k ∼ H, G = ∑_{k=1}^{∞} π_k δ_{θ*_k} (5). Then G ∼ DP(α, H). θ*_k is a unique value among θ1,...,θn, and δ_{θ*_k} denotes a point mass at θ*_k. The construction of π c...

165 | A neural probabilistic language model - Bengio, Ducharme, et al. - 2003

Citation Context: ...oved modeling of word sequences, or on the incorporation of richer knowledge. Approaches which aim to improve on maximum likelihood n-gram models of word sequences include neural network-based models [1], latent variable models [2], and a Bayesian framework [3,4]. The exploitation of richer knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using stru...

94 | Factored language models and generalized parallel backoff - Bilmes, Kirchhoff - 2003

Citation Context: ...ude neural network-based models [1], latent variable models [2], and a Bayesian framework [3,4]. The exploitation of richer knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using structured LMs [6], and semantic knowledge such as topic information using Bayesian models [7]. In this paper, we investigate language modeling for ASR in multiparty meetin...

94 | Topic modeling: beyond bag-of-words - Wallach - 2006

Citation Context: ...r knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using structured LMs [6], and semantic knowledge such as topic information using Bayesian models [7]. In this paper, we investigate language modeling for ASR in multiparty meetings through the inclusion of richer knowledge in a conventional n-gram language model. We have used the AMI Meeting Corpus ...

89 | A hierarchical Bayesian language model based on Pitman-Yor processes - Teh - 2006

Citation Context: ...f richer knowledge. Approaches which aim to improve on maximum likelihood n-gram models of word sequences include neural network-based models [1], latent variable models [2], and a Bayesian framework [3,4]. The exploitation of richer knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using structured LMs [6], and semantic knowledge such as topic informa...

69 | Unleashing the killer corpus: experiences in creating the multi-everything AMI meeting corpus - Carletta - Language Resources and Evaluation, 2007

Citation Context: ...In this paper, we investigate language modeling for ASR in multiparty meetings through the inclusion of richer knowledge in a conventional n-gram language model. We have used the AMI Meeting Corpus [8], which consists of 100 hours of multimodal meeting recordings with comprehensive annotations at a number ... (footnote 1: http://corpus.amiproject.org)

56 | Language model adaptation using dynamic marginals - Kneser, Peters, et al. - 1997

Citation Context: ...α0,G0) are document-dependent and thus are calculated dynamically for each document. For rHDP, the difference is that the topic weights are derived from role DPs, i.e., θd|Grole ∼ DP(α1,Grole). As in [19], we treat Phdp(w|d) as a dynamic marginal and use the following equation to adapt the baseline n-gram model Pback(w|h) to get an adapted n-gram Padapt(w|h), where z(h) is a normalisation factor: P...
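The dynamic-marginals adaptation referred to in this context rescales the background conditional P_back(w|h) by a ratio of unigram marginals and renormalizes by the history-dependent factor z(h). The quoted equation is truncated, so the sketch below uses the standard Kneser-style form with an assumed exponent β; function and variable names are illustrative, not from the paper.

```python
def adapt_ngram(back_cond, back_marg, topic_marg, beta=0.5):
    """Adapt a background n-gram distribution with dynamic marginals.

    back_cond:  dict word -> P_back(w|h), the background distribution
                for one fixed history h.
    back_marg:  dict word -> P_back(w), background unigram marginal.
    topic_marg: dict word -> P_topic(w), topic-model marginal (e.g. HDP).
    Returns P_adapt(w|h) ∝ (P_topic(w)/P_back(w))**beta * P_back(w|h),
    normalized by z(h) so the result sums to 1.
    """
    unnorm = {w: (topic_marg[w] / back_marg[w]) ** beta * p
              for w, p in back_cond.items()}
    z = sum(unnorm.values())  # z(h): normalisation factor for this history
    return {w: p / z for w, p in unnorm.items()}

# Toy example: the topic model boosts the marginal of "agenda"
back_cond  = {"agenda": 0.2, "ok": 0.8}
back_marg  = {"agenda": 0.1, "ok": 0.9}
topic_marg = {"agenda": 0.4, "ok": 0.6}
adapted = adapt_ngram(back_cond, back_marg, topic_marg)
```

Words whose topic-model marginal exceeds their background marginal gain conditional probability mass, which is exactly the effect the adapted LM exploits.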

30 | Collapsed variational inference for HDP - Teh, Kurihara, et al.

Citation Context: ...ribution over words in the vocabulary, with the vector of probabilities for words in topic k denoted by φ_k. In this section, we review two “bag-of-words” models, LDA and the HDP, following Teh et al. [9,15,16]. 2.1 Latent Dirichlet Allocation. Latent Dirichlet allocation [10] is a three-level hierarchical Bayesian model, which pioneered the use of the Dirichlet distribution for latent topics. That is, the F...
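The LDA review the context refers to rests on a simple generative story: draw per-document topic proportions θ_d from a Dirichlet, then for each word draw a topic z from θ_d and a word from that topic's distribution φ_z. A minimal sketch of that process, with assumed names and a stdlib-only Dirichlet sampler (via Gamma draws):

```python
import random

def dirichlet(alpha_vec, rng):
    """Sample from Dirichlet(alpha_vec) via normalized Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha_vec]
    s = sum(g)
    return [x / s for x in g]

def generate_doc(n_words, topics, alpha, rng):
    """LDA generative process for one document.

    topics: list of K word distributions (each a list of word probabilities,
            i.e. the phi_k vectors). alpha: symmetric Dirichlet parameter.
    Returns a list of (topic_index, word_index) pairs.
    """
    theta = dirichlet([alpha] * len(topics), rng)  # theta_d ~ Dir(alpha)
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]       # z ~ Mult(theta_d)
        w = rng.choices(range(len(topics[z])), weights=topics[z])[0]  # w ~ Mult(phi_z)
        doc.append((z, w))
    return doc

rng = random.Random(0)
topics = [[0.9, 0.1], [0.1, 0.9]]  # two topics over a 2-word vocabulary
doc = generate_doc(20, topics, alpha=1.0, rng=rng)
```

The HDP replaces the fixed K and the Dirichlet prior on θ_d with DP priors, letting the number of topics be inferred from data rather than fixed in advance.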

25 | Style & topic language model adaptation using HMM-LDA - Hsu, Glass

Citation Context: ...n the area of combining n-gram models and topic models such as LDA and probabilistic latent semantic analysis (pLSA) for ASR on different data, for example, broadcast news [11,12], lecture recordings [13], and Japanese meetings [14]. The new ideas we exploit in this work cover the following aspects. First, we use the nonparametric HDP for topic modeling to adapt n-gram LMs. Second, we consider sequent...

12 | Unsupervised language model adaptation for Mandarin broadcast conversation transcription - Mrva, Woodland - 2006

Citation Context: ...revious work has been done in the area of combining n-gram models and topic models such as LDA and probabilistic latent semantic analysis (pLSA) for ASR on different data, for example, broadcast news [11,12], lecture recordings [13], and Japanese meetings [14]. The new ideas we exploit in this work cover the following aspects. First, we use the nonparametric HDP for topic modeling to adapt n-gram LMs. Se...

11 | Distributed latent variable models of lexical co-occurrences - Blitzer, Globerson, et al.

Citation Context: ...ces, or on the incorporation of richer knowledge. Approaches which aim to improve on maximum likelihood n-gram models of word sequences include neural network-based models [1], latent variable models [2], and a Bayesian framework [3,4]. The exploitation of richer knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using structured LMs [6], and semantic...

11 | Training connectionist models for the structured language model - Xu, Emami, et al. - 2003

Citation Context: ...riable models [2], and a Bayesian framework [3,4]. The exploitation of richer knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using structured LMs [6], and semantic knowledge such as topic information using Bayesian models [7]. In this paper, we investigate language modeling for ASR in multiparty meetings through the inclusion of richer knowledge i...

8 | Hierarchical Pitman-Yor language models for ASR in meetings - Huang, Renals - 2007

Citation Context: ...f richer knowledge. Approaches which aim to improve on maximum likelihood n-gram models of word sequences include neural network-based models [1], latent variable models [2], and a Bayesian framework [3,4]. The exploitation of richer knowledge has included the use of morphological information in factored LMs [5], syntactic knowledge using structured LMs [6], and semantic knowledge such as topic informa...

5 | PLSA-based topic detection in meetings for adaptation of lexicon and language model - Akita, Nemoto, et al.

Citation Context: ...am models and topic models such as LDA and probabilistic latent semantic analysis (pLSA) for ASR on different data, for example, broadcast news [11,12], lecture recordings [13], and Japanese meetings [14]. The new ideas we exploit in this work cover the following aspects. First, we use the nonparametric HDP for topic modeling to adapt n-gram LMs. Second, we consider sequential topic modeling, and defi...

2 | Unsupervised LM adaptation using latent semantic marginals - Tam, Schultz

Citation Context: ...revious work has been done in the area of combining n-gram models and topic models such as LDA and probabilistic latent semantic analysis (pLSA) for ASR on different data, for example, broadcast news [11,12], lecture recordings [13], and Japanese meetings [14]. The new ideas we exploit in this work cover the following aspects. First, we use the nonparametric HDP for topic modeling to adapt n-gram LMs. Se...

2 | The AMI system for the transcription of speech in meetings - Hain, et al. - 2007

Citation Context: ...ation. 4.2 ASR Experiment. Finally, we investigated the effectiveness of the adapted LMs based on topic and role information from meetings on a practical large-vocabulary ASR system. The AMIASR system [20] was used as the baseline system. We began from the lattices for the whole AMI Meeting Corpus, generated by the AMIASR system using a trigram LM trained on a large set of data coming from Fisher, Hub4...