Model Adaptation via Model Interpolation and Boosting for Web Search Ranking
Cited by 13 (2 self)
This paper explores two classes of model adaptation methods for Web search ranking: Model Interpolation and error-driven learning approaches based on a boosting algorithm. The results show that model interpolation, though simple, achieves the best results on all the open test sets where the test data is very different from the training data. The tree-based boosting algorithm achieves the best performance on most of the closed test sets where the test data and the training data are similar, but its performance drops significantly on the open test sets due to the instability of trees. Several methods are explored to improve the robustness of the algorithm, with limited success.
Unsupervised language model adaptation for meeting recognition
- In Proc. ICASSP
, 2007
Cited by 11 (4 self)
We present an application of unsupervised language model (LM) adaptation to meeting recognition, in a scenario where sequences of multiparty meetings on related topics are to be recognized, but no prior in-domain data for LM training is available. The recognizer LMs are adapted according to the recognition output on temporally preceding meetings, either in speaker-dependent or speaker-independent mode. Model adaptation is carried out by interpolating the n-gram probabilities of a large generic LM with those of a small LM estimated from the adaptation data, and minimizing perplexity on the automatic transcripts of a separate meeting set, also previously recognized. The adapted LMs yield about 5-9% relative reduction in word error compared to the baseline. This improvement is about half of what can be achieved with supervised adaptation, i.e., using human-generated speech transcripts. Index Terms: speech processing, language modeling, meeting recognition, unsupervised adaptation
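The interpolation step described in this abstract can be illustrated with a minimal sketch. It assumes each held-out token already has a probability under the generic LM and under the adaptation LM, and it picks the interpolation weight by a simple grid search over held-out perplexity. The function names, the grid search, and the toy probabilities are assumptions made for illustration; the abstract does not say how the authors optimized the weight.

```python
import math


def interpolate(p_generic, p_adapt, lam):
    """Linear interpolation: lam * adaptation LM + (1 - lam) * generic LM."""
    return lam * p_adapt + (1.0 - lam) * p_generic


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability assigned to each token."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


def best_weight(heldout, steps=100):
    """Grid-search lam in [0, 1] for the lowest held-out perplexity.

    heldout: list of (p_generic, p_adapt) pairs, one per held-out token.
    """
    best_lam, best_ppl = None, float("inf")
    for i in range(steps + 1):
        lam = i / steps
        mixed = [interpolate(pg, pa, lam) for pg, pa in heldout]
        ppl = perplexity(mixed)
        if ppl < best_ppl:
            best_lam, best_ppl = lam, ppl
    return best_lam, best_ppl


# Hypothetical per-token probabilities (generic LM, adaptation LM); not from the paper.
HELDOUT = [(0.10, 0.30), (0.20, 0.05), (0.05, 0.20), (0.10, 0.10)]

lam, ppl = best_weight(HELDOUT)
print(f"best lambda = {lam:.2f}, held-out perplexity = {ppl:.3f}")
```

Because the grid includes lam = 0 and lam = 1, the selected mixture is never worse on the held-out set than either component model alone.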
Stream-based randomised language models for SMT
- In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
, 2009
Cited by 10 (1 self)
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.
Shrinking exponential language models
- In Proc. of HLT-NAACL
, 2009
Cited by 8 (2 self)
In (Chen, 2009), we show that for a variety of language models belonging to the exponential family, the test set cross-entropy of a model can be accurately predicted from its training set cross-entropy and its parameter values. In this work, we show how this relationship can be used to motivate two heuristics for “shrinking” the size of a language model to improve its performance. We use the first heuristic to develop a novel class-based language model that outperforms a baseline word trigram model by 28% in perplexity and 1.9% absolute in speech recognition word-error rate on Wall Street Journal data. We use the second heuristic to motivate a regularized version of minimum discrimination information models and show that this method outperforms other techniques for domain adaptation.
Performance Prediction for Exponential Language Models
Cited by 5 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
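The regression setup in this abstract can be sketched roughly as follows. The sketch assumes, purely for illustration, that the gap between test and training cross-entropy is proportional to a single model statistic, and fits that one coefficient by least squares. The form of the relationship, the statistic, and every number below are made up for the example; the abstract does not state the formula the authors found.

```python
import math


def fit_slope(xs, ys):
    """Least-squares slope gamma for y ~ gamma * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("regressor is identically zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic values, one per hypothetical model; NOT results from the paper.
TRAIN_CE = [5.1, 4.8, 4.2, 3.9, 3.5]        # training set cross-entropy
MODEL_STAT = [0.4, 0.7, 1.3, 1.8, 2.6]      # some per-model statistic
NOISE = [0.02, -0.01, 0.01, -0.02, 0.01]    # small perturbation
TEST_CE = [t + 0.5 * s + e for t, s, e in zip(TRAIN_CE, MODEL_STAT, NOISE)]

# Regress the train/test gap on the statistic, then predict test cross-entropy.
gaps = [te - tr for te, tr in zip(TEST_CE, TRAIN_CE)]
gamma = fit_slope(MODEL_STAT, gaps)
predicted = [tr + gamma * s for tr, s in zip(TRAIN_CE, MODEL_STAT)]

print(f"fitted gamma = {gamma:.3f}")
print(f"correlation(predicted, actual) = {pearson(predicted, TEST_CE):.4f}")
```

On this synthetic data the fit recovers a slope close to the 0.5 used to generate it, which only shows that the procedure works, not that real language models behave this way.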
A discriminative HMM/n-gram-based retrieval approach for Mandarin spoken documents
- ACM Transactions on Asian Language Information Processing
, 2004
Cited by 4 (2 self)
Statistical modeling approaches have been steadily gaining popularity in the field of information retrieval in recent years. This paper presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and different structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and by comparison with the conventional vector space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. The information fusion of word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
Predicting concept types in user corrections in dialog
- In Proceedings of the EACL Workshop on the Semantic Representation of Spoken Language
, 2009
Cited by 4 (2 self)
Most dialog systems explicitly confirm user-provided task-relevant concepts. User responses to these system confirmations (e.g. corrections, topic changes) may be misrecognized because they contain unrequested task-related concepts. In this paper, we propose a concept-specific language model adaptation strategy where the language model (LM) is adapted to the concept type(s) actually present in the user’s post-confirmation utterance. We evaluate concept type classification and LM adaptation for post-confirmation utterances in the Let’s Go! dialog system. We achieve 93% accuracy on concept type classification using acoustic, lexical and dialog history features. We also show that the use of concept type classification for LM adaptation can lead to improvements in speech recognition performance.
Efficacy of a constantly adaptive language model technique for web-scale applications
- In Proc. ICASSP-2009
Cited by 4 (3 self)
In this paper, we describe CALM, a method for building statistical language models for the Web. CALM addresses several unique challenges in dealing with Web content. First, CALM does not rely on the whole corpus being available to build the language model. Instead, we design CALM to progressively adapt itself as Web chunks are made available by the crawler. Second, given the dynamic and dramatic changes in Web content, CALM is designed to quickly enrich its lexicon and N-grams as new vocabulary and phrases are discovered. To reduce the amount of heuristics and human intervention typically needed for model adaptation, we derive an information-theoretic formula for CALM to facilitate optimal adaptation in the maximum a posteriori (MAP) sense. Testing against a collection of Web chunks where new vocabulary and phrases are dominant, we show that CALM can achieve a comparable and satisfactory model as measured by perplexity. We also show that CALM is robust against overtraining and the initial condition, suggesting that the impact of any assumptions made in obtaining the initial model gradually diminishes as CALM runs its full course and adapts to more data.
A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation
Cited by 4 (0 self)
In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one each for some number of domains. The novel top layer of hierarchy consists of a mechanism to couple together multiple language models such that they share statistical strength. Intuitively this sharing results in the “adaptation” of a latent shared language model to each domain. We introduce a general formalism capable of describing the overall model, which we call the graphical Pitman-Yor process, and explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
Speaker adaptation of language models for automatic dialog act segmentation of meetings
- In Proc. Interspeech 2007
, 2007
Cited by 3 (0 self)
Dialog act (DA) segmentation in meeting speech is important for meeting understanding. In this paper, we explore speaker adaptation of hidden event language models (LMs) for DA segmentation using the ICSI Meeting Corpus. Speaker adaptation is performed using a linear combination of the generic speaker-independent LM and an LM trained on only the data from individual speakers. We test the method on 20 frequent speakers, on both reference word transcripts and the output of automatic speech recognition. Results indicate improvements for 17 speakers on reference transcripts, and for 15 speakers on automatic transcripts. Overall, the speaker-adapted LM yields statistically significant improvement over the baseline LM for both test conditions.

