Rational Interpolation Of Maximum Likelihood Predictors In Stochastic Language Modeling
, 1997
Cited by 14 (11 self)
In our paper, we address the problem of estimating stochastic language models based on n-gram statistics. We present a novel approach, rational interpolation, for the combination of a competing set of conditional n-gram word probability predictors, which consistently outperforms the traditional linear interpolation scheme. The superiority of rational interpolation is substantiated by experimental results from language modeling, speech recognition, dialog act classification, and language identification.
1. INTRODUCTION
In our paper, we address the problem of estimating stochastic language models $P(w)$ for sentences $w = w_1 \ldots w_T$ of words $w_t$ from a finite vocabulary $V$. The joint distribution $P(w)$ can be decomposed by the well-known chain rule
$$P(w) = \prod_{t=1}^{T} P(w_t \mid w_1^{t-1}) = \prod_{t=1}^{T} P(w_t \mid w_1 \ldots w_{t-1}) \qquad (1)$$
into a product of conditional word probabilities (by $w_s^t$ we denote the substring $w_s \ldots w_t$ of $w$). The latter, in turn, are usually approximate...
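The chain rule and the traditional linear interpolation scheme named in this abstract can be illustrated with a minimal Python sketch. It is not the paper's rational interpolation, whose formula is not given in this excerpt. The bigram truncation of the history, the fixed weight `lam`, the `<s>` start marker, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_counts(sentences):
    """Collect word, history, and bigram counts; '<s>' marks the sentence start."""
    word_counts, context_counts, bigram_counts = Counter(), Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent)
        word_counts.update(sent)
        context_counts.update(tokens[:-1])            # every history that predicts a word
        bigram_counts.update(zip(tokens, tokens[1:]))
    return word_counts, context_counts, bigram_counts

def interpolated_prob(word, prev, counts, lam=0.7):
    """Linear interpolation of the ML bigram and ML unigram predictors."""
    word_counts, context_counts, bigram_counts = counts
    p_uni = word_counts[word] / sum(word_counts.values())
    p_bi = bigram_counts[(prev, word)] / context_counts[prev] if context_counts[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni

def sentence_log_prob(sent, counts, lam=0.7):
    """Chain rule with the history truncated to one word: sum of log conditionals."""
    tokens = ["<s>"] + list(sent)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = interpolated_prob(word, prev, counts, lam)
        if p == 0.0:                                  # word never seen in training
            return float("-inf")
        total += math.log(p)
    return total
```

For a history seen in training, the interpolated probabilities sum to one over the training vocabulary, because each of the two maximum likelihood predictors does and the weights `lam` and `1 - lam` sum to one.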
Adaptation of Statistical Language Models for Automatic Speech Recognition
, 1999
Cited by 5 (0 self)
Statistical language models encode linguistic information in such a way as to be useful to systems which process human language. Such systems include those for optical character recognition and machine translation. Currently, however, the most common application of language modelling is in automatic speech recognition, and it is this that forms the focus of this thesis. Most current speech recognition systems are dedicated to one specific task (for example, the recognition of broadcast news), and thus use a language model which has been trained on text which is appropriate to that task. If, however, one wants to perform recognition on more general language, then creating an appropriate language model is far from straightforward. A task-specific language model will often perform very badly on language from a different domain, whereas a model trained on text from many diverse styles of language might perform better in general, but will not be especially well suited to any particular domai...
Statistical Grammar Models and Lexicon Acquisition
, 12
Cited by 3 (2 self)
This paper presents a framework for developing and training statistical grammar models for the acquisition of lexicon information. Utilising a robust parsing environment and mathematically well-defined unsupervised training methods, the framework enables us to induce lexicon information from text corpora. Particular strengths of the approach concern (i) the fact that no extensive manual work is required to set up the framework, and (ii) that the framework is applicable to any desired language. It has already been applied to English and German (Carroll and Rooth 1998, Beil et al. 1999, Rooth et al. 1999, Schulte im Walde 2000a), Portuguese (de Lima 2001), and Chinese (Hockenmaier 1999).
Suprasegmental Modelling
, 1997
Cited by 3 (0 self)
In this paper, we want to show how prosodic information can be computed and used in a speech understanding system. Since the authors developed the prosody module of the VERBMOBIL system and since the use of prosody is implemented on all levels of linguistic processing in this speech-to-speech translation system, most examples will be taken from there.
A Bootstrap Training Approach for Language Model Classifiers
, 1998
In this paper, we present a bootstrap training approach for language model (LM) classifiers. When class-dependent LMs are trained and run in parallel, they can serve as classifiers for any kind of symbol sequence, e.g., word or phoneme sequences for tasks like topic spotting or language identification (LID). Irrespective of the particular symbol sequence used for an LM classifier, an LM is trained on a manually labeled training set for each class, obtained from not necessarily cooperative speakers. Therefore, we have to face some erroneous labels and deviations from the originally intended class specification. Both can worsen classification. It might therefore be better not to use all utterances for training but to automatically select those utterances that improve recognition accuracy; this can be done by a bootstrap procedure. We present the results achieved with our best approach on the VERBMOBIL corpus for the tasks of dialog act classification and LID. 1. INTRODUCTION ...
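The idea of running class-dependent LMs in parallel as a classifier can be sketched in a few lines of Python. This shows only the basic classification step. The bootstrap selection of training utterances is not specified in this excerpt and is not reproduced. The bigram order, add-one smoothing, the `<s>` start marker, and the names `BigramLM`, `train_classifier`, and `classify` are assumptions for illustration, not the authors' setup.

```python
import math
from collections import Counter

class BigramLM:
    """Add-one smoothed bigram model over arbitrary symbols (an assumed model choice)."""

    def __init__(self, sequences, vocab):
        self.vocab_size = len(vocab) + 1              # one extra slot for unseen symbols
        self.context = Counter()
        self.bigram = Counter()
        for seq in sequences:
            symbols = ["<s>"] + list(seq)
            self.context.update(symbols[:-1])
            self.bigram.update(zip(symbols, symbols[1:]))

    def log_prob(self, seq):
        """Log-likelihood of a symbol sequence under this class's model."""
        symbols = ["<s>"] + list(seq)
        return sum(
            math.log((self.bigram[(prev, cur)] + 1) / (self.context[prev] + self.vocab_size))
            for prev, cur in zip(symbols, symbols[1:])
        )

def train_classifier(labeled):
    """Train one LM per class on that class's labeled sequences, with a shared vocabulary."""
    vocab = {sym for seqs in labeled.values() for seq in seqs for sym in seq}
    return {label: BigramLM(seqs, vocab) for label, seqs in labeled.items()}

def classify(models, seq):
    """Score the sequence with every class LM and return the best-scoring class."""
    return max(models, key=lambda label: models[label].log_prob(seq))
```

With toy character sequences such as `{"en": ["the cat", "the dog"], "de": ["der hund", "die katze"]}`, `classify(train_classifier(data), "the")` picks the class whose model assigns the higher likelihood. The same code works for word or phoneme sequences if each sequence is passed as a list of symbols.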
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume, pages 21--24,
, 2006
The performance of automatic speech summarisation has been improved in previous experiments by using linguistic model adaptation. We extend such adaptation to the use of class models, whose robustness further improves summarisation performance on a wider variety of objective evaluation metrics such as ROUGE-2 and ROUGE-SU4 used in the text summarisation literature. Summaries made from automatic speech recogniser transcriptions benefit from relative improvements ranging from 6.0% to 22.2% on all investigated metrics.
A Scalable Probabilistic Classifier for Language Modeling
We present a novel probabilistic classifier, which scales well to problems that involve a large number of classes and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model whose parameters can be reliably and efficiently estimated. We show that a generative language model based on our classifier consistently matches modified Kneser-Ney smoothing and can outperform it if sufficiently rich features are incorporated.

