Results 1 - 7 of 7
Rational Interpolation Of Maximum Likelihood Predictors In Stochastic Language Modeling
, 1997
Cited by 14 (11 self)
In our paper, we address the problem of estimating stochastic language models based on n-gram statistics. We present a novel approach, rational interpolation, for the combination of a competing set of conditional n-gram word probability predictors, which consistently outperforms the traditional linear interpolation scheme. The superiority of rational interpolation is substantiated by experimental results from language modeling, speech recognition, dialog act classification, and language identification.
1. INTRODUCTION
In our paper, we address the problem of estimating stochastic language models $P(w)$ for sentences $w = w_1 \ldots w_T$ of words $w_t$ from a finite vocabulary $V$. The joint distribution $P(w)$ can be decomposed by the well-known chain rule
$$P(w) = \prod_{t=1}^{T} P(w_t \mid w_1^{t-1}) = \prod_{t=1}^{T} P(w_t \mid w_1 \ldots w_{t-1}) \qquad (1)$$
into a product of conditional word probabilities (by $w_s^t$ we denote the substring $w_s \ldots w_t$ of $w$). The latter, in turn, are usually approximate...
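The chain rule (1) and the traditional linear interpolation scheme named in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy unigram and bigram tables, and the weights 0.3 and 0.7 are all hypothetical. Rational interpolation itself is not shown, because the excerpt does not define it.

```python
import math

def linear_interpolation(predictors, weights, word, history):
    """Weighted sum of conditional predictors P_i(word | history)."""
    return sum(lam * p(word, history) for lam, p in zip(weights, predictors))

def sentence_log_prob(words, cond_prob):
    """Chain rule: log P(w) = sum over t of log P(w_t | w_1 ... w_{t-1})."""
    return sum(math.log(cond_prob(w, tuple(words[:t])))
               for t, w in enumerate(words))

# Toy probability tables (hypothetical numbers, two-word vocabulary).
UNIGRAM = {"a": 0.5, "b": 0.5}
BIGRAM = {("a", "a"): 0.2, ("a", "b"): 0.8,
          ("b", "a"): 0.6, ("b", "b"): 0.4}

def unigram(word, history):
    return UNIGRAM[word]

def bigram(word, history):
    # With an empty history, fall back to the unigram estimate.
    if not history:
        return UNIGRAM[word]
    return BIGRAM[(history[-1], word)]

def interpolated(word, history):
    # Weights sum to 1, so the result is again a probability distribution.
    return linear_interpolation([unigram, bigram], [0.3, 0.7], word, history)

log_p = sentence_log_prob(["a", "b"], interpolated)
print(math.exp(log_p))  # approximately 0.355
```

Because the weights are non-negative and sum to one, the interpolated predictor remains normalized over the vocabulary for every history, which is the property that lets it be plugged directly into the chain rule.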
Permugram Language Models
, 1995
Cited by 12 (4 self)
In natural languages, the words within an utterance are often correlated over large distances. Long-spanning contextual effects of this type cannot be efficiently and robustly captured by the traditional N-gram approaches of stochastic language modelling. We present a new kind of stochastic grammar, the permugram model. A permugram model is obtained by linear interpolation of a large number of conventional bigram, trigram, or polygram models which operate on different permutations of the input word sequence under consideration. This way, stochastic dependences between word pairs or word triples lying adjacent as well as remote in the input text can be captured simultaneously without the requirement of very large N-grams. Using the permugram model, we achieved test set perplexity reductions of 5-10% compared with interpolated N-gram models, depending on the application.
1. INTRODUCTION
In natural languages, the words within an utterance are often correlated over large distances; fo...
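The abstract reports its gains as test set perplexity reductions. As a reminder of what that measure is, here is a minimal sketch using the standard definition (the exponential of the negative mean log-probability per word). The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test set, given the model probability of each word."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

def relative_reduction(baseline_pp, new_pp):
    """Relative perplexity reduction of a new model against a baseline."""
    return (baseline_pp - new_pp) / baseline_pp

# A model that assigns probability 0.25 to every word has perplexity 4.
uniform_pp = perplexity([0.25] * 4)

# Hypothetical example: going from perplexity 100 to 92 is an 8% reduction.
reduction = relative_reduction(100.0, 92.0)
```

A lower perplexity means the model spreads its probability mass over fewer plausible continuations on average, so a 5-10% reduction corresponds to a modestly sharper predictor on the test text.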
A Category Based Approach for Recognition of Out-of-Vocabulary Words
- In Int. Conf. on Spoken Language Processing
, 1996
Cited by 12 (4 self)
In almost all applications of automatic speech recognition, especially in spontaneous speech tasks, the recognizer vocabulary cannot cover all occurring words. There is always a significant amount of out-of-vocabulary words even when the vocabulary size is very large. In this paper we present a new approach for the integration of out-of-vocabulary words into statistical language models. We use category information for all words in the training corpus to define a function that gives an approximation of the out-of-vocabulary word emission probability for each word category. This information is integrated into the language models. Although we use a simple acoustic model for out-of-vocabulary words, we achieve a 6% reduction of word error rate on spontaneous speech data with about 5% out-of-vocabulary rate.
Combining Stochastic And Linguistic Language Models For Recognition Of Spontaneous Speech
- In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing
, 1996
Cited by 10 (2 self)
In this paper we present a new approach to combining stochastic language models and traditional linguistic models to enhance the performance of our spontaneous speech recognizer. We compile arbitrarily large linguistic context dependencies into a category-based bigram model, which allows us to use a standard beam-search driven forward Viterbi algorithm for real-time decoding. Since this recognizer is used in a dialog system, the information about the last system utterance is used to build dialog-step-dependent language models. This setup is verified and tested on our corpus of spontaneous speech utterances collected with our dialog system. Experimental results show a significant reduction of word error rate.
1. INTRODUCTION
In recent years it has been shown that the consideration of language constraints is vital for effective and efficient speech recognition. Typically, these language constraints are modeled in a so-called language model which will restrict the allowed sequences of words...
Integrating Large Context Language Models Into A Real Time Word Recognizer
, 1996
Cited by 8 (8 self)
In this paper we present a new recognizer architecture that allows the efficient integration of language models with arbitrarily large context information, e.g. polygram models, into the recognition process. Instead of using these models for rescoring the n-best word chains generated using bigram information, we extract the best word chain, or optionally the n-best word chains, directly from the word lattice using an A* algorithm that incorporates full language model information. For comparison, we developed an improved architecture for fast generation of the n-best word chains using bigram information. Experimental results show that direct incorporation of full language model information increases word accuracy significantly, even when compared to rescoring the 1000 best word chains. At the same time, computation time is drastically reduced.
1 Introduction
It is well known that the consideration of language constraints is vital for effective and efficient speech recognition. Typica...
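The idea of extracting the best word chain directly from a word lattice with A*, while scoring with the language model during the search, can be sketched as follows. This is a simplified illustration and not the architecture described in the paper: the lattice format, the function name `best_word_chain`, the bigram cost table (used here instead of a polygram model, for brevity), and the acoustic-only heuristic are all assumptions. Costs are treated as non-negative values such as negative log scores.

```python
import heapq
import itertools
import math

def best_word_chain(edges, start, goal, lm_cost):
    """A* search over a word lattice.

    edges: (from_node, to_node, word, acoustic_cost) tuples, with
           from_node < to_node and non-negative costs (assumed format).
    lm_cost(prev_word, word): non-negative language model cost.
    """
    outgoing = {}
    for u, v, word, cost in edges:
        outgoing.setdefault(u, []).append((v, word, cost))

    # Heuristic: cheapest remaining acoustic-only cost to the goal.
    # It ignores the non-negative LM cost, so it never overestimates.
    h = {goal: 0.0}
    for u in sorted(outgoing, reverse=True):
        if u != goal:
            h[u] = min(c + h.get(v, math.inf) for v, _, c in outgoing[u])

    tie = itertools.count()  # unique tie-breaker for heap entries
    frontier = [(h.get(start, math.inf), next(tie), 0.0, start, None, ())]
    settled = set()
    while frontier:
        _, _, g, node, prev, chain = heapq.heappop(frontier)
        if node == goal:
            return list(chain), g
        if (node, prev) in settled:
            continue
        settled.add((node, prev))
        for v, word, cost in outgoing.get(node, []):
            g2 = g + cost + lm_cost(prev, word)
            heapq.heappush(frontier, (g2 + h.get(v, math.inf), next(tie),
                                      g2, v, word, chain + (word,)))
    return None, math.inf

# Toy lattice and bigram costs (hypothetical numbers).
EDGES = [(0, 1, "I", 1.0), (0, 1, "eye", 0.8),
         (1, 2, "see", 1.0), (1, 2, "sea", 0.9)]
LM = {(None, "I"): 0.2, (None, "eye"): 1.5, ("I", "see"): 0.1,
      ("I", "sea"): 2.0, ("eye", "see"): 1.0, ("eye", "sea"): 1.0}

chain, total = best_word_chain(EDGES, 0, 2, lambda p, w: LM.get((p, w), 3.0))
print(chain)  # ['I', 'see']
```

In this toy lattice the acoustically cheapest path is "eye sea", but once the language model cost is included during the search, "I see" wins. The search state is the pair (lattice node, previous word), which is what a bigram needs; a model with a longer context would require a correspondingly longer word history in the state.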
Applications of Decision Tree Methodology in Speech Recognition and Understanding
, 1994
Cited by 2 (1 self)
This paper describes decision tree methodology and shows how it has been adapted to three different problems in speech recognition and understanding at CRIM and at FORWISS. The three problems are:
1. Development of context-dependent phone models (work carried out by CRIM researchers and partly inspired by FORWISS work on polyphones). Here, decision trees determine a characterization of the context of a phone that yields good models for recognition.
2. Deriving rules for semantic interpretation from a semantically annotated corpus. This problem led to the development of "Semantic Classification Trees" (in Ph.D. work by R. Kuhn under the supervision of R. De Mori [Kuh93a]).
3. Recognising prosodic features (work carried out in collaboration between FORWISS and CRIM). From a word sequence, Semantic Classification Trees (SCTs) can make predictions about such prosodic features as accents and phrase boundaries.
In this ongoing research, we plan to create hybrid decision trees that learn rule...
A Fast Algorithm For Unsupervised Incremental Speaker Adaptation
Speaker adaptation algorithms often require a rather large amount of adaptation data in order to estimate the new parameters reliably. In this paper, we investigate how adaptation can be performed in real-time applications with only a few seconds of speech from each user. We propose a modified Bayesian codebook reestimation which does not need the computationally intensive evaluation of normal densities and thus speeds up the adaptation remarkably, e.g. by a factor of 18 for 24-dimensional feature vectors. We performed experiments in two real-time applications with very small amounts of adaptation data, and achieved a word error reduction of up to 11%.
1 INTRODUCTION
Speaker adaptation has been a field of intensive research for several years. Great progress has been made in the development of theoretically well-founded algorithms as well as in the achieved experimental results. Approaches based on optimality criteria such as Maximum Likelihood (ML) and Maximum a posteriori (MAP) h...

