Results 1-10 of 17
A Maximum Entropy Approach to Adaptive Statistical Language Modeling
- Computer Speech and Language, 1996
Cited by 201 (11 self)
An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
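As a rough illustration of how a trigger pair can act as an information-bearing element, the sketch below treats each pair (A, B) as a binary feature that fires when A has appeared anywhere earlier in the document and the word being predicted is B. The trigger list, the example sentence, and the function names are hypothetical; the abstract does not say how the paper selects its trigger pairs or whether it limits the history.

```python
from typing import List, Sequence, Tuple

# Hypothetical trigger pairs (A, B): seeing A earlier in the document is
# assumed to raise the expectation of B. Illustrative only.
TRIGGERS = {("loan", "bank"), ("stock", "shares"), ("rates", "bank")}

def trigger_feature(history: Sequence[str], word: str, a: str, b: str) -> int:
    """Binary feature for trigger pair (a, b): 1 if a occurs anywhere in the
    document history and the predicted word is b, else 0."""
    return int(word == b and a in history)

def active_triggers(history: Sequence[str], word: str) -> List[Tuple[str, str]]:
    """All trigger pairs whose feature fires for this (history, word)."""
    return sorted(p for p in TRIGGERS if trigger_feature(history, word, *p))

doc = "the loan officer said interest rates at the bank".split()
# Predict the last word, using every earlier word as the history.
print(active_triggers(doc[:-1], doc[-1]))  # -> [('loan', 'bank'), ('rates', 'bank')]
```

In an ME model, each such feature would contribute one expectation constraint; the sketch only shows when a feature is active.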
Error-responsive feedback mechanisms for speech recognizers
1997
Cited by 37 (4 self)
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal interfaces, multi-media databases, and other interesting applications. I improve on the current approach to predicting and analyzing error behavior, which is based solely on measuring word error rate. The speech recognizer's functionality is extended to include confidence annotations, which are "meta-level" markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.
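One simple way to realize such a mapping, offered only as an assumption since the abstract does not name the model family, is logistic regression from per-word measurements of the recognizer's internal state to a binary error label, with the confidence annotation taken as the predicted probability that the word is correct. The two measurements and the training data below are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_error_model(features, errors, lr=0.5, epochs=2000):
    """Logistic regression by batch gradient ascent on the log-likelihood.
    features[n]: measurements of the recognizer's internal state for word n.
    errors[n]: 1 if an external reference marked word n as wrong, else 0."""
    dim, n = len(features[0]), len(features)
    w, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, errors):
            p_err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)
            for j in range(dim):
                grad_w[j] += (y - p_err) * x[j]
            grad_b += y - p_err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        bias += lr * grad_b / n
    return w, bias

def confidence(w, bias, x):
    """Confidence annotation: estimated probability that the word is correct."""
    return 1.0 - sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)

# Invented data: [normalized acoustic score, N-best disagreement rate].
train_x = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.1], [0.3, 0.6], [0.2, 0.8], [0.25, 0.7]]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = externally judged a recognition error
w, bias = train_error_model(train_x, train_y)
print(confidence(w, bias, [0.9, 0.1]), confidence(w, bias, [0.2, 0.8]))
```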
The Role of Voice Input for Human-Machine Communication
- Proceedings of the National Academy of Sciences, 1994
Cited by 33 (4 self)
Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argu...
A Hybrid Approach To Adaptive Statistical Language Modeling
- Proceedings of the ARPA Workshop on Human Language Technology, 1994
Cited by 23 (2 self)
We describe our latest attempt at adaptive language modeling. At the heart of our approach is a Maximum Entropy (ME) model which incorporates many knowledge sources in a consistent manner. The other components are a selective unigram cache, a conditional bigram cache, and a conventional static trigram. We describe the knowledge sources used to build such a model with ARPA's official WSJ corpus, and report on perplexity and word error rate results obtained with it. Then, three different adaptation paradigms are discussed, and an additional experiment, based on AP wire data, is used to compare them.
1. OVERVIEW OF ME FRAMEWORK
Using several different probability estimates to arrive at one combined estimate is a general problem that arises in many tasks. The Maximum Entropy (ME) principle has recently been demonstrated as a powerful tool for combining statistical estimates from diverse sources [1, 2, 3]. The ME principle ([4, 5]) proposes the following:
1. Reformulate the different estimates as constraints on the expectation of various functions, to be satisfied by the target (combined) estimate.
2. Among all probability distributions that satisfy these constraints, choose the one that has the highest entropy.
More specifically, for estimating a probability function $P(x)$, each constraint $i$ is associated with a constraint function $f_i(x)$ and a desired expectation $c_i$. The constraint is then written as:
$$E_P f_i \stackrel{\text{def}}{=} \sum_x P(x)\, f_i(x) = c_i. \qquad (1)$$
Given consistent constraints, a unique ME solution is guaranteed to exist, and to be of the form:
$$P(x) = \prod_i \mu_i^{f_i(x)}, \qquad (2)$$
where the $\mu_i$'s are some unknown constants, to be found. Probability functions of the form (2) are called log-linear, and the family of functions defined by holding the $f_i$'s fixed and varying the $\mu_i$'s is called an exponential family.
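The two-step ME recipe can be made concrete with Generalized Iterative Scaling (GIS), a standard algorithm for fitting log-linear models of form (2). This is an illustrative sketch under stated assumptions, not necessarily the procedure used in the paper: it works on a tiny finite outcome set, adds the slack feature GIS needs so that the features sum to a constant, and normalizes P(x) explicitly instead of folding the normalizer into the constants. The outcomes, feature functions, and targets are hypothetical.

```python
import math

def gis(outcomes, features, targets, iters=500):
    """Fit P(x) proportional to prod_i mu_i ** f_i(x) so that E_P[f_i] = c_i,
    using Generalized Iterative Scaling over a small finite outcome set."""
    # GIS assumes sum_i f_i(x) equals the same constant C for every x,
    # so a slack feature absorbs the difference.
    C = max(sum(f(x) for f in features) for x in outcomes)

    def slack(x):
        return C - sum(f(x) for f in features)

    feats = list(features) + [slack]
    tgts = list(targets) + [C - sum(targets)]
    mu = [1.0] * len(feats)  # all mu_i = 1 gives the uniform distribution

    def distribution():
        w = [math.prod(m ** f(x) for m, f in zip(mu, feats)) for x in outcomes]
        z = sum(w)  # explicit normalizer
        return [wi / z for wi in w]

    for _ in range(iters):
        p = distribution()
        # Every mu_i is rescaled using the same current distribution p.
        for i, f in enumerate(feats):
            expected = sum(px * f(x) for px, x in zip(p, outcomes))
            mu[i] *= (tgts[i] / expected) ** (1.0 / C)
    return distribution()

# Toy problem: four outcomes, two hypothetical constraints.
def in_first_two(x):
    return 1.0 if x in (0, 1) else 0.0

def is_zero(x):
    return 1.0 if x == 0 else 0.0

probs = gis([0, 1, 2, 3], [in_first_two, is_zero], [0.6, 0.1])
print([round(p, 3) for p in probs])  # -> [0.1, 0.5, 0.2, 0.2]
```

The result matches the ME intuition: the constraints pin P(0) = 0.1 and P(1) = 0.5, and the remaining mass is spread evenly because nothing distinguishes outcomes 2 and 3.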
Deleted Interpolation And Density Sharing For Continuous Hidden Markov Models
- In Proc. ICASSP, Atlanta, 1996
Cited by 17 (1 self)
As one of the most powerful smoothing techniques, deleted interpolation has been widely used in both discrete and semi-continuous hidden Markov model (HMM) based speech recognition systems. For continuous HMMs, most smoothing techniques operate on the parameters themselves, such as Gaussian means or covariances. In this paper, we propose to smooth the probability density values instead of the parameters of continuous HMMs. This allows us to use most of the existing smoothing techniques for both discrete and continuous HMMs. We also point out that our deleted interpolation can be regarded as a parameter sharing technique. We further generalize this sharing to the probability density function (PDF) level, in which each PDF becomes a basic unit and can be freely shared across any Markov state. For a wide range of dictation experiments, deleted interpolation reduced the word error rate by 11% to 23% compared with simple parameter smoothing techniques such as flooring. Generic PD...
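The core of deleted interpolation can be sketched as estimating one interpolation weight between a detailed and a more general density by EM on held-out data. Here each model is represented only by the density values it assigns to held-out samples, in the spirit of smoothing density values rather than parameters. The numbers are hypothetical, and a single held-out set is used where classical deleted interpolation rotates the held-out portion across the training data.

```python
import math

def estimate_lambda(specific, general, lam=0.5, iters=200):
    """EM estimate of the weight in
    P(x) = lam * P_specific(x) + (1 - lam) * P_general(x),
    given each model's density value on held-out samples."""
    for _ in range(iters):
        # E-step: posterior that each held-out sample came from the specific model.
        post = [lam * a / (lam * a + (1.0 - lam) * b) for a, b in zip(specific, general)]
        # M-step: the new weight is the mean posterior.
        lam = sum(post) / len(post)
    return lam

def held_out_log_likelihood(lam, specific, general):
    return sum(math.log(lam * a + (1.0 - lam) * b) for a, b in zip(specific, general))

# Hypothetical density values on three held-out frames.
specific = [0.9, 0.8, 0.05]  # detailed (e.g. state-specific) density
general = [0.3, 0.3, 0.3]    # broader, better-trained density
lam = estimate_lambda(specific, general)
print(round(lam, 3))  # -> 0.617
```

The detailed density wins on two frames but nearly fails on the third, so the estimated weight keeps substantial mass on the general density.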
Rapid language model development for new task domains
- Proc. First International Conference on Language Resources and Evaluation (LREC), 1998
Cited by 16 (6 self)
Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The first technique is based on using a context-free grammar to generate a corpus of word collocations. The second is an adaptation technique based on using out-of-domain corpora to estimate target domain language models. We report results of successfully using these two techniques individually and in combination to build efficient models for a spontaneous speech recognition task in a medium-sized vocabulary domain.
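To make the first technique concrete, the sketch below samples sentences from a small context-free grammar and counts bigrams in the result, the kind of statistics an n-gram model could then be estimated from. The grammar is a made-up airline-style example and uniform choice among productions is an assumption; the paper's grammar, domain, and sampling scheme are not given in the abstract.

```python
import random
from collections import Counter

# Hypothetical toy grammar for an airline-style query domain.
GRAMMAR = {
    "S": [["show", "FLIGHTS", "from", "CITY", "to", "CITY"],
          ["list", "FLIGHTS", "to", "CITY"]],
    "FLIGHTS": [["flights"], ["morning", "flights"], ["cheap", "flights"]],
    "CITY": [["boston"], ["denver"], ["pittsburgh"]],
}

def expand(symbol, rng):
    """Recursively expand a symbol, choosing productions uniformly at random."""
    if symbol not in GRAMMAR:  # terminal: an actual word
        return [symbol]
    words = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(expand(child, rng))
    return words

def synthetic_corpus(n, seed=0):
    rng = random.Random(seed)
    return [expand("S", rng) for _ in range(n)]

def bigram_counts(sentences):
    """Count word bigrams, padding each sentence with boundary markers."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

corpus = synthetic_corpus(1000)
counts = bigram_counts(corpus)
print(counts.most_common(3))
```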
SpeechActs: A Testbed For Continuous Speech Applications
1994
Cited by 12 (3 self)
The SpeechActs system is a testbed for building computer applications utilizing continuous speech input and speech synthesis output. It supports a variety of speech recognition (SR) systems and text-to-speech (TTS) generators; our main goal is to produce a generalized interface scheme allowing new research systems to be substituted easily. The essential components of the system, in addition to the externally-developed SR and TTS modules, are our natural language interpreter SWIFTUS, our Unified Grammar (UG) compiler, and the discourse manager. SWIFTUS provides the tools necessary to easily develop natural language interfaces between SR systems and end applications that exhibit a suitable granularity of analysis. The Unified Grammar compiler creates grammars supporting a variety of SR systems and NLP modules from one module-independent formalism. The discourse manager is currently an under-developed prototype that coordinates user interactions between a co-existing suite of different sp...
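The substitutability goal can be pictured as an application that depends only on narrow recognizer and synthesizer interfaces. The sketch below expresses that with Python structural typing (typing.Protocol). Every class and method name is hypothetical, toy_interpreter merely stands in for a natural language interpreter such as SWIFTUS, and nothing here reflects SpeechActs' actual interfaces.

```python
from typing import Callable, Protocol

class Recognizer(Protocol):
    def recognize(self, audio: bytes) -> str: ...

class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...

def dialogue_turn(rec: Recognizer, tts: Synthesizer,
                  interpret: Callable[[str], str], audio: bytes) -> bytes:
    """One turn: speech in, recognized words, interpreted reply, speech out.
    Only the two protocols are used, so any recognizer or synthesizer with
    matching methods can be swapped in."""
    words = rec.recognize(audio)
    reply = interpret(words)
    return tts.synthesize(reply)

# Stand-in back ends for testing: "audio" is just UTF-8 text here.
class TextRecognizer:
    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")

class TextSynthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

def toy_interpreter(words: str) -> str:
    return "reading mail" if "mail" in words else "sorry, please rephrase"

print(dialogue_turn(TextRecognizer(), TextSynthesizer(), toy_interpreter, b"read my mail"))
```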
Learning State-Dependent Stream Weights For Multi-Codebook HMM Speech Recognition Systems
- ICASSP94, 1994
Cited by 11 (3 self)
Many speech recognition systems [Lee88], [Shi85], [Hua92] use multiple information streams to compute HMM output probabilities (e.g., systems based on semi-continuous or discrete HMMs use one codebook for cepstral coefficients and another for delta cepstral coefficients). The final score is a weighted sum of the contributions of every stream. These weights can be found empirically, and usually the same set of weights is used for every acoustic model. There is reason to believe that some features are more important for some acoustic models than for others. In particular, one would expect the beginning and ending segments of a phoneme to be more context dependent than the middle part, so in that case the probability estimator of the speech recognizer should put more emphasis on the delta spectrum than on the spectrum. Experiments [Shi85], [Boc93] have shown that spectral or cepstral coefficients are more important than their derivatives and more important than power or delta-...
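A minimal sketch of state-dependent stream weighting, assuming each stream's contribution is its log output probability so that the weighted sum is taken in the log domain. The stream layout, state names, weights, and probabilities are hypothetical; they simply encode the intuition above that boundary segments of a phoneme should lean on the delta stream. The abstract does not say how the weights are learned, so the sketch covers scoring only.

```python
import math

# Hypothetical weights for two streams: (cepstral, delta-cepstral).
GLOBAL_WEIGHTS = (0.5, 0.5)  # one set shared by every acoustic model
STATE_WEIGHTS = {
    "begin": (0.4, 0.6),   # boundary segment: lean on the delta stream
    "middle": (0.7, 0.3),  # steady middle segment: lean on the spectrum
    "end": (0.4, 0.6),
}

def output_log_score(stream_probs, weights):
    """Weighted sum of per-stream log output probabilities for one frame."""
    if len(stream_probs) != len(weights):
        raise ValueError("need one weight per stream")
    return sum(w * math.log(p) for w, p in zip(weights, stream_probs))

# Made-up codebook probabilities of one frame under a phone's first state.
frame = (0.02, 0.10)
print(output_log_score(frame, GLOBAL_WEIGHTS))
print(output_log_score(frame, STATE_WEIGHTS["begin"]))
```

With shared weights, every state would trade off the two streams identically; the per-state table lets a boundary state trust the delta stream more.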
Estimation of Language Models for New Spoken Language Applications
1996
Cited by 6 (0 self)
Spoken language interfaces can provide natural communication for many database retrieval tasks. The CMU ATIS system provides an example of accessing airline information using spoken natural language queries. However, developing a spoken language application requires a large amount of training data. For example, training data is needed to build the language model that the recognizer uses to reduce its search space. In this paper, we address some issues arising from the small amount of training data available for a new spoken language application.
Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
1996
Cited by 6 (4 self)
This paper provides an overview of the state-of-the-art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems with a view towards adapting such technology to the requirements of real-world applications. While in speech recognition the principal concern is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results, some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise which influence the system design. The application imposes limitations on computational resources, constraints on signal capture, requirements for noise and channel compensation, and rejection capability. The difficulties and costs of adapting existing technology to new languages and applications need to be assessed. Near-term applications for LVCSR technology are likely to grow in somewhat limited domains such as spoken language systems for information retrieval, and limited-domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research.

