From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
Cited by 43 (5 self)
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance.
SpeechBuilder: Facilitating Spoken Dialogue System Development
, 2001
Cited by 40 (6 self)
SpeechBuilder is a suite of tools that helps facilitate the creation of mixed-initiative spoken dialogue systems for both novice and experienced developers of human language applications. SpeechBuilder employs intuitive methods of specification to allow developers to create human language interfaces to structured information stored in a relational database, or to control- and transaction-based applications. The goal of this project has been both to robustly accommodate the various scenarios where spoken dialogue systems may be needed, and to provide a stable and reliable infrastructure for design and deployment of applications. SpeechBuilder has been used in various spoken language domains, including a directory of the people working at the MIT Laboratory for Computer Science, an application to control the various physical items in a typical office environment, and a system for real-time weather information access.
Class phrase models for language modeling
- In Proceedings of ICSLP
, 1996
Cited by 19 (3 self)
Previous attempts to automatically determine multi-words as the basic unit for language modeling have been successful for extending bigram models [10, 9, 2, 8] to improve the perplexity of the language model and/or the word accuracy of the speech decoder. However, none of these techniques gave improvements over the trigram model so far, except for the rather controlled ATIS task [8]. We therefore propose an algorithm that minimizes the perplexity improvement of a bigram model directly. The new algorithm is able to reduce the trigram perplexity and also achieves word accuracy improvements in the Verbmobil task. It is the natural counterpart of successful word classification algorithms for language modeling [4, 7] that minimize the leaving-one-out bigram perplexity. We also give some details on the usage of class finding techniques and m-gram models, which can be crucial to successful applications of this technique.
Learning Units for Domain-Independent Out-of-Vocabulary Word Modelling
, 2001
Cited by 14 (3 self)
This paper describes our recent work on detecting and recognizing out-of-vocabulary (OOV) words for robust speech recognition and understanding. To allow for OOV recognition within a word-based recognizer, the in-vocabulary (IV) word network is augmented with an OOV word model so that OOV words are considered simultaneously with IV words during recognition. We explore several configurations for the OOV model, the best of which utilizes a set of domain-independent, automatically derived, variable-length units. The units are created using an iterative bottom-up procedure where, at each iteration, the unit pairs with maximum mutual information are merged. When evaluating this method on a weather information domain, the false alarm rate of our baseline OOV model [1] is reduced by over 60%. For example, with an OOV detection rate of 70%, the OOV false alarm rate is reduced from 8.5% to 3.2%. At these settings the addition of the OOV model degrades the word error rate on IV data by only 0.3% absolute (3% relative).
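The iterative bottom-up merging described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the exact scoring formula, so the weighted pointwise mutual information used here, the `_` joiner for merged units, the alphabetical tie-breaking, and the function names `best_pair`, `merge_pair`, and `learn_units` are all assumptions.

```python
import math
from collections import Counter


def best_pair(sequences):
    """Return the adjacent unit pair with the highest weighted mutual information.

    Assumed score: p(a,b) * log(p(a,b) / (p(a) * p(b))), estimated from counts.
    """
    unigrams = Counter(u for seq in sequences for u in seq)
    bigrams = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    if n_bi == 0:
        return None  # no adjacent pairs left to merge

    def score(pair):
        p_ab = bigrams[pair] / n_bi
        p_a = unigrams[pair[0]] / n_uni
        p_b = unigrams[pair[1]] / n_uni
        return p_ab * math.log(p_ab / (p_a * p_b))

    # Sorting first makes tie-breaking deterministic (an arbitrary choice).
    return max(sorted(bigrams), key=score)


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged unit."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + "_" + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_units(sequences, iterations):
    """Run `iterations` merge steps; return the rewritten sequences and the merge list."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(iterations):
        pair = best_pair(seqs)
        if pair is None:
            break
        merges.append(pair)
        seqs = [merge_pair(s, pair) for s in seqs]
    return seqs, merges
```

Calling `learn_units` on a list of phone sequences (for example, pronunciations from a lexicon) with a chosen number of iterations yields progressively longer variable-length units; a real system would also need a stopping criterion and the integration of the resulting units into the recognizer, neither of which is shown here.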
Grammar Inference and Statistical Machine Translation
, 1998
Cited by 13 (0 self)
NLP researchers face a dilemma: on one side, it is unarguably accepted that languages have internal structure rather than strings of words. On the other side, they find it very difficult and expensive to write grammars that have good coverage of language structures. Statistical machine translation tries to cope with this problem by ignoring language structures and using statistical models to depict the translation process. Most of the translation models are word-based. While the approach has achieved surprisingly good performance comparable to the best commercial systems, many questions remain in the machine translation community. Can the statistical word-based translation still perform well on language pairs with radically different linguistic structures? How would it function with less training data or with spoken languages? The thesis work investigated these questions. In summary, the word-based alignment model is a major cause of errors in German-English statistical spoken language...
SLS-Lite: Enabling Spoken Language Systems Design for Non-Experts
, 2000
Cited by 2 (0 self)
In this thesis, I designed and implemented SLS-Lite, a utility for allowing non-experts to build and run spoken language systems. This involved the creation of both a web interface for the developer and a set of programs to support the construction and execution of the required internal systems. We concentrated on simplifying the task of configuring the language understanding components of a spoken language system. Any application-specific functionality required could be provided by the developer in a simple, CGI-based back-end. By learning the required grammar from a set of simple concepts and sentence examples provided by the developer, we were able to build a system where non-experts could build grammars and speech systems. Developers could also easily specify hierarchy in domains where a more complex grammar was appropriate. We demonstrated SLS-Lite by building several domains ourselves, and allowing others to build their own. These included domains for controlling the appliances i...
Automatic Acquisition of Language Model based on Head-Dependent Relation between Words
- In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics
, 1998
Cited by 1 (0 self)
Language modeling associates a sequence of words with an a priori probability and is a key part of many natural language applications such as speech recognition and statistical machine translation. In this paper, we present language modeling based on a kind of simple dependency grammar. The grammar consists of head-dependent relations between words and can be learned automatically from a raw corpus using the reestimation algorithm which is also introduced in this paper. Our experiments show that the proposed model outperforms n-gram models, with 11% to 11.5% reductions in test corpus entropy.
Integrating A Context-Dependent Phrase Grammar In The Variable N-Gram Framework
, 2000
This paper focuses on the learning of multi-word lexical units, or phrases, and how to model them within the variable n-gram framework. We introduce the notion of context-dependent phrases and suggest an algorithm for unsupervised learning of phrases. Also, we propose an approach to integrate a phrase grammar and a variable n-gram without the need of explicitly handling multi-word lexical items. The combined variable n-gram phrase grammar improves recognition accuracy on the Switchboard corpus over both the baseline trigram and using a variable n-gram alone.
1. INTRODUCTION Although words in English are reasonable lexical units for language modeling, there are many cases where longer lexical units may be more appropriate. Frequently used word sequences, such as "I mean" or "you know", are so common in conversational speech that they may be effectively used by the speaker as a single lexical item. We call these multiword units "phrases". There are several ways of treating a multi-word sequ...
Towards a Unified Framework for Sub-lexical and Supra-lexical Linguistic Modeling
, 2002
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains to be one of the fundamental challenges for establishing robust and effective human/computer communications. On the one hand, the speech recognition component in a conversational interface lives in a rich system environment. Diverse sources of knowledge are available and can potentially be beneficial to its robustness and accuracy. For example, the natural language understanding component can provide linguistic knowledge in syntax and semantics that helps constrain the recognition search space. On the other hand, the speech recognition component also faces the challenge of spontaneous speech, and it is important to address the casualness of speech using the knowledge sources available. For example, sub-lexical linguistic information would be very useful in providing linguistic support for previously unseen words, and dynamic reliability modeling may help improve recognition robustness for poorly articulated speech.

