Acoustical and Environmental Robustness in Automatic Speech Recognition
1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
The SPHINX-II Speech Recognition System: An Overview
Computer, Speech and Language, 1992
Cited by 137 (7 self)
In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition. This research was sponsored by the Defense Advanced Research Projects Agency and monitored by the Space and Naval Warfare Systems Command under Contract N00039-91-C-0158, ARPA Order No. 7239. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Keywords: Speech recognition, hidden Markov models, SPHINX-II 1. INTRODUCTION At Carnegie Mellon, we have made significant progress in large-vocabulary speaker-independent continuous speech recognition during the past years [1, 2, 3]. SP...
Hybrid HMM/ANN Systems for Training Independent Tasks: Experiments on Phonebook and Related Improvements
1997
Cited by 22 (4 self)
In this paper, we evaluate multi-Gaussian HMM systems and hybrid HMM/ANN systems in the framework of task-independent training for small (75 words) and medium (600 words) vocabularies. To do this, we use the Phonebook database [6], which is particularly well suited to this kind of experiment since (1) it is a very large telephone database and (2) the size and content of the test vocabulary are very flexible. For each system, different HMM topologies are compared to test the influence of state tying (with the number of parameters kept approximately constant) on recognition performance. Two lexica (Phonebook and CMU) are also compared, and it is shown that the CMU lexicon leads to significantly better performance. Finally, it is shown that with a quite simple system and a few adaptations to the basic HMM/ANN scheme, recognition performance of 98.5% and 94.7% can easily be achieved on lexicons of 75 and 600 words, respectively (isolated words, telephone speech and lexicon ...
Speaker-Independent Phone Recognition Using BREF
1992
Cited by 16 (11 self)
A series of experiments on speaker-independent phone recognition of continuous speech have been carried out using the recently recorded BREF corpus. These experiments are the first to use this large corpus, and are meant to provide a baseline performance evaluation for vocabulary-independent phone recognition of French. The HMM-based recognizer was trained with hand-verified data from 43 speakers. Using 35 context-independent phone models, a baseline phone accuracy of 60% (no phone grammar) was obtained on an independent test set of 7635 phone segments from 19 new speakers. Including phone bigram probabilities as phonotactic constraints resulted in a performance of 63.5%. A phone accuracy of 68.6% was obtained with 428 context-dependent models and the bigram phone language model. Vocabulary-independent word recognition results with no grammar are also reported for the same test data. INTRODUCTION This paper reports on a series of experiments for speaker-independent, continuous speech ...
Speech-to-text conversion in French
1994
Cited by 6 (6 self)
Speech-to-text conversion of French requires that both acoustic-level recognition and language modeling be tailored to the French language. Work in this area was initiated at LIMSI over 10 years ago. This paper summarizes the ongoing research in this direction. Included are studies on distributional properties of French text materials; problems in speech-to-text conversion that are specific to French; studies in phoneme-to-grapheme conversion for continuous, error-free phonemic strings; past work on isolated-word speech-to-text conversion; and more recent work on continuous-speech speech-to-text conversion. Also demonstrated is the use of phone recognition for both language and speaker identification. The
Confidence and Rejection in Automatic Speech Recognition
1997
Cited by 1 (0 self)
1 Introduction
1.1 Research Goals
1.2 Male/Female Versus Last Names
1.3 Scaling Up: 58 Phrases
1.4 Vocabulary Independence
1.5 Thesis Overview
1.6 Tutorial on Automatic Speech Recognition
1.6.1 A Setting for Automatic Speech Recognition
1.6.2 Overview of Speech Recognition
1.6.3 Artificial Neural Network
1.6.4 Context-Dependent Modeling ...
Context Independent And Context Dependent Hybrid HMM/ANN Systems For Vocabulary Independent Tasks
Cited by 1 (0 self)
In this paper, hybrid HMM/ANN systems are used to model context-dependent phones. To reduce the number of parameters and to better capture the dynamics of the phonetic segments, we combine (context-dependent) diphone models with context-independent phone models. Transitions from phone to phone are modeled as generalized context-dependent distributions, while phonetic units are context-independent models trained on the less coarticulated middle part of each phone. Words are thus modeled as a sequence of probability distributions alternately representing the middle part of the phonemes and the transitions from phone to phone. A single neural network is used to estimate both context-independent phone probabilities and generalized context-dependent diphone (phone-to-phone transition) probabilities. The resulting systems are compared to classical context-independent phone-based HMM/ANN systems with the same number of parameters. The Phonebook isolated-word database was used for training the systems. Testing is done on small (75 words), medium (600 words) and large (8000 words) lexicons. Test words were not present in the training vocabulary.
Task Adaptation For Dialogues Via Telephone Lines
This paper describes our ongoing approaches toward better recognition accuracy for flexible interactive systems in automatic speech recognition. The performance of speech recognition systems degrades whenever the current application differs from the conditions at training time. The main speaker-independent causes of this degradation are changes in transmission channels and changes in the task to be fulfilled. We present the results of our research on changing tasks, more specifically on changing dictionaries. We propose an in-service adaptation technique that is speaker-independent, works under unsupervised conditions, and has a long-term memory. With 2000 adaptation words, an error-rate reduction of more than 40% is achieved at negligible computational cost.

