Results 1 - 9 of 9
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
2000
Cited by 43 (5 self)
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similar-sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance.
Minimum classification error training of landmark models for real-time continuous speech recognition
in Proc. IEEE ICASSP, 2004
Cited by 7 (2 self)
Though many studies have shown the effectiveness of the Minimum Classification Error (MCE) approach to discriminative training of HMMs for speech recognition, few if any have reported MCE results for large (> 100 hours) training sets in the context of real-world, continuous speech recognition. Here we report large gains in performance for the MIT JUPITER weather information task as a result of MCE-based batch optimization of acoustic models. Investigation of word error rate vs. computation time showed that small MCE models significantly outperform the Maximum Likelihood (ML) baseline at all points of equal computation time, resulting in up to 20% word error rate reduction for in-vocabulary utterances. The overall MCE loss function was minimized using Quickprop, a simple but effective second-order optimization method suited to parallelization over large training sets.
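As a rough illustration of the Quickprop idea mentioned in this abstract, the sketch below minimizes a one-dimensional function using the secant-style Quickprop step. It is a simplified toy under stated assumptions, not the paper's implementation: the function name `quickprop_minimize`, the learning rate, the growth limit of 1.75, and the quadratic test objective are all hypothetical choices made for illustration.

```python
def quickprop_minimize(grad, w0, lr=0.1, max_growth=1.75, steps=50):
    """Minimize a 1-D function given its gradient, using a simplified Quickprop rule.

    Assumption: this is a toy scalar version; real systems apply the rule
    per parameter and accumulate gradients over the whole training set.
    """
    w = w0
    prev_g = None      # gradient at the previous iterate
    prev_step = 0.0    # previous weight change
    for _ in range(steps):
        g = grad(w)
        if prev_g is None or prev_step == 0.0 or prev_g == g:
            # No usable history: fall back to a plain gradient-descent step.
            step = -lr * g
        else:
            # Secant approximation to the curvature: jump to the minimum of
            # the parabola fitted through the last two gradients.
            step = prev_step * g / (prev_g - g)
            # Limit how much larger than the previous step this one may be.
            limit = max_growth * abs(prev_step)
            step = max(-limit, min(limit, step))
        w += step
        prev_g, prev_step = g, step
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = quickprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Because the rule only needs the current and previous batch gradients, the gradient accumulation itself can be split across machines, which is consistent with the abstract's remark about parallelization.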
Developing city name acquisition strategies in spoken dialogue systems via user simulation
Proc. of 6th SIGDial Workshop on Discourse and Dialogue, 2005
Cited by 6 (3 self)
This paper describes our recent work on mechanisms for error recovery in spoken dialogue systems. We focus on the acquisition of city names and dates in the flight reservation domain. We are specifically interested in addressing the issue of acquiring out-of-vocabulary city names through a speak-and-spell mode subdialogue. In order to explore various dialogue strategies, we developed a user simulation system, which includes a configurable simulated user and a novel method of utterance generation. The latter utilizes a concatenative speech synthesizer, along with an existing corpus of dialogues, to produce a large variety of simulated inputs. The results from various simulated user configurations are presented, along with a discussion of how the simulated user facilitates the debugging of dialogue strategies and the discovery of situations unanticipated by the system developer.
Error Detection and Recovery in Spoken Dialogue Systems
In Proc. Workshop on Spoken Language Understanding for Conversational Systems, 2004
Cited by 5 (2 self)
This paper describes our research on both the detection and subsequent resolution of recognition errors in spoken dialogue systems. The paper consists of two major components. The first half concerns the design of the error detection mechanism for resolving city names in our MERCURY flight reservation system, and an investigation of the behavioral patterns of users in subsequent subdialogues involving keypad entry for disambiguation. An important observation is that, upon a request for keypad entry, users are frequently unresponsive to the extent of waiting for a time-out or hanging up the phone. The second half concerns a pilot experiment investigating the feasibility of replacing the solicitation of a keypad entry with that of a "speak-and-spell" entry. A novelty of our work is the introduction of a speech synthesizer to simulate the user, which facilitates development and evaluation of our proposed strategy. We have
Morph-Based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages
Cited by 5 (1 self)
We explore the use of morph-based language models in large-vocabulary continuous speech recognition systems across four so-called “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. It is shown that the morph models do perform fairly well on OOVs without compromising the recognition accuracy on in-vocabulary words. The Arabic experiment constitutes the only exception, since here the standard word model outperforms the morph model. Differences in the data sets and the amount of data are discussed as a plausible explanation.
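The coverage argument in this abstract can be sketched with a small example: a word-level vocabulary misses unseen inflected forms, while a morph inventory can still cover them by concatenation. The code is a minimal sketch under stated assumptions. The morph set, the word list, and the helper names `oov_rate` and `segment_into_morphs` are made up for illustration, and the segmentation is a plain dynamic-programming lookup, not the Morfessor algorithm.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not in the given word vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def segment_into_morphs(word, morphs):
    """Return one segmentation of `word` into known morphs, or None if none exists."""
    # best[i] holds a segmentation of word[:i], or None if word[:i] is not coverable.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            if best[start] is not None and word[start:end] in morphs:
                best[end] = best[start] + [word[start:end]]
                break
    return best[len(word)]


# Hypothetical toy data (Finnish-like forms), chosen only for illustration.
word_vocab = {"talo", "talossa", "auto"}
morph_inventory = {"talo", "auto", "ssa", "ni"}
test_tokens = ["talossa", "autossa", "autossani"]

word_oov = oov_rate(test_tokens, word_vocab)
segmentations = {t: segment_into_morphs(t, morph_inventory) for t in test_tokens}
```

In this toy setup two of the three test tokens are OOV for the word vocabulary, yet all three can be written as sequences of known morphs.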
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
Cited by 2 (0 self)
The minimum classification error (MCE) framework for discriminative training is a simple and general formalism for directly optimizing recognition accuracy in pattern recognition problems. The framework applies directly to the optimization of hidden Markov models (HMMs) used for speech recognition problems. However, few if any studies have reported results for the application of MCE training to large-vocabulary, continuous-speech recognition tasks. This article reports significant gains in recognition performance and model compactness as a result of MCE-based discriminative training of HMMs, in the context of three challenging large-vocabulary (up to 100k words) speech recognition tasks: the Corpus of Spontaneous Japanese lecture speech transcription task, a telephone-based name recognition task, and the MIT JUPITER telephone-based conversational weather information task. On these tasks, starting from maximum likelihood (ML) baselines, MCE training yielded relative reductions in word error ranging from 7% to 20%. Furthermore, this paper evaluates the use of different methods for optimizing the MCE criterion function, as well as the use of precomputed recognition lattices to speed up training. An overview of the MCE framework is given, with an emphasis on practical implementation issues. Index Terms: Discriminative training, pattern recognition, speech recognition.
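For orientation, the MCE criterion is commonly written in the following textbook form. This is a sketch of the standard formulation, and the exact variant used in the paper may differ. The symbols are assumptions introduced here: $g_j(X;\Lambda)$ is the discriminant function (for example, an HMM log-likelihood) of class $j$ with parameters $\Lambda$, $M$ is the number of competing classes, $k$ is the correct class, and $\eta$, $\gamma$, $\theta$ are smoothing constants.

```latex
% Misclassification measure: negative when the correct class k wins.
d_k(X;\Lambda) = -g_k(X;\Lambda)
  + \frac{1}{\eta}\,\log\!\left[\frac{1}{M-1}\sum_{j \neq k} e^{\eta\, g_j(X;\Lambda)}\right]

% Smoothed 0-1 loss (sigmoid) applied to the misclassification measure.
\ell(d) = \frac{1}{1 + e^{-\gamma d + \theta}}

% Overall loss over N training tokens X_n with correct classes k_n.
L(\Lambda) = \sum_{n=1}^{N} \ell\!\left(d_{k_n}(X_n;\Lambda)\right)
```

As $\eta \to \infty$ the bracketed term approaches the score of the best competing class, so $d_k$ compares the correct class against its strongest rival, and the sigmoid makes the resulting error count differentiable in $\Lambda$.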
Acoustic-based improving pronunciation inference using n-best list, acoustics and orthography
in Proc. ICASSP, 2007
Cited by 1 (0 self)
In this paper, we tackle the problem of pronunciation inference and Out-of-Vocabulary (OOV) enrollment in Automatic Speech Recognition (ASR) applications. We combine linguistic and acoustic information of the OOV word using its spelling and a single instance of its utterance to derive an appropriate phonetic baseform. The novelty of the approach is in its employment of an orthography-driven n-best hypothesis and rescoring strategy of the pronunciation alternatives. We make use of decision trees and heuristic tree search to construct and score the n-best hypotheses space. We use acoustic alignment likelihood and phone transition cost to leverage the empirical evidence and phonotactic priors to rescore the hypotheses and refine the baseforms. Index Terms: n-best list, Out-of-Vocabulary, letter-to-sound rules, pronunciation modeling, automatic pronunciation learning
Handling OOV Words In Arabic ASR Via Flexible Morphological Constraints
INTERSPEECH, 2007
We propose a novel framework to detect and recognize out-of-vocabulary (OOV) words in automated speech recognition (ASR). In the proposed framework, a hybrid language model combining words and sub-word units is incorporated during ASR decoding, and then three different OOV word recognition methods are applied to generate OOV word hypotheses: dictionary lookup, morphological composition, and direct phoneme-to-grapheme conversion. The proposed approach reduced WER by 1.9% and 1.6% for ASR systems with recognition vocabularies of 30K and 219K, respectively. Moreover, the proposed approach correctly recognized 5% of OOV words.

