Contextual Information Improves OOV Detection in Speech
Abstract
Cited by 3 (3 self)
Out-of-vocabulary (OOV) words are an important source of error in large vocabulary continuous speech recognition (LVCSR) systems. These words cause recognition failures, which propagate through pipeline systems and degrade the performance of downstream applications. The detection of OOV regions in the output of an LVCSR system is typically addressed as a binary classification task, where each region is classified independently using only local information. In this paper, we show that jointly predicting OOV regions, and including contextual information from each region, leads to substantial improvement in OOV detection. Compared to the state of the art, we reduce the missed OOV rate from 42.6% to 28.4% at a 10% false alarm rate.
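The headline result is a missed OOV rate measured at a fixed false alarm rate. A minimal sketch of that operating-point metric follows, assuming each region has a detector score (higher means "more likely OOV") and a binary label, and assuming the false alarm rate is the fraction of in-vocabulary regions flagged as OOV. The abstract does not state the exact normalization, and the function name `miss_rate_at_false_alarm` is hypothetical.

```python
from typing import Sequence


def miss_rate_at_false_alarm(scores: Sequence[float], labels: Sequence[int],
                             target_fa: float) -> float:
    """Lowest miss rate among thresholds whose false alarm rate <= target_fa.

    Assumption: labels are 1 for OOV regions and 0 for in-vocabulary regions;
    a region is flagged as OOV when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both OOV and in-vocabulary regions")
    best_miss = 1.0  # flagging nothing: every OOV region is missed, no false alarms
    for threshold in sorted(set(scores)):  # try each distinct score as a threshold
        flagged = [s >= threshold for s in scores]
        false_alarms = sum(f and y == 0 for f, y in zip(flagged, labels))
        misses = sum((not f) and y == 1 for f, y in zip(flagged, labels))
        if false_alarms / n_neg <= target_fa:
            best_miss = min(best_miss, misses / n_pos)
    return best_miss


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0]
print(miss_rate_at_false_alarm(scores, labels, 0.25))  # 0.0
```

Tightening the false alarm budget to zero in this toy example forces the threshold up to 0.8, so one of the three OOV regions is missed (miss rate 1/3). This trade-off is why results are reported at a named false alarm rate.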
Large Vocabulary Continuous Speech Recognition (LVCSR)
Abstract
This paper addresses the issue of out-of-vocabulary (OOV) word detection in large vocabulary continuous speech recognition (LVCSR) systems. We propose a method, inspired by confidence measures, that analyzes the recognition system outputs in order to automatically detect errors due to OOV words. The method combines acoustic, linguistic, decoding-graph, and semantic features. We evaluate each feature separately and estimate their complementarity. Experiments are conducted on a large French broadcast news corpus from the ESTER evaluation campaign. Results show good performance in real conditions: the method obtains an OOV word detection rate of 43% to 90% with false detection rates of 2.5% to 17.5%.
Index Terms: OOV word detection, confidence measures, speech recognition
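The abstract says heterogeneous features are combined into one OOV decision but does not name the classifier. As an illustration only, the sketch below uses logistic regression, a common way to merge confidence features, trained by plain batch gradient descent. The three per-word features (acoustic confidence, language-model probability, lattice density, all scaled to [0, 1]) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that a word is OOV; w[-1] is the bias term."""
    # zip stops at len(x), so the bias weight is excluded from the dot product
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(features: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit weights (last entry = bias) by batch gradient descent on log loss."""
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            err = predict(w, x) - y  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[dim] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical features: [acoustic confidence, LM probability, lattice density]
feats = [[0.9, 0.8, 0.1], [0.85, 0.7, 0.2], [0.8, 0.9, 0.15],
         [0.2, 0.1, 0.9], [0.3, 0.2, 0.8], [0.1, 0.3, 0.7]]
labs = [0, 0, 0, 1, 1, 1]  # 1 = OOV
weights = train_logistic(feats, labs)
print(predict(weights, [0.15, 0.2, 0.85]) > 0.5)  # True: looks like an OOV word
```

Sweeping a threshold over the predicted probability, rather than fixing it at 0.5, is what yields a range of detection rates against false detection rates like the one reported above.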
Learning Sub-Word Units for Open Vocabulary Speech Recognition
Abstract
Large vocabulary speech recognition systems fail to recognize words outside their vocabulary, many of which are information-rich terms such as named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word-based systems; new words can then be represented by combinations of sub-word units. Previous work created the sub-word lexicon heuristically from phonetic representations of text, using simple statistics to select common phone sequences. We propose a probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News task and an MIT Lectures task, respectively.
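To make the contrast concrete, here is a sketch of the heuristic baseline the abstract describes (selecting common phone sequences by simple frequency counts), not the proposed probabilistic model. It also shows how a new word's pronunciation could then be expressed as a combination of sub-word units. The n-gram length limit, the `top_k` cutoff, greedy longest-match segmentation, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Unit = Tuple[str, ...]


def select_subword_units(prons: Iterable[Sequence[str]], max_len: int = 3,
                         top_k: int = 50) -> List[Unit]:
    """Count every phone n-gram of length 2..max_len; keep the top_k most frequent."""
    counts: Counter = Counter()
    for phones in prons:
        for n in range(2, max_len + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    return [unit for unit, _ in counts.most_common(top_k)]


def segment(phones: Sequence[str], units: Iterable[Unit]) -> List[Unit]:
    """Greedy longest-match segmentation; unmatched phones become single-phone units."""
    unit_set = set(units)
    longest = max((len(u) for u in unit_set), default=1)
    out: List[Unit] = []
    i = 0
    while i < len(phones):
        for n in range(min(longest, len(phones) - i), 0, -1):
            piece = tuple(phones[i:i + n])
            if n == 1 or piece in unit_set:  # a single phone always matches
                out.append(piece)
                i += n
                break
    return out


lexicon_prons = [["k", "ae", "t"], ["k", "ae", "p"], ["b", "ae", "t"]]
units = select_subword_units(lexicon_prons, top_k=2)
print(segment(["k", "ae", "t", "s"], units))  # [('k', 'ae'), ('t',), ('s',)]
```

Because the baseline only counts frequencies, it ignores how well the chosen units serve the downstream task. That gap is what the abstract's learned, task-optimized sub-word lexicon targets.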
Subword-based Automatic Lexicon Learning for ASR
Abstract
We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words, we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on linguistically motivated hybrid subword units generates the joint lexical search space, which is then reduced to the most appropriate lexical entries using a set of simple pruning techniques. A cascade of letter and acoustic pruning, followed by re-scoring of N-best hypotheses with discriminative decoder statistics, resulted in optimal lexical entries in terms of both spelling and pronunciation. Evaluating the framework on English isolated word recognition, we achieve absolute reductions of 7.7% in word error rate and 14.4% in character error rate.
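The results are reported as word and character error rates. The sketch below shows the standard edit-distance formulation behind both: total substitutions, insertions, and deletions divided by the reference length, computed over characters for CER and over words for WER. The abstract does not describe its scoring conventions (for example, case or punctuation normalization), so those are left out here, and `error_rate` is a hypothetical helper name.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (one-row dynamic program)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (0 if tokens match)
        prev = cur
    return prev[-1]


def error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Total edit errors divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# CER: a string is a sequence of characters
print(error_rate(["kitten"], ["sitting"]))  # 0.5 (3 edits / 6 characters)
# WER: each utterance is a list of words (one word each in isolated word recognition)
print(error_rate([["cat"], ["dog"]], [["cat"], ["dot"]]))  # 0.5
```

In isolated word recognition each utterance contributes at most one word error, so a near-miss spelling such as "dot" for "dog" counts fully against WER but only partially against CER. This is one reason the two rates can move by different amounts.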

