BYBLOS Speech Recognition Benchmark Results
- In DARPA Speech and Natural Language Workshop, 1991
Cited by 7 (0 self)
This paper presents speech recognition test results from the BBN BYBLOS system on the Feb 91 DARPA benchmarks in both the Resource Management (RM) and the Air Travel Information System (ATIS) domains. In the RM test, we report on speaker-independent (SI) recognition performance for the standard training condition using 109 speakers and for our recently proposed SI model made from only 12 training speakers. Surprisingly, the 12-speaker model performs as well as the one made from 109 speakers. Also within the RM domain, we demonstrate that state-of-the-art SI models perform poorly for speakers with strong dialects. But we show that this degradation can be overcome by using speaker adaptation from multiple-reference speakers. For the ATIS benchmarks, we ran a new system configuration which first produced a rank-ordered list of the N-best word-sequence hypotheses. The list of hypotheses was then reordered using more detailed acoustic and language models. In the ATIS benchmarks, we report SI recognition results on two conditions. The first is a baseline condition using only training data available from NIST on CD-ROM and a word-based statistical bigram grammar developed at MIT/Lincoln. In the second condition, we added training data from speakers collected at BBN and used a 4-gram class grammar. These changes reduced the word error rate by 25%.
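The two-pass idea in this abstract (produce an N-best list, then reorder it with more detailed models) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Hypothesis` type, the additive log-score combination, the `weight` parameter, and the toy scoring function are hypothetical and are not taken from the BYBLOS system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    words: Tuple[str, ...]
    first_pass_score: float  # log score from the (hypothetical) fast first pass


def rescore_nbest(
    nbest: List[Hypothesis],
    detailed_score: Callable[[Tuple[str, ...]], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Reorder an N-best list by first-pass score plus a weighted detailed score.

    Assumption: scores are log-domain and combined additively; the abstract
    does not say how the scores were actually combined.
    """
    def combined(h: Hypothesis) -> float:
        return h.first_pass_score + weight * detailed_score(h.words)

    return sorted(nbest, key=combined, reverse=True)


# Toy N-best list: the first pass slightly prefers the wrong word sequence.
nbest = [
    Hypothesis(("show", "me", "flights", "two", "boston"), -10.0),
    Hypothesis(("show", "me", "flights", "to", "boston"), -10.5),
]


def toy_detailed_score(words: Tuple[str, ...]) -> float:
    # Stand-in for a detailed acoustic/language model: penalize "two" here.
    return -3.0 if "two" in words else 0.0


reranked = rescore_nbest(nbest, toy_detailed_score)
print(" ".join(reranked[0].words))
```

In this toy case the detailed score moves the second hypothesis to the top of the list, which is the effect the reordering step is meant to have.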
Advances in speech transcriptions at IBM under the DARPA EARS program
- IEEE Transactions on Audio, Speech, and Language Processing, accepted for publication, 2000
Cited by 3 (1 self)
This paper describes the technical and system building advances made in IBM's speech recognition technology over the course of the Defense Advanced Research Projects Agency (DARPA) Effective Affordable Reusable Speech-to-Text (EARS) program. At a technical level, these advances include the development of a new form of feature-based minimum phone error training (fMPE), the use of large-scale discriminatively trained full-covariance Gaussian models, the use of septaphone acoustic context in static decoding graphs, and improvements in basic decoding algorithms. At a system building level, the advances include a system architecture based on cross-adaptation and the incorporation of 2100 h of training data in every system component. We present results on English conversational telephony test data from the 2003 and 2004 NIST evaluations. The combination of technical advances and an order of magnitude more training data in 2004 reduced the error rate on the 2003 test set by approximately 21% relative (from 20.4% to 16.1%) over the most accurate system in the 2003 evaluation and produced the most accurate results on the 2004 test sets in every speed category. Index Terms: Discriminative training, Effective Affordable Reusable Speech-to-Text (EARS), finite-state transducer, full
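The "approximately 21% relative" figure follows directly from the two error rates quoted in the abstract. A short check, where the function name `relative_reduction` is only an illustrative choice:

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error reduction: the absolute drop divided by the baseline."""
    return (baseline_error - new_error) / baseline_error


# Error rates quoted in the abstract (percent, 2003 test set).
reduction = relative_reduction(20.4, 16.1)
print(f"{reduction:.1%}")  # prints 21.1%
```

The absolute drop is 4.3 points, which is about 21% of the 20.4% baseline.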
Using natural-language knowledge sources in speech recognition
- Computational Models of Speech Pattern Processing, 1999
Cited by 3 (0 self)
At the current state of the art, high-accuracy speech recognition with moderate to large vocabularies
Hidden Markov Models And Selectively Trained Neural Networks For Connected Confusable Word Recognition
- In ICSLP, 1994
Cited by 1 (1 self)
This paper presents a new method for connected-word recognition with confusable vocabularies, such as connected letters. The recognition process is performed in two steps. First, a second-order HMM provides N-best word strings. Then, the strings of confusable letters are discriminated by a procedure based on acoustic knowledge and artificial neural networks (ANN). This method has been tested on an American-English database containing spelled names collected through the telephone network. The results obtained with the first HMM pass and the improvements made with the ANN are presented and discussed. When a 3,300-name dictionary and a retrieval procedure based on a DTW alignment algorithm were used, 96% recognition accuracy was obtained.
I. INTRODUCTION The performance of HMM recognizers is now quite satisfactory for small vocabularies. However, in the case of confusable words, results are not sufficient for real-world applications, especially in adverse conditions like noisy or tele...
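The abstract mentions retrieving a name from a dictionary by aligning the recognized letter string against each entry. The sketch below is not the paper's procedure: it substitutes a plain dynamic-programming alignment with unit insertion, deletion, and substitution costs (Levenshtein distance) for the DTW alignment, whose local costs are not given here. The function names and the three-name dictionary are hypothetical.

```python
from typing import Iterable


def alignment_cost(recognized: str, candidate: str) -> int:
    """Unit-cost dynamic-programming alignment (Levenshtein distance).

    Assumption: every insertion, deletion, and substitution costs 1. A real
    system would more likely use costs derived from letter confusability.
    """
    prev = list(range(len(candidate) + 1))
    for i, r in enumerate(recognized, start=1):
        curr = [i]
        for j, c in enumerate(candidate, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a recognized letter
                curr[j - 1] + 1,         # insert a candidate letter
                prev[j - 1] + (r != c),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def retrieve_name(recognized: str, dictionary: Iterable[str]) -> str:
    """Return the dictionary entry with the lowest alignment cost."""
    return min(dictionary, key=lambda name: alignment_cost(recognized, name))


# Hypothetical spelled-name dictionary and a recognized string with one error.
names = ["MARTIN", "MARTEN", "NORTON"]
print(retrieve_name("MARDIN", names))
```

With these toy inputs the closest entry is `MARTIN`, one substitution away from the recognized string.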
Robust Estimation of Stochastic Segment Models for Word Recognition
- 1990
In this work, we develop robust estimation techniques for a continuous-word recognition system using the Stochastic Segment Model (SSM). This work is done under the N-best rescoring formalism, where a less complex system than the SSM is used to generate candidate hypotheses which are then rescored and reranked by the SSM. Components of the system that are the focus of this work include estimation of weights for score combination and robust parameter estimation using clustering techniques to model context. In particular, we develop several agglomerative and divisive clustering techniques for multivariate Gaussian distributions, which we use to cluster triphone models. This leads to better estimates with fewer parameters, resulting in a reduction in word error and in storage/computation costs compared with unclustered triphones. We also implement an SSM system based on microsegments which combines mixture modeling with trajectory modeling and examine the tradeoffs involved between the allocation ...
ISADORA - a Speech Modelling Network Based on Hidden Markov Models
- Computer Speech & Language, 1993
In this paper we present the ISADORA system which provides highly flexible speech recognition based on HMM technology together with a hierarchical representation of speech units. Markov model topologies, subword unit inventories, regular grammars expressed in finite-state or phrase structure style, and even the analysis tasks themselves are explicitly represented by the nodes of a large speech unit network. Thus, nothing that can be "said in the language of Markov models" needs to be hard-wired in the program code. In contrast to traditional compiled network recognizers, units, grammars, and tasks may be created or modified at analysis time, and the outcome of the decoding process is a structured symbolic description of the sensory input. Our architecture has proven extremely useful in prototyping new kinds of subword units. Besides generalized triphones and context-freezing units, a new subword speech unit for automatic speech recognition has been implemented. The so-called polyphone...
Incremental Generation Of Word Graphs
We present an algorithm for the incremental generation of word graphs. Incremental means that the speech signal is processed left-to-right by a time-synchronous Viterbi algorithm and word hypotheses are generated with some delay to Viterbi decoding. The incrementally generated word hypotheses can be used for early interaction between linguistic analysis and acoustic recognition. Therefore, it is possible to derive acoustic constraints from linguistic restrictions dynamically.
continuous speech understanding for
- 1996
"... Integrated speech and morphological processing in a connectionist ..."

