Results 1–10 of 31
Weighted finite-state transducers in speech recognition
Computer Speech & Language, 2002
Cited by 199 (5 self)
Abstract:
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques that combines the HMMs, full cross-word triphones, a lexicon of 40,000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how general automata operations can be used to assemble lattices from different recognizers to improve recognition performance.
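The "very simple decoder" this abstract mentions follows from the construction: once every knowledge source is compiled into one weighted transducer over the tropical semiring, recognition reduces to a single-source shortest-path search. A minimal sketch of that search; the automaton, labels, and weights below are invented for illustration, not taken from the paper:

```python
import heapq

def shortest_path(arcs, start, finals):
    """Tropical-semiring shortest path through a weighted automaton.
    arcs: dict state -> [(label, weight, next_state)].
    Returns (total weight, label sequence) of the cheapest accepting path."""
    heap = [(0.0, start, [])]
    settled = set()
    while heap:
        d, q, path = heapq.heappop(heap)
        if q in settled:
            continue
        settled.add(q)
        if q in finals:                      # cheapest way to reach a final state
            return d, path
        for label, w, nxt in arcs.get(q, []):
            if nxt not in settled:
                heapq.heappush(heap, (d + w, nxt, path + [label]))
    return float("inf"), None

# Toy two-step automaton: weights are negative log probabilities.
arcs = {0: [("the", 0.5, 1), ("a", 0.9, 1)],
        1: [("cat", 1.2, 2), ("hat", 0.4, 2)]}
weight, words = shortest_path(arcs, 0, {2})
print(weight, words)  # 0.9 ['the', 'hat']
```

The cheapest path ("the" 0.5 + "hat" 0.4 = 0.9) is returned; in a real system the arcs would carry HMM-state inputs and word outputs from the composed transducer.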
Discriminative speaker adaptation with conditional maximum likelihood linear regression
In Eurospeech, 2001
Cited by 31 (2 self)
Abstract:
We present a simplified derivation of the extended Baum-Welch procedure, which shows that it can be used for Maximum Mutual Information (MMI) estimation of a large class of continuous emission density hidden Markov models (HMMs). We use the extended Baum-Welch procedure for discriminative estimation of MLLR-type speaker adaptation transformations. The resulting adaptation procedure, termed Conditional Maximum Likelihood Linear Regression (CMLLR), is used successfully for supervised and unsupervised adaptation tasks on the Switchboard corpus, yielding an improvement over MLLR. The interaction of unsupervised CMLLR with segmental minimum Bayes risk lattice voting procedures is also explored, showing that the two procedures are complementary.
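The extended Baum-Welch procedure referenced here has a compact closed form for Gaussian mean parameters: numerator statistics from the reference alignment, denominator statistics from the competing lattice, and a smoothing constant D. This is a hedged sketch of the commonly cited EBW mean update, not the paper's specific derivation; all numerical values are invented:

```python
import numpy as np

def ebw_mean_update(theta_num, theta_den, gamma_num, gamma_den, mu_old, D):
    """Extended Baum-Welch style mean update for MMI training of a Gaussian:
    theta_* are gamma-weighted observation sums, gamma_* the occupancy
    counts; D > 0 smooths toward the old mean and keeps the denominator
    positive (larger D = smaller, safer step)."""
    return (theta_num - theta_den + D * mu_old) / (gamma_num - gamma_den + D)

# Invented sufficient statistics for one Gaussian component.
mu_old = np.array([0.0, 0.0])
theta_num = np.array([4.0, 2.0])   # stats from the reference alignment
theta_den = np.array([1.0, 1.0])   # stats from the recognition lattice
mu_new = ebw_mean_update(theta_num, theta_den, gamma_num=2.0,
                         gamma_den=1.0, mu_old=mu_old, D=1.0)
print(mu_new)  # [1.5 0.5]
```

With gamma_num = gamma_den the update is driven entirely by the D-scaled old mean, which is why D controls the effective learning rate.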
Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation, 2002
Cited by 27 (3 self)
Abstract:
Linear transforms have been used extensively for training and adaptation of HMM-based ASR systems. Recently, procedures have been developed for the estimation of linear transforms under the Maximum Mutual Information (MMI) criterion. In this paper we introduce discriminative training procedures that employ linear transforms for feature normalization and for speaker adaptive training. We integrate these discriminative linear transforms into MMI estimation of HMM parameters for improvement of large vocabulary conversational speech recognition systems.
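Whatever criterion estimates it, a feature-space linear transform is applied the same way: an affine map x' = Ax + b shared by all frames of a speaker. A rough illustration with made-up transform values (the paper estimates A and b discriminatively; here they are arbitrary):

```python
import numpy as np

def apply_feature_transform(A, b, frames):
    """Apply an affine feature-space transform x' = A x + b to every frame
    (rows of `frames`). In (f)MLLR-style normalization one such transform
    is shared per speaker; A and b here are hypothetical values."""
    return frames @ A.T + b

A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
b = np.array([0.1, -0.2])
frames = np.array([[1.0, 2.0],
                   [0.0, 0.0]])
print(apply_feature_transform(A, b, frames))
# [[ 2.1  1.8]
#  [ 0.1 -0.2]]
```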
Towards automatic closed captioning: Low latency real time broadcast news transcription, 2002
Cited by 22 (5 self)
Abstract:
In this paper, we present a low-latency real-time Broadcast News recognition system capable of transcribing live television newscasts with reasonable accuracy. We describe our recent modeling and efficiency improvements that yield a 22% word error rate on the Hub4e98 test set while running faster than real time. These include the discriminative training of a feature transform and the acoustic model, and the optimization of the likelihood computation. We give experimental results that show the accuracy of the system at different speeds. We also explain how we achieved low latency, presenting measurements that show the typical system latency is less than 1 second.
Lattice Segmentation and Minimum Bayes Risk Discriminative Training for Large ...
In Proc. Eurospeech, 2005
Cited by 20 (6 self)
Abstract:
Lattice segmentation techniques developed for Minimum Bayes Risk decoding in large vocabulary speech recognition tasks are used to compute the statistics for discriminative training algorithms that estimate HMM parameters so as to reduce the overall risk over the training data. New estimation procedures are developed and evaluated for small vocabulary and large vocabulary recognition tasks, and additive performance improvements are shown relative to maximum mutual information estimation. These relative gains are explained through a detailed analysis of individual word recognition errors.
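Minimum Bayes Risk decoding, which the lattice segmentation above serves, is easiest to see over an N-best list: choose the hypothesis with the lowest expected word error under the posterior distribution. A toy sketch with invented hypotheses and posteriors (real systems work over full lattices, not N-best lists):

```python
def edit_distance(a, b):
    """Levenshtein distance between two word sequences (rolling 1-D table)."""
    d = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, wb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (wa != wb))  # substitution/match
    return d[len(b)]

def mbr_decode(nbest):
    """nbest: list of (hypothesis, posterior). Return the hypothesis whose
    expected word error against the whole list is smallest."""
    def risk(h):
        return sum(p * edit_distance(h, r) for r, p in nbest)
    return min((h for h, _ in nbest), key=risk)

nbest = [(("a", "b", "c"), 0.40),
         (("a", "b"), 0.35),
         (("a", "x", "c"), 0.25)]
print(mbr_decode(nbest))  # ('a', 'b', 'c')
```

Note the MBR choice can differ from the MAP choice when a hypothesis sits "close" to many probable competitors; here the top posterior hypothesis also happens to win on risk.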
A Weight Pushing Algorithm for Large Vocabulary Speech Recognition
In European Conf. on Speech Communication and Technology, 2001
Cited by 19 (10 self)
Abstract:
Weighted finite-state transducers provide a general framework for the representation of the components of speech recognition systems; language models, pronunciation dictionaries, context-dependent models, HMM-level acoustic models, and the output word or phone lattices can all be represented by weighted automata and transducers. In general, a representation is not unique and there may be different weighted transducers realizing the same mapping. In particular, even when they have exactly the same topology with the same input and output labels, two equivalent transducers may differ by the way the weights are distributed along each path. We present ...
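The redistribution the abstract describes can be done without changing any path weight: compute each state's shortest distance d(q) to a final state, then reweight every arc p → q as w + d(q) − d(p), folding d(start) into an initial weight. A small sketch of weight pushing in the tropical semiring on a toy automaton (all states/weights invented; assumes every state reaches a final state):

```python
import heapq

def push_weights(arcs, final, start=0):
    """Tropical-semiring weight pushing.
    arcs: list of (src, weight, dst); final: {state: final_weight}.
    Returns (reweighted arcs, initial weight); every path keeps its total
    weight, but weight is moved as early (toward the start) as possible."""
    # Shortest distance from each state to a final state: Dijkstra on
    # reversed arcs, seeded with the final weights.
    rev = {}
    for p, w, q in arcs:
        rev.setdefault(q, []).append((w, p))
    dist = dict(final)
    heap = [(d, q) for q, d in final.items()]
    heapq.heapify(heap)
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist.get(q, float("inf")):
            continue
        for w, p in rev.get(q, []):
            if d + w < dist.get(p, float("inf")):
                dist[p] = d + w
                heapq.heappush(heap, (d + w, p))
    pushed = [(p, w + dist[q] - dist[p], q) for p, w, q in arcs]
    return pushed, dist[start]

arcs = [(0, 3.0, 1), (0, 1.0, 2), (1, 1.0, 3), (2, 4.0, 3)]
pushed, initial = push_weights(arcs, final={3: 0.0})
print(initial)  # 4.0 -- the shortest-path weight, now carried up front
print(pushed)   # later arcs on the best path become weight 0
```

Check the invariant by hand: path 0→1→3 costs 3+1 = 4 before pushing and 4+0+0 after; path 0→2→3 costs 5 either way.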
A generalized construction of integrated speech recognition transducers
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004
Cited by 19 (0 self)
Abstract:
We showed in previous work that weighted finite-state transducers provide a common representation for many components of a speech recognition system and described general algorithms for combining these representations to build a single optimized and compact transducer integrating all these components, directly mapping from HMM states to words. This approach works well for certain well-controlled input transducers, but presents some problems related to the efficiency of composition and the applicability of determinization and weight-pushing with more general transducers. We generalize our prior construction of the integrated speech recognition transducer to work with an arbitrary number of component transducers and, to a large extent, relax the constraints imposed on the types of input transducers by providing more general solutions to these problems. This generalization allowed us to deal with cases where our prior optimization did not apply. Our experiments on the AT&T HMIHY 0300 task and an AT&T VoiceTone task show the efficiency of our generalized optimization technique. We report a 1.6× recognition speedup on the HMIHY 0300 task, a 1.8× speedup on a VoiceTone task using a word-based language model, and a 1.7× speedup using a class-based model.
Exact Alpha-Beta Computation in Logarithmic Space with Application to MAP Word Graph Construction
In Int. Conf. on Spoken Language Processing, 2000
Cited by 16 (4 self)
Abstract:
The classical dynamic programming recursions for the forward-backward and Viterbi HMM algorithms require space linear in the number of time frames being processed. Adapting the method of [8] to the context of speech recognition, this paper uses a recursive divide-and-conquer algorithm to reduce the space requirement to logarithmic in the number of frames. With this procedure, it is possible to do exact computations for observation sequences of essentially arbitrary length. The procedure works by manipulating a stack of alpha vectors, and by using sparse vectors, the space savings can be combined with those of traditional pruning techniques. We apply this technique to MAP lattice construction, and present the first results in the literature for that technique. We find that it is an effective way of creating word lattices, and that doing the exact computations enabled by the log-space technique results in lower word error rates than saving space via traditional pruning.
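The divide-and-conquer idea can be sketched directly: recompute the alpha vector at a segment's midpoint from the segment start, recurse on each half, and keep only one alpha vector live per recursion level, trading O(T) space for an extra log T factor in time. A toy version checked against the ordinary linear-space recursion; the HMM values are invented, and scaling stands in for the log-domain arithmetic a real system would use:

```python
import numpy as np

def forward_step(alpha, A, b_t):
    """One HMM forward step: alpha' = (alpha @ A) * b_t, rescaled."""
    a = (alpha @ A) * b_t
    return a / a.sum()

def alphas_logspace(alpha0, A, B, emit):
    """Emit the (scaled) alpha vector at every frame while holding only
    O(log T) alpha vectors: recurse on [lo, hi), recomputing the midpoint
    alpha from the segment-start alpha instead of storing all of them."""
    def rec(alpha, lo, hi):
        if hi - lo == 1:
            emit(lo, forward_step(alpha, A, B[lo]))
            return
        mid = (lo + hi) // 2
        a = alpha
        for t in range(lo, mid):        # recompute up to the midpoint
            a = forward_step(a, A, B[t])
        rec(alpha, lo, mid)             # left half reuses the segment start
        rec(a, mid, hi)                 # right half starts from the midpoint
    rec(alpha0, 0, len(B))

# Two-state toy HMM; compare against the plain linear-space recursion.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.2], [0.1, 0.8], [0.9, 0.2], [0.1, 0.8]])
alpha0 = np.array([0.5, 0.5])

out = {}
alphas_logspace(alpha0, A, B, lambda t, a: out.__setitem__(t, a))

ref, a = [], alpha0
for t in range(len(B)):
    a = forward_step(a, A, B[t])
    ref.append(a)
assert all(np.allclose(out[t], ref[t]) for t in range(len(B)))
print("log-space alphas match the linear-space recursion")
```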
A Comparison Of Two LVR Search Optimization Techniques
In Proc. Int. Conf. Spoken Language Processing, 2002
Cited by 15 (1 self)
Abstract:
This paper presents a detailed comparison between two search optimization techniques for large vocabulary speech recognition: one based on word-conditioned tree search (WCTS) and one based on weighted finite-state transducers (WFSTs). Existing North American Business News systems from RWTH and AT&T, representing each of the two approaches, were modified to remove variations in model data and acoustic likelihood computation. An experimental comparison showed that the WFST-based system explored fewer search states and had less runtime overhead than the WCTS-based system for a given word error rate. This is attributed to differences in the precompilation, degree of nondeterminism, and path weight distribution in the respective search graphs.
Active-vision perception strategies for the reconstruction and exploration of static scenes (Stratégies de perception par vision active pour la reconstruction et l'exploration de scènes statiques)
PhD Thesis, Université de Rennes 1, IRISA, 1996
Cited by 14 (1 self)
Abstract—This paper presents an algorithm for the composition of weighted finite-state transducers which is specially tailored to speech recognition applications: it composes the lexicon with the language model while simultaneously optimizing the resulting transducer. Furthermore, it performs these computations "on the fly" to allow easier management of the trade-off between offline and online computation and memory. The algorithm is exact for local knowledge integration and optimization operations such as composition and determinization. Minimization and pushing operations are approximated. Our results have confirmed the efficiency of these approximations. Index Terms—Speech recognition, weighted finite-state transducers (WFSTs).
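The on-the-fly behavior described above can be sketched with lazy state expansion: a composed state gets its arcs computed only when the search first visits it, so the full composed machine is never materialized. A toy sketch with an invented transducer representation and no epsilon handling (real implementations need composition filters for epsilons):

```python
class LazyComposition:
    """Lazy composition of two weighted transducers in the tropical
    semiring. Each transducer: dict state -> [(in, out, weight, next)].
    Composed states are pairs (q1, q2); arcs match T1 outputs to T2
    inputs and add the weights, and are cached on first request."""

    def __init__(self, t1, t2):
        self.t1, self.t2, self.cache = t1, t2, {}

    def arcs(self, state):
        if state not in self.cache:     # expand this state only when asked
            q1, q2 = state
            self.cache[state] = [
                (i1, o2, w1 + w2, (n1, n2))
                for i1, o1, w1, n1 in self.t1.get(q1, [])
                for i2, o2, w2, n2 in self.t2.get(q2, [])
                if o1 == i2]
        return self.cache[state]

t1 = {0: [("a", "x", 1.0, 1)], 1: [("b", "y", 1.0, 2)]}
t2 = {0: [("x", "X", 0.5, 1)], 1: [("y", "Y", 0.5, 2)]}
c = LazyComposition(t1, t2)
print(c.arcs((0, 0)))  # [('a', 'X', 1.5, (1, 1))]
print(len(c.cache))    # 1 -- only the requested state was expanded
```

A decoder driving this object pays composition cost only for the composed states its beam actually reaches, which is the memory/computation trade-off the abstract refers to.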