Weighted Finite-State Transducers in Speech Recognition
, 2001
Cited by 101 (3 self)
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for HMM models, context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted
Full Expansion Of Context-Dependent Networks In Large Vocabulary Speech Recognition
- Proceedings of ICASSP 98
, 1998
Cited by 32 (12 self)
We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While fully-expanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determinization method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North American Business News task (NAB), we give network sizes and recognition speeds and accuracies using bigram and trigram grammars with vocabulary sizes ranging from 10,000 to 160,000 words. With our construction, the fully-expanded NAB context-dependent networks contain only about twice as many arcs as the corresponding language models. Interestingly, we also find that, with these...
Dynamic Programming Search for Continuous Speech Recognition
, 1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
Look-Ahead Techniques For Fast Beam Search
, 1997
Cited by 20 (8 self)
In this paper, we present two efficient look-ahead pruning techniques in beam search for large vocabulary continuous speech recognition. Both techniques, the language model look-ahead and the phoneme look-ahead, are incorporated into the word conditioned search algorithm using a bigram language model and a lexical prefix tree [5]. The paper presents the following novel contributions: We describe a method for language model (LM) look-ahead pruning which is similar to [1, 9]. We show special techniques to reduce the memory and computational requirements. These techniques are based on a compressed LM look-ahead tree. To compute the LM look-ahead tree probabilities in an efficient way, we present a backward dynamic programming scheme
A Comparison Of Time Conditioned And Word Conditioned Search Techniques For Large Vocabulary Speech Recognition
- Proc. Int. Conf. on Spoken Language Processing
, 1996
Cited by 19 (14 self)
In this paper, we compare the search effort of the word conditioned and the time conditioned tree search methods. Both methods are based on a time-synchronous, left-to-right beam search using a tree-organized lexicon. Whereas the word conditioned method is well known and widely used, the time conditioned method is novel in the context of 20 000-word vocabulary recognition. We extend both methods to handle trigram language models in a one-pass strategy. Both methods were tested on a train schedule inquiry task (1 850 words, telephone speech) and on the North American Business (Nov.'94) development corpus (20 000 words).
Network Optimizations for Large Vocabulary Speech Recognition
- Speech Communication
, 1998
Cited by 16 (7 self)
The redundancy and the size of networks in large-vocabulary speech recognition systems can have a critical effect on their overall performance. We describe the use of two new algorithms: weighted determinization and minimization [12]. These algorithms transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. They are both optimal: weighted determinization reduces the number of alternatives at each state to the minimum, and weighted minimization reduces the size of deterministic networks to the smallest possible number of states and transitions. These algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system. We illustrate their use in several applications, and report the results of our experiments. Key words...
The MIT finite-state transducer toolkit for speech and language processing
- in Proc. ICSLP
, 2004
Cited by 12 (0 self)
We present the MIT Finite-State Transducer Toolkit and briefly describe research that it has benefitted. The toolkit is a collection of command-line tools and associated C++ API for manipulating finite-state transducers (FSTs) and acceptors (FSAs) and has been designed to enable research through its flexibility, yet remain efficient enough to aid real-world computationally demanding applications such as automatic speech recognition. The toolkit supports the construction, combination, optimization, and training of weighted FSTs and FSAs, and as such is useful in many areas of human language technology.
Fast search for large vocabulary speech recognition
- in Verbmobil: Foundations of Speech-to-Speech Translation, W. Wahlster, Ed
, 2000
Cited by 11 (11 self)
In this article we describe methods for improving the RWTH German speech recognizer used within the VERBMOBIL project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. We also study incremental methods to reduce the response time of the online speech recognizer. Finally, we present experimental off-line results for the three VERBMOBIL scenarios. We report on word error rates and real-time factors for both speaker independent and speaker dependent recognition.
Recent Improvements Of The RWTH Large Vocabulary Speech Recognition System On Spontaneous Speech
- Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 2000
Cited by 8 (5 self)
This paper presents recent improvements of the RWTH large vocabulary continuous speech recognition system (LVCSR). In particular, we will report on the integration of across-word models into the first recognition pass, and describe better algorithms for fast vocal tract normalization (VTN). We will focus both on the improvements in word error rate and how to speed up the recognizer with only minimal loss in recognition accuracy. Implementation details and experimental results are given for the VerbMobil task, a German spontaneous speech corpus. The 25.0% word error rate (WER) of our within-word baseline system was reduced to 21.4% with VTN and across-word models. Decreasing the real-time factor (RTF) by up to 85% resulted in only a small degradation in recognition performance of 2% relative on average. 1. INTRODUCTION The RWTH LVCSR system is a continuous Gaussian mixture density speech recognition system, which has been described in detail in [6]. The baseline system is a trigram Vit...
Stratégies de perception par vision active pour la reconstruction et l'exploration de scènes statiques
- PhD Thesis, Université de Rennes 1, IRISA
, 1996
Cited by 4 (1 self)
This paper presents an algorithm for the composition of weighted finite-state transducers which is specially tailored to speech recognition applications: it composes the lexicon with the language model while simultaneously optimizing the resulting transducer. Furthermore, it performs these computations “on-the-fly” to allow easier management of the tradeoff between offline and online computation and memory. The algorithm is exact for local knowledge integration and optimization operations such as composition and determinization. Minimization and pushing operations are approximated. Our results have confirmed the efficiency of these approximations. Index Terms: Speech recognition, weighted finite-state transducers (WFSTs).

