Speech Recognition by Composition of Weighted Finite Automata
FINITE-STATE LANGUAGE PROCESSING, 1996
Cited by 103 (11 self)
We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition.
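As a rough illustration of the composition operation this abstract refers to, the following is a minimal sketch and not the authors' algorithm. It assumes epsilon-free transducers and the tropical semiring (weights add along a path), and the dictionary layout and the names `compose`, `LEFT` and `RIGHT` are hypothetical choices made for this example only.

```python
from collections import deque


def compose(a, b):
    """Compose two epsilon-free weighted transducers (tropical semiring).

    Assumed, illustrative layout for each transducer:
      'start':  start state
      'finals': {state: final weight}
      'arcs':   {state: [(in_label, out_label, weight, next_state), ...]}
    """
    start = (a['start'], b['start'])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qb = state
        # A pair state is final only if both components are final.
        if qa in a['finals'] and qb in b['finals']:
            finals[state] = a['finals'][qa] + b['finals'][qb]
        for in_label, mid, w1, next_a in a['arcs'].get(qa, []):
            for mid2, out_label, w2, next_b in b['arcs'].get(qb, []):
                if mid != mid2:
                    continue  # output of `a` must match input of `b`
                nxt = (next_a, next_b)
                arcs.setdefault(state, []).append(
                    (in_label, out_label, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {'start': start, 'finals': finals, 'arcs': arcs}


# Toy inputs: LEFT maps "a" to "x", RIGHT maps "x" to "Y".
LEFT = {'start': 0, 'finals': {1: 0.0},
        'arcs': {0: [('a', 'x', 1.0, 1)]}}
RIGHT = {'start': 0, 'finals': {1: 0.25},
         'arcs': {0: [('x', 'Y', 0.5, 1)]}}

composed = compose(LEFT, RIGHT)
print(composed['arcs'][(0, 0)])
```

Only pair states reachable from the start pair are built here. A recognizer that expands states on demand during search would create them lazily instead of exhaustively, which this sketch does not attempt.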
The Design Principles of a Weighted Finite-State Transducer Library
THEORETICAL COMPUTER SCIENCE, 2000
Cited by 82 (19 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
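The idea of a shortest-paths computation "over general semirings" can be sketched as follows. This is an illustrative example under stated assumptions, not the library's interface: it assumes an acyclic automaton whose states are numbered in topological order, and `Semiring`, `TROPICAL`, `REAL`, `ARCS` and `shortest_distance` are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # extends a path by one arc
    zero: float                             # identity for plus
    one: float                              # identity for times


# Tropical semiring: best (minimum) path cost.
TROPICAL = Semiring(min, lambda x, y: x + y, math.inf, 0.0)
# Real semiring: sum of path probabilities.
REAL = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)


def shortest_distance(num_states, arcs, semiring, source=0):
    """Single-source distances in an acyclic graph.

    `arcs` is a list of (src, dst, weight) with src < dst, so sorting
    by source state visits the arcs in topological order.
    """
    dist = [semiring.zero] * num_states
    dist[source] = semiring.one
    for src, dst, weight in sorted(arcs):
        dist[dst] = semiring.plus(dist[dst],
                                  semiring.times(dist[src], weight))
    return dist


ARCS = [(0, 1, 0.5), (0, 2, 0.25), (1, 2, 0.5)]
print(shortest_distance(3, ARCS, TROPICAL))
print(shortest_distance(3, ARCS, REAL))
```

The same function yields a best-path cost or a total path probability depending only on which semiring object is passed in, which is the kind of generality the abstract attributes to the semiring formulation.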
A Rational Design for a Weighted Finite-State Transducer Library
LECTURE NOTES IN COMPUTER SCIENCE, 1998
Full Expansion Of Context-Dependent Networks In Large Vocabulary Speech Recognition
Proceedings of ICASSP 98, 1998
Cited by 32 (12 self)
We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While fully-expanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determinization method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North American Business News task (NAB), we give network sizes and recognition speeds and accuracies using bigram and trigram grammars with vocabulary sizes ranging from 10,000 to 160,000 words. With our construction, the fully-expanded NAB context-dependent networks contain only about twice as many arcs as the corresponding language models. Interestingly, we also find that, with these...
A Finite-State Approach to Machine Translation
In Proc. of the North American Chapter of the Association for Computational Linguistics, 2001
Cited by 20 (1 self)
The problem of machine translation can be viewed as consisting of two subproblems: (a) Lexical Selection and (b) Lexical Reordering. We propose stochastic finite-state models for these two subproblems in this paper. Stochastic finite-state models are efficiently learnable from data, effective for decoding and are associated with a calculus for composing models which allows for tight integration of constraints from various levels of language processing. We present a method for learning stochastic finite-state models for lexical choice and lexical reordering that are trained automatically from pairs of source and target utterances. We use this method to develop models for English-Japanese translation and present the performance of these models for translation on speech and text. We also evaluate the efficacy of such a translation model in the context of a call routing task of unconstrained speech utterances.
Stochastic Language Adaptation over Time and State in Natural Spoken Dialogue Systems
2000
Cited by 20 (1 self)
We are interested in adaptive spoken dialogue systems for automated services. People's spoken language usage varies over time for a given task, and furthermore varies depending on the state of the dialogue. Thus, it is crucial to adapt ASR language models to these varying conditions. We characterize and quantify these variations based on a database of 30K user-transactions with AT&T's experimental How May I Help You? spoken dialogue system. We describe a novel adaptation algorithm for language models with time and dialogue-state varying parameters. Our language adaptation framework allows for recognizing and understanding unconstrained speech at each stage of the dialogue, enabling context-switching and error recovery. These models have been used to train state-dependent ASR language models. We have evaluated their performance with respect to word accuracy and perplexity over time and dialogue states. We have achieved a reduction of 40% in perplexity and of 8.4% in word error rate ov...
A Spoken Language System For Automated Call Routing
1997
Cited by 15 (3 self)
We are interested in the problem of understanding fluently spoken language. In particular, we consider people's responses to the open-ended prompt of 'How May I help you?'. We then further restrict the problem to classifying and automatically routing such a call, based on the meaning of the user's response. Thus, we aim at extracting a relatively small number of semantic actions from the utterances of a very large set of users who are not trained to the system's capabilities and limitations. In this paper, we describe the main components of our speech understanding system: the large vocabulary recognizer and the language understanding module performing the call-type classification. In particular, we propose automatic algorithms for selecting phrases from a training corpus in order to enhance the prediction power of the standard word n-gram. The phrase language models are integrated into stochastic finite state machines which outperform standard word n-gram language models. From the spee...
Transducer Composition for Context-Dependent Network Expansion
In Proceedings of Eurospeech'97, Rhodes, 1997
Cited by 10 (7 self)
Context-dependent models for language units are essential in high-accuracy speech recognition. However, standard speech recognition frameworks are based on the substitution of lower-level models for higher-level units. Since substitution cannot express context-dependency constraints, actual recognizers use restrictive model-structure assumptions and specialized code for context-dependent models, leading to decreased flexibility and lost opportunities for automatic model optimization. Instead, we propose a recognition framework that builds in the possibility of context dependency from the start by using weighted finite-state transduction rather than substitution. The framework is implemented with a general demand-driven transducer composition algorithm that allows great flexibility in model structure, form of context dependency and network expansion method, while achieving competitive recognition performance.
On Integrating the Lexicon with the Language Model
2001
Cited by 9 (6 self)
The goal of this work was to develop an algorithm for the integration of the lexicon with the language model which would be computationally efficient in terms of memory requirements, even in the case of large trigram models. Two specialized versions of the algorithm for transducer composition were implemented. The first one is basically a composition algorithm that uses the precomputed set of the output labels that can be reached from a particular epsilon edge of the lexicon
A Comparison Of Two LVR Search Optimization Techniques
In Proc. Int. Conf. Spoken Language Processing, 2002
Cited by 5 (0 self)
This paper presents a detailed comparison between two search optimization techniques for large vocabulary speech recognition -- one based on word-conditioned tree search (WCTS) and one based on weighted finite-state transducers (WFSTs). Existing North American Business News systems from RWTH and AT&T, representing each of the two approaches, were modified to remove variations in model data and acoustic likelihood computation. An experimental comparison showed that the WFST-based system explored fewer search states and had less runtime overhead than the WCTS-based system for a given word error rate. This is attributed to differences in the pre-compilation, degree of non-determinism, and path weight distribution in the respective search graphs.

