Results 1-7 of 7
The MIT finite-state transducer toolkit for speech and language processing
in Proc. ICSLP, 2004
Cited by 12 (0 self)
We present the MIT Finite-State Transducer Toolkit and briefly describe research that it has benefited. The toolkit is a collection of command-line tools and an associated C++ API for manipulating finite-state transducers (FSTs) and acceptors (FSAs). It has been designed to be flexible enough to enable research, yet efficient enough to aid real-world, computationally demanding applications such as automatic speech recognition. The toolkit supports the construction, combination, optimization, and training of weighted FSTs and FSAs, and is therefore useful in many areas of human language technology.
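Composition of weighted transducers, one of the "combination" operations such a toolkit provides, can be sketched in a few lines. This is a minimal illustration, not the toolkit's C++ API: the `WFST` class and `compose` function are hypothetical, the transducers are assumed to be epsilon-free, and weights are assumed to be costs (for example negative log probabilities) that add along a path, as in the tropical semiring.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WFST:
    """Toy epsilon-free weighted transducer; arc costs add along a path."""
    start: int
    finals: dict                              # state -> final cost
    arcs: dict = field(default_factory=dict)  # state -> [(ilabel, olabel, cost, next_state)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs.setdefault(src, []).append((ilabel, olabel, cost, dst))


def compose(a: WFST, b: WFST) -> WFST:
    """Build a o b: an arc of a pairs with an arc of b when a's output label equals b's input label."""
    index = {(a.start, b.start): 0}           # (state_a, state_b) -> state id in the result
    result = WFST(start=0, finals={})
    queue = deque([(a.start, b.start)])
    while queue:
        qa, qb = queue.popleft()
        src = index[(qa, qb)]
        if qa in a.finals and qb in b.finals:
            result.finals[src] = a.finals[qa] + b.finals[qb]
        for ia, oa, wa, na in a.arcs.get(qa, []):
            for ib, ob, wb, nb in b.arcs.get(qb, []):
                if oa != ib:
                    continue                  # labels do not match: no composed arc
                pair = (na, nb)
                if pair not in index:         # first visit: allocate a new state
                    index[pair] = len(index)
                    queue.append(pair)
                result.add_arc(src, ia, ob, wa + wb, index[pair])
    return result
```

Only state pairs reachable from the start pair are created. The inner loop tries every arc pair, which is quadratic per state pair; real toolkits typically index or sort arcs by label to avoid that.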
Using Dynamic Wfst Composition For Recognizing Broadcast News
2002
Cited by 8 (3 self)
Our first application of weighted finite-state transducers (WFSTs) to the recognition of broadcast news provided an interesting framework for studying several problems related to optimizing the search space. The paper first describes why our algorithm for composing the lexicon and the language model "on the fly" is crucial to extending the transducer approach to large systems. We present an efficient representation for WFSTs that allowed us to reduce runtime memory requirements, and we discuss several types of language model optimization, including a context-sharing algorithm. Experimental results on the broadcast news corpus collected for European Portuguese illustrate how the various possible optimizations of the components affect system performance.
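The point of on-the-fly composition is that states of the composed lexicon-and-language-model machine are built only when the search reaches them. A minimal sketch of that idea, assuming epsilon-free machines given as `(start, finals, arcs)` tuples with additive costs; `LazyCompose` and `score_path` are hypothetical names, and real lexicon transducers need epsilon handling that is omitted here.

```python
class LazyCompose:
    """On-the-fly composition of two epsilon-free weighted transducers.

    Each machine is a (start, finals, arcs) tuple: finals maps state -> final cost and
    arcs maps state -> [(ilabel, olabel, cost, next_state)]. A composed state is a
    (state_a, state_b) pair whose arcs are computed the first time they are requested."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.start = (a[0], b[0])
        self._cache = {}  # composed state -> list of outgoing arcs

    def final_cost(self, state):
        qa, qb = state
        fa, fb = self.a[1], self.b[1]
        return fa[qa] + fb[qb] if qa in fa and qb in fb else None

    def arcs(self, state):
        if state not in self._cache:
            qa, qb = state
            self._cache[state] = [
                (ia, ob, wa + wb, (na, nb))
                for ia, oa, wa, na in self.a[2].get(qa, [])
                for ib, ob, wb, nb in self.b[2].get(qb, [])
                if oa == ib  # a's output label must match b's input label
            ]
        return self._cache[state]

    @property
    def expanded(self):
        """Number of composed states whose arcs have actually been built."""
        return len(self._cache)


def score_path(machine, symbols):
    """Follow the first arc matching each input symbol; return (end_state, total cost)."""
    state, cost = machine.start, 0.0
    for sym in symbols:
        _, _, w, state = next(arc for arc in machine.arcs(state) if arc[0] == sym)
        cost += w
    return state, cost + machine.final_cost(state)
```

Scoring one input with `score_path` expands only the composed states that its path visits; the rest of the full composition is never materialized, which is where the memory saving comes from.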
Stratégies de perception par vision active pour la reconstruction et l'exploration de scènes statiques [Active-vision perception strategies for the reconstruction and exploration of static scenes]
PhD Thesis, Université de Rennes 1, IRISA, 1996
Cited by 4 (1 self)
This paper presents an algorithm for the composition of weighted finite-state transducers that is specifically tailored to speech recognition applications: it composes the lexicon with the language model while simultaneously optimizing the resulting transducer. Furthermore, it performs these computations “on the fly” to make it easier to manage the tradeoff between offline and online computation and memory. The algorithm is exact for local knowledge-integration and optimization operations such as composition and determinization, whereas minimization and pushing are approximated. Our results confirm the efficiency of these approximations. Index Terms: speech recognition, weighted finite-state transducers (WFSTs).
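Determinization, one of the operations the algorithm is said to perform exactly, can likewise be computed lazily. The sketch below is not the paper's combined composition-and-optimization algorithm: it shows only on-the-fly weighted determinization of an epsilon-free acceptor with additive (tropical) costs, using the standard subset construction with residual weights. The class name and input format are assumptions, and the determinized states carry raw floating-point residuals, which a production implementation would quantize before comparing.

```python
class LazyDeterminize:
    """On-the-fly weighted determinization of an epsilon-free acceptor (tropical costs).

    Input: start state, finals (state -> final cost), arcs (state -> [(label, cost, next)]).
    A determinized state is a frozenset of (original_state, residual_cost) pairs, and its
    arcs are built only when first requested. Expanding every state terminates only if
    the acceptor is determinizable (an acyclic acceptor always is)."""

    def __init__(self, start, finals, arcs):
        self.finals, self.in_arcs = finals, arcs
        self.start = frozenset({(start, 0.0)})
        self._cache = {}

    def final_cost(self, subset):
        costs = [r + self.finals[q] for q, r in subset if q in self.finals]
        return min(costs) if costs else None

    def arcs(self, subset):
        """Return {label: (cost, next_subset)}: exactly one outgoing arc per label."""
        if subset not in self._cache:
            reach = {}  # label -> {next_state: cheapest residual + arc cost}
            for q, r in subset:
                for label, w, n in self.in_arcs.get(q, []):
                    targets = reach.setdefault(label, {})
                    targets[n] = min(targets.get(n, float("inf")), r + w)
            out = {}
            for label, targets in reach.items():
                w_min = min(targets.values())  # cost emitted on the deterministic arc
                # Each target keeps only the part of its cost not yet emitted.
                out[label] = (w_min, frozenset((n, c - w_min) for n, c in targets.items()))
            self._cache[subset] = out
        return self._cache[subset]
```

Emitting the minimum cost as early as possible and carrying the remainder as residuals is what keeps every path's total cost unchanged while removing the nondeterminism.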
Juicer: A Weighted Finite-State Transducer speech decoder
Cited by 3 (2 self)
A major component in the development of any speech recognition system is the decoder. As task complexity, and consequently system complexity, has continued to increase, decoding has become an increasingly significant part of the overall speech recognition system development effort, and efficient decoder design contributes significantly to a better trade-off between decoding time and search errors. In this paper we present “Juicer” (from transducer), a large vocabulary continuous speech recognition (LVCSR) decoder based on weighted finite-state transducers (WFSTs). We begin with a discussion of the need for open-source, state-of-the-art decoding software in LVCSR research and how this led to the development of Juicer, followed by a brief overview of decoding techniques and the major issues in decoder design. We present Juicer and its major features, emphasising its potential not only as a critical component in the development of LVCSR systems but also as an important research tool in itself, since it is built around the flexible WFST paradigm. We also report the results of the benchmarking tests carried out to date, which demonstrate that in many respects Juicer, while still in its early development, already achieves state-of-the-art performance. These benchmarking tests not only demonstrate the utility of Juicer in its present state but are also being used to guide future development; hence, we conclude with a brief discussion of some of the extensions that are currently under way or being considered for Juicer.
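The trade-off between decoding time and search errors comes largely from pruning. A minimal sketch of time-synchronous Viterbi search with beam pruning over an epsilon-free WFST, assuming per-frame acoustic costs are supplied as a dictionary from input label to cost; `viterbi_beam` is a hypothetical function for illustration, not part of Juicer.

```python
def viterbi_beam(start, finals, arcs, frame_costs, beam):
    """Time-synchronous Viterbi search with beam pruning over an epsilon-free WFST.

    arcs: state -> [(ilabel, olabel, cost, next_state)]; finals: state -> final cost.
    frame_costs: one dict per frame mapping ilabel -> acoustic cost for that frame.
    Every frame consumes exactly one arc. Returns (total_cost, output_labels) or None."""
    tokens = {start: (0.0, [])}  # state -> (best partial cost, output labels so far)
    for costs in frame_costs:
        new_tokens = {}
        for state, (cost, out) in tokens.items():
            for ilabel, olabel, w, nxt in arcs.get(state, []):
                if ilabel not in costs:
                    continue  # no acoustic score for this unit in this frame
                c = cost + w + costs[ilabel]
                if nxt not in new_tokens or c < new_tokens[nxt][0]:
                    new_tokens[nxt] = (c, out + [olabel])  # Viterbi: keep best token per state
        if not new_tokens:
            return None
        best = min(c for c, _ in new_tokens.values())
        # Beam pruning: drop tokens more than `beam` worse than this frame's best.
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    finished = [(c + finals[s], out) for s, (c, out) in tokens.items() if s in finals]
    return min(finished) if finished else None
```

Narrowing `beam` keeps fewer tokens per frame and so reduces work, but a hypothesis that starts expensively and ends cheaply can be pruned before it recovers; that lost hypothesis is a search error.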
Integration Of Supra-Lexical Linguistic Models With Speech Recognition Using Shallow Parsing And Finite State Transducers
2002
Cited by 1 (1 self)
This paper proposes a layered finite-state transducer (FST) framework, based on shallow parsing, that integrates hierarchical supra-lexical linguistic knowledge into speech recognition. The shallow parsing grammar is derived directly from the full-fledged grammar used for natural language understanding and is augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, which go beyond the standard context-free grammar (CFG) formalism. Such a shallow parsing approach can help balance sufficient grammar coverage against tight structural constraints. The context-dependent probabilistic shallow parsing model is represented by layered FSTs, which can be integrated seamlessly with speech recognition to impose early phrase-level structural constraints consistent with natural language understanding. We show that in the JUPITER [1] weather information domain, the shallow parsing model achieves lower recognition word error rates than a regular class n-gram model of the same order. However, we find that with a higher-order top-level n-gram model, pre-composition and optimization of the FSTs are severely limited by the available computational resources. Given the potential of such models, it may be worth pursuing an incremental approximation strategy [2], which includes part of the linguistic model FST in early optimization while introducing the complete model through dynamic composition.
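How a layered model combines its two levels of probability can be illustrated with a worked cost computation. The sketch assumes a particular, hypothetical form: a top-level bigram over words and phrase tags, plus a phrase-internal bigram standing in for the paper's context-dependent phrase-level probabilities. The function name, the input format, and the `<s>`, `</s>`, `<p>`, `</p>` boundary symbols are all illustrative.

```python
import math


def shallow_parse_cost(parse, top_bigram, phrase_bigrams):
    """Negative log probability of a shallow parse under a toy two-layer model.

    parse: list of units, each a plain word (str) or a (phrase_tag, [words]) tuple.
    top_bigram: {(previous_unit, unit): prob} over words and phrase tags.
    phrase_bigrams: {phrase_tag: {(previous_word, word): prob}} inside each phrase.
    Boundary symbols <s>, </s> (sentence) and <p>, </p> (phrase) are assumptions."""
    cost, prev = 0.0, "<s>"
    for unit in parse + ["</s>"]:
        label = unit[0] if isinstance(unit, tuple) else unit
        cost -= math.log(top_bigram[(prev, label)])  # top-level layer
        if isinstance(unit, tuple):
            tag, words = unit
            inner_prev = "<p>"
            for word in words + ["</p>"]:
                cost -= math.log(phrase_bigrams[tag][(inner_prev, word)])  # phrase layer
                inner_prev = word
        prev = label
    return cost
```

Working in negative log costs means the two layers combine by addition, which is one reason each layer can be represented as its own weighted transducer and the layers composed.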
Towards a Unified Framework for Sub-lexical and Supra-lexical Linguistic Modeling
2002
Conversational interfaces have received much attention as a promising, natural channel of communication between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management, and spoken language generation. In such an interface, speech recognition, as the front end of speech understanding, remains one of the fundamental challenges to establishing robust and effective human/computer communication. On the one hand, the speech recognition component of a conversational interface operates in a rich system environment: diverse sources of knowledge are available and can potentially improve its robustness and accuracy. For example, the natural language understanding component can provide syntactic and semantic knowledge that helps constrain the recognition search space. On the other hand, the speech recognition component also faces the challenge of spontaneous speech, and it is important to address the casualness of speech using the available knowledge sources. For example, sub-lexical linguistic information would be very useful in providing linguistic support for previously unseen words, and dynamic reliability modeling may help improve recognition robustness for poorly articulated speech.
Applying Transducers To Spoken Language Processing For Portuguese
2002
This paper has two different goals. The primary aim is to illustrate the advantages of weighted finite state transducers for spoken language processing, namely in terms of their capacity to efficiently integrate different types of knowledge sources. We have chosen three areas to emphasize several aspects of the application of transducers: large vocabulary continuous speech recognition, automatic alignment and grapheme-to-phone conversion. The secondary goal is to simultaneously present the state of the art in these areas for European Portuguese.
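Grapheme-to-phone conversion is one of the three application areas listed. A minimal sketch of rule-based conversion by greedy left-to-right longest match, the kind of rewrite that can in principle be compiled into a transducer. The rule table is a toy assumption covering a few European Portuguese correspondences in SAMPA notation (the digraphs ch, lh, nh and reduction of word-final o to [u]); it ignores stress and most context, and it is not the paper's system.

```python
# Toy rule table (illustrative assumption): the longest matching grapheme string wins,
# and "#" marks the end of the word so that word-final rules can fire.
RULES = {
    "ch": "S", "lh": "L", "nh": "J",  # digraphs mapped to single SAMPA phones
    "o#": "u",                        # unstressed word-final o reduces to [u]
    "#": None,                        # the word boundary itself emits nothing
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "b": "b", "d": "d", "f": "f", "l": "l", "m": "m", "n": "n",
    "p": "p", "r": "r", "s": "s", "t": "t", "v": "v",
}
MAX_LEN = max(len(key) for key in RULES)


def g2p(word):
    """Convert a word to a list of SAMPA phones using greedy longest-match rules."""
    text, i, phones = word.lower() + "#", 0, []
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in RULES:
                if RULES[chunk] is not None:
                    phones.append(RULES[chunk])
                i += size
                break
        else:  # no rule matched any prefix starting at position i
            raise ValueError(f"no rule for {text[i]!r} in {word!r}")
    return phones
```

For example, `g2p("filho")` yields `["f", "i", "L", "u"]`. Trying longer keys first is what lets a digraph such as "lh" take precedence over its single letters.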

