Corrective Language Modeling For Large Vocabulary ASR With The Perceptron Algorithm
- PROC. ICASSP
, 2004
Cited by 16 (3 self)
This paper investigates error-corrective language modeling using the perceptron algorithm on word lattices. The resulting model is encoded as a weighted finite-state automaton, and is used by intersecting the model with word lattices, making it simple and inexpensive to apply during decoding. We present results for various training scenarios for the Switchboard task, including using n-gram features of different orders, and performing n-best extraction versus using full word lattices. We demonstrate the importance of making the training conditions as close as possible to testing conditions. The best approach yields a 1.3 percent improvement in first-pass accuracy, which translates to a 0.5 percent improvement after other rescoring passes.
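The training idea can be illustrated with a small Python sketch of a perceptron over n-gram features. It works on n-best lists instead of full lattices, adds the learned weights to an assumed baseline recognizer score, and uses a plain (not averaged) perceptron; the function names `ngram_features`, `score`, and `train_perceptron` and the `(words, base_score)` hypothesis format are hypothetical and not taken from the paper.

```python
from collections import Counter


def ngram_features(words, n=2):
    """Count all n-grams of orders 1..n, padding with sentence-boundary tokens."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats


def score(weights, base_score, words, n=2):
    """Baseline recognizer score plus the learned corrective n-gram weights."""
    feats = ngram_features(words, n)
    return base_score + sum(weights.get(f, 0.0) * c for f, c in feats.items())


def train_perceptron(data, epochs=5, n=2):
    """data: list of (nbest, oracle) pairs.

    nbest is a list of (words, base_score); oracle is the lowest-error word
    sequence for that utterance (assumed to be given).
    """
    weights = {}
    for _ in range(epochs):
        for nbest, oracle in data:
            best_words, _ = max(nbest, key=lambda h: score(weights, h[1], h[0], n))
            if list(best_words) != list(oracle):
                # Reward features of the oracle, penalize those of the wrong winner.
                for f, c in ngram_features(oracle, n).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(best_words, n).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights
```

In the paper the learned weights are encoded as a weighted finite-state automaton and intersected with word lattices; in this sketch they simply rerank the hypotheses of one n-best list.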
FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation
- IN: ACL PROCEEDINGS (2004)
, 2004
Cited by 14 (4 self)
In this paper we present the RWTH FSA toolkit, an efficient implementation of algorithms for creating and manipulating weighted finite-state automata. The toolkit has been designed using the principle of on-demand computation and offers a large range of widely used algorithms. To prove the superior efficiency of the toolkit, we compare the implementation to that of other publicly available toolkits. We also show that on-demand computations help to reduce memory requirements significantly without any loss in speed. To increase its flexibility, the RWTH FSA toolkit supports high-level interfaces to the programming language Python as well as a command-line tool for interactive manipulation of FSAs. Furthermore, we show how to utilize the toolkit to rapidly build a fast and accurate statistical machine translation system. Future extensibility of the toolkit is ensured, as it will be publicly available as open-source software.
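On-demand computation means that the states and arcs of a result automaton are built only when something asks for them. As a rough illustration (in Python, not the toolkit's C++ or Python interface), the hypothetical `LazyIntersection` class below intersects two unweighted acceptors lazily and counts how many pair states it actually expands; the `DictFSA` and `accepts` helpers are likewise assumptions made for the example.

```python
class DictFSA:
    """Eagerly stored unweighted acceptor: arcs[state] is a list of (label, next_state)."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = set(finals)
        self._arcs = arcs

    def arcs(self, state):
        return self._arcs.get(state, [])

    def is_final(self, state):
        return state in self.finals


class LazyIntersection:
    """Intersection whose pair states are expanded only on first request, then cached."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.start = (a.start, b.start)
        self._cache = {}
        self.expanded = 0  # number of pair states whose arcs were actually computed

    def arcs(self, state):
        if state not in self._cache:
            self.expanded += 1
            qa, qb = state
            self._cache[state] = [
                (la, (na, nb))
                for la, na in self.a.arcs(qa)
                for lb, nb in self.b.arcs(qb)
                if la == lb
            ]
        return self._cache[state]

    def is_final(self, state):
        return self.a.is_final(state[0]) and self.b.is_final(state[1])


def accepts(fsa, word):
    """Follow the input symbol by symbol, touching only the states it reaches."""
    states = {fsa.start}
    for sym in word:
        states = {dst for s in states for label, dst in fsa.arcs(s) if label == sym}
    return any(fsa.is_final(s) for s in states)


# Example: (ab)* intersected with a*b*, whose intersection is {"", "ab"}.
A = DictFSA(0, {0}, {0: [("a", 1)], 1: [("b", 0)]})
B = DictFSA(0, {0, 1}, {0: [("a", 0), ("b", 1)], 1: [("b", 1)]})
```

Checking `"ab"` against `LazyIntersection(A, B)` expands only the two pair states on that input's path instead of constructing the whole product up front, which is where the memory savings of on-demand computation come from.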
Hierarchical phrase-based translation with weighted finite state transducers and . . .
- IN PROCEEDINGS OF HLT/NAACL
, 2010
Cited by 14 (7 self)
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.
Finite-state registered automata for nonconcatenative morphology
- Computational Linguistics
, 2006
Cited by 12 (0 self)
We introduce finite-state registered automata (FSRAs), a new computational device within the framework of finite-state technology, specifically tailored for implementing non-concatenative morphological processes. This model extends and augments existing finite-state techniques, which are presently not optimized for describing phenomena of this kind. We first define the model and discuss its mathematical and computational properties. Then, we provide an extended regular language whose expressions denote FSRAs. Finally, we exemplify the utility of the model by providing several examples of complex morphological and phonological phenomena which are elegantly implemented with FSRAs.
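To make the register idea concrete, here is a Python sketch of an automaton whose arcs may write a symbol to a register or require that a register hold a given symbol. It models a hypothetical circumfix pattern (prefix `a` must pair with suffix `x`, prefix `b` with suffix `y`, with a stem of one or more `s` in between); the encoding of register operations as `("W", i, v)` and `("R", i, v)` tuples and the name `fsra_accepts` are assumptions for illustration, not the formal definition from the paper.

```python
# Hypothetical arc format: state -> list of (label, target, op), where op is
# None, ("W", i, v) = write v into register i, or ("R", i, v) = require register i == v.
CIRCUMFIX_ARCS = {
    0: [("a", 1, ("W", 0, "A")), ("b", 1, ("W", 0, "B"))],  # prefix records itself
    1: [("s", 2, None)],                                     # stem: at least one "s"
    2: [("s", 2, None),
        ("x", 3, ("R", 0, "A")),                             # suffix x only after prefix a
        ("y", 3, ("R", 0, "B"))],                            # suffix y only after prefix b
}


def fsra_accepts(start, finals, arcs, n_registers, word):
    """Nondeterministic simulation over (state, register contents) configurations."""
    configs = {(start, (None,) * n_registers)}
    for sym in word:
        nxt = set()
        for state, regs in configs:
            for label, target, op in arcs.get(state, []):
                if label != sym:
                    continue
                if op is None:
                    nxt.add((target, regs))
                elif op[0] == "W":
                    _, i, v = op
                    nxt.add((target, regs[:i] + (v,) + regs[i + 1:]))
                elif op[0] == "R":
                    _, i, v = op
                    if regs[i] == v:
                        nxt.add((target, regs))
        configs = nxt
    return any(state in finals for state, _ in configs)
```

Because the registers in this sketch hold symbols from a finite set, the register contents could be folded into the states to obtain an ordinary finite-state automaton; the register version simply avoids duplicating the stem states once per prefix.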
A Weight Pushing Algorithm for Large Vocabulary Speech Recognition
- IN EUROPEAN CONF. ON SPEECH COMMUNICATION AND TECHNOLOGY
, 2001
Cited by 10 (6 self)
Weighted finite-state transducers provide a general framework for the representation of the components of speech recognition systems: language models, pronunciation dictionaries, context-dependent models, HMM-level acoustic models, and the output word or phone lattices can all be represented by weighted automata and transducers. In general, a representation is not unique, and there may be different weighted transducers realizing the same mapping. In particular, even when they have exactly the same topology with the same input and output labels, two equivalent transducers may differ in the way the weights are distributed along each path. We present
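The abstract breaks off before describing the algorithm itself, so what follows is the standard tropical-semiring reweighting rather than the paper's specific method. It computes, for each state q, the shortest distance d(q) to a final state, replaces the weight w of each arc from p to q by w + d(q) - d(p), and moves d(start) to the front as an initial weight. The Python function names and the `(label, weight, dst)` arc format are hypothetical; the sketch assumes non-negative weights and that every state can reach a final state.

```python
import heapq


def distance_to_final(n_states, arcs, finals):
    """Tropical shortest distance d[q] from each state q to a final state.

    Dijkstra on reversed arcs; assumes non-negative weights. finals maps state -> final weight.
    """
    reverse = {q: [] for q in range(n_states)}
    for src, out in arcs.items():
        for _label, w, dst in out:
            reverse[dst].append((w, src))
    d = [float("inf")] * n_states
    for q, final_w in finals.items():
        d[q] = final_w
    heap = [(d[q], q) for q in finals]
    heapq.heapify(heap)
    while heap:
        dist, q = heapq.heappop(heap)
        if dist > d[q]:
            continue  # stale heap entry
        for w, src in reverse[q]:
            if dist + w < d[src]:
                d[src] = dist + w
                heapq.heappush(heap, (d[src], src))
    return d


def push_weights(start, n_states, arcs, finals):
    """Reweight arc p->q from w to w + d[q] - d[p] and final weights from f to f - d[q].

    Returns (initial_weight, new_arcs, new_finals); assumes every state can reach a final state.
    """
    d = distance_to_final(n_states, arcs, finals)
    new_arcs = {
        p: [(label, w + d[q] - d[p], q) for label, w, q in out]
        for p, out in arcs.items()
    }
    new_finals = {q: f - d[q] for q, f in finals.items()}
    return d[start], new_arcs, new_finals
```

Every complete path keeps its total weight, because the added potentials telescope along the path, while the cheapest arc leaving each state now has weight 0; the topology and labels are unchanged, only the distribution of weight along each path moves toward the start.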
General Indexation of Weighted Automata - Application to Spoken Utterance Retrieval
- In Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval (HLT/NAACL 2004)
, 2004
Cited by 10 (2 self)
Much of the massive quantity of digitized data now widely available (e.g., text, speech, handwritten sequences) is either given directly as weighted automata or obtained as weighted automata from some prior processing. These are compact representations of a large number of alternative sequences and of their weights, which reflect the uncertainty or variability of the data. Thus, the indexation of such data requires indexing weighted automata.
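One simple way to see what indexing weighted automata involves: for each utterance's automaton, compute the expected count of every short factor (n-gram) and store it under that factor. The Python sketch below does this by brute-force enumeration of paths in small acyclic automata with probability weights, which is exponential in general; the paper's own construction is not shown here, and `build_index`, `paths`, and the `(label, prob, dst)` arc format are hypothetical.

```python
from collections import defaultdict


def paths(arcs, finals, state, prefix=(), prob=1.0):
    """Yield (labels, probability) for every complete path of an acyclic automaton."""
    if state in finals:
        yield prefix, prob * finals[state]
    for label, p, dst in arcs.get(state, []):
        yield from paths(arcs, finals, dst, prefix + (label,), prob * p)


def build_index(lattices, max_order=2):
    """Map each factor (tuple of labels) to {utterance_id: expected count}.

    lattices maps utterance_id -> (start, arcs, finals).
    """
    index = defaultdict(dict)
    for utt_id, (start, arcs, finals) in lattices.items():
        for labels, prob in paths(arcs, finals, start):
            for order in range(1, max_order + 1):
                for i in range(len(labels) - order + 1):
                    factor = labels[i:i + order]
                    # Each occurrence on a path contributes that path's probability.
                    index[factor][utt_id] = index[factor].get(utt_id, 0.0) + prob
    return dict(index)
```

A query for a factor then returns the utterances that may contain it, each with a score reflecting how likely the factor is to occur there, instead of relying on a single best transcription.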
Probabilistic Finite-State Machines - Part I
Cited by 9 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation are some of them. In Part I of this paper, we survey these generative objects and study their definitions and properties. In Part II, we will study the relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.
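Under the usual definition of a probabilistic finite-state automaton, each state has an initial probability, outgoing arc probabilities, and a stopping (final) probability, and each state's outgoing and final probabilities sum to 1. The probability of a string is then computed with the forward algorithm. This Python sketch assumes a matrix-style encoding (`transitions[symbol][i][j]`); the function names are hypothetical and the example automaton is made up.

```python
def string_probability(initial, transitions, final, word):
    """Forward algorithm: Pr(word) = initial^T . M[w1] ... M[wn] . final."""
    n = len(initial)
    alpha = list(initial)  # alpha[j]: probability of reading the prefix and being in state j
    for sym in word:
        m = transitions[sym]
        alpha = [sum(alpha[i] * m[i][j] for i in range(n)) for j in range(n)]
    return sum(alpha[j] * final[j] for j in range(n))


def is_consistent(transitions, final, tol=1e-9):
    """Check that each state's outgoing and stopping probabilities sum to 1."""
    n = len(final)
    return all(
        abs(final[i] + sum(m[i][j] for m in transitions.values() for j in range(n)) - 1.0) < tol
        for i in range(n)
    )


# Made-up two-state PFA over {a, b}.
INITIAL = [1.0, 0.0]
TRANSITIONS = {
    "a": [[0.2, 0.3], [0.0, 0.0]],
    "b": [[0.1, 0.0], [0.0, 0.5]],
}
FINAL = [0.4, 0.5]
```

The per-state normalization is what makes the automaton generative: together with the stopping probabilities, it lets the probabilities of all finite strings sum to 1 when the automaton is consistent.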
Context-Free Recognition with Weighted Automata
- Grammars
, 2000
Cited by 7 (2 self)
We introduce the definition of language recognition with weighted automata, a generalization of the classical definition of recognition with unweighted acceptors. We show that, with our definition of recognition, weighted automata can be used to recognize a class of languages that strictly includes regular languages. The class of languages accepted depends on the weight set, which has the algebraic structure of a semiring. We give a generic linear-time algorithm for recognition with weighted automata and describe examples with various weight sets illustrating the recognition of several classes of context-free languages. We prove, in particular, that the class of languages equivalent to the language of palindromes can be recognized by weighted automata over the (+, ·)-semiring, and that the class of languages equivalent to the first-order Dyck language D₁* can be recognized by weighted automata over the real tropical semiring. We also prove that weighted automata over the real tropical semiring can be used to recognize regular expressions.
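The abstract does not spell out the recognition criterion, so this sketch only illustrates the semiring side: the same one-pass algorithm computes a string's weight over different semirings by swapping ⊕ and ⊗. It is linear in the length of the string for a fixed automaton and ignores ε-transitions; the `Semiring` class, `string_weight`, and the two semiring instances are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # combines alternative paths
    times: Callable[[Any, Any], Any]  # extends a path by one arc
    zero: Any                         # identity for plus
    one: Any                          # identity for times


PROBABILITY = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
TROPICAL = Semiring(min, lambda x, y: x + y, float("inf"), 0.0)


def string_weight(sr, start, finals, arcs, word):
    """Plus over accepting paths labeled `word` of the times-product of their weights.

    arcs[state] is a list of (label, weight, next_state); finals maps state -> final weight.
    One pass per input symbol, so the cost is linear in len(word) for a fixed automaton.
    """
    current = {start: sr.one}
    for sym in word:
        nxt = {}
        for state, w in current.items():
            for label, arc_w, dst in arcs.get(state, []):
                if label == sym:
                    nxt[dst] = sr.plus(nxt.get(dst, sr.zero), sr.times(w, arc_w))
        current = nxt
    total = sr.zero
    for state, w in current.items():
        if state in finals:
            total = sr.plus(total, sr.times(w, finals[state]))
    return total
```

With the same automaton, the probability semiring sums the weights of all accepting paths, while the tropical semiring returns the cheapest one; a string with no accepting path gets the semiring's zero (0.0 or infinity, respectively).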
Abstract error projection
- IN SAS
, 2007
Cited by 7 (3 self)
To improve the reporting of results from model-checking and program-analysis systems, we introduce the notion of an error projection and annotated error projection. An error projection is a set of program nodes N such that for each node n ∈ N there exists an (abstract) error path from the program entry s through n to a specified target node t. An annotated error projection associates with each node n in the error projection an (abstract) counterexample that validates the error along with an abstract store, whose presence at n induces the error. We present novel algorithms for computing (annotated) error projections and discuss additional applications for these algorithms. Our experiments show that error projections can be computed efficiently.
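In the simplest, non-abstract setting, the node set of an error projection is the set of nodes that are both reachable from the entry s and able to reach the target t. The Python sketch below computes exactly that with a forward and a backward breadth-first search over a plain graph; it treats every graph path as feasible and ignores the abstract stores and counterexamples attached by the annotated version, and `error_projection` is a hypothetical name.

```python
from collections import deque


def reachable(start, succ):
    """All nodes reachable from `start` following the adjacency map `succ` (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def error_projection(edges, entry, target):
    """Nodes n with some path entry -> ... -> n -> ... -> target in a plain graph."""
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    return reachable(entry, forward) & reachable(target, backward)
```

Nodes that are reachable from s but cannot reach t, or that can reach t but are never reached from s, are left out; an abstract analysis would additionally prune paths that are infeasible in the abstract domain.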
A General Weighted Grammar Library
- IN PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON AUTOMATA (CIAA 2004)
, 2004
Cited by 7 (4 self)
We present a general weighted grammar software library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. The underlying algorithms were designed to support a wide variety of semirings and the representation and use of very large grammars and automata of several hundred million rules or transitions. We describe several algorithms and utilities of this library and point out in each case their application to several text and speech processing tasks.

