Finite-State Transducers in Language and Speech Processing
 Computational Linguistics
, 1997
Cited by 308 (41 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
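The efficiency claim for sequential transducers comes from determinism on the input side: each state has at most one outgoing arc per input symbol, so applying the machine to a word follows a single path. The following is only a minimal sketch of that idea for a string-to-weight machine with additive weights. The names `apply_sequential`, `TOY_ARCS`, and `TOY_FINALS` and the toy machine itself are hypothetical and do not come from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: state -> {input symbol: (next state, arc weight)}.
Arcs = Dict[int, Dict[str, Tuple[int, float]]]


def apply_sequential(arcs: Arcs, finals: Dict[int, float],
                     start: int, word: str) -> Optional[float]:
    """Return the weight assigned to `word`, or None if it is rejected.

    Because there is at most one arc per (state, symbol) pair, the loop
    does one dictionary lookup per input symbol: time linear in len(word).
    """
    state, total = start, 0.0
    for symbol in word:
        arc = arcs.get(state, {}).get(symbol)
        if arc is None:          # no arc for this symbol: reject
            return None
        state, weight = arc
        total += weight          # weights are added along the path
    if state not in finals:      # path ended in a non-final state
        return None
    return total + finals[state]  # add the final-state weight


# Toy machine (assumed for illustration): accepts "ab" and "b".
TOY_ARCS: Arcs = {0: {"a": (1, 1.0), "b": (2, 3.0)}, 1: {"b": (2, 0.5)}, 2: {}}
TOY_FINALS = {2: 0.25}

print(apply_sequential(TOY_ARCS, TOY_FINALS, 0, "ab"))  # 1.0 + 0.5 + 0.25
```

A non-sequential machine would need to track a set of active states at each step, which is the cost that determinization removes.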
The Design Principles of a Weighted Finite-State Transducer Library
 THEORETICAL COMPUTER SCIENCE
, 2000
Cited by 99 (23 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
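To illustrate what "shortest-paths algorithms over general semirings" can mean in practice, here is a small sketch in which one routine computes either the best path weight or the total path probability, depending on the semiring passed in. It is not the library's API: `Semiring`, `shortest_distance`, and the example graph are assumptions made for illustration, and the routine only handles acyclic graphs whose states are numbered in topological order.

```python
import math
from collections import namedtuple

# A semiring is given by its two operations and their identity elements.
Semiring = namedtuple("Semiring", "plus times zero one")

# Tropical semiring: (min, +). Path weights add, alternatives take the min.
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
# Probability semiring: (+, *). Path weights multiply, alternatives sum.
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def shortest_distance(arcs, num_states, source, sr):
    """Semiring distance from `source` to every state of an acyclic graph.

    `arcs` is a list of (src, dst, weight) with src < dst for every arc,
    i.e. states are assumed to be numbered in topological order.
    """
    dist = [sr.zero] * num_states
    dist[source] = sr.one
    # Sorting by source state guarantees dist[src] is final before it is used.
    for src, dst, weight in sorted(arcs):
        dist[dst] = sr.plus(dist[dst], sr.times(dist[src], weight))
    return dist


EXAMPLE_ARCS = [(0, 1, 0.5), (0, 2, 0.25), (1, 2, 0.5)]

print(shortest_distance(EXAMPLE_ARCS, 3, 0, TROPICAL))
print(shortest_distance(EXAMPLE_ARCS, 3, 0, PROBABILITY))
```

The point of the design is that the traversal code never mentions `min`, `+`, or `*` directly, so swapping the semiring changes the quantity computed without touching the algorithm.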
Deterministic Part-of-Speech Tagging with Finite-State Transducers
 Computational Linguistics
, 1995
Cited by 82 (0 self)
Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.
A Rational Design for a Weighted Finite-State Transducer Library
 LECTURE NOTES IN COMPUTER SCIENCE
, 1998
Minimization Algorithms for Sequential Transducers
, 2000
Cited by 55 (12 self)
We present general algorithms for minimizing sequential finite-state transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in $O(S + |E| + |V| + (|E| - |V| + |F|) \cdot (|P_{max}| + 1))$ steps, where S is the sum of the lengths of all output labels of the resulting transducer, E the set of transitions of the given transducer, V the set of its states, F the set of final states, and Pmax one of the longest of the longest common prefixes of the output paths leaving each state of the transducer. The algorithms apply to a larger class of transducers which includes subsequential transducers.
Weighted Determinization and Minimization for Large Vocabulary Speech Recognition
 In Proc. EUROSPEECH
, 1997
Cited by 25 (4 self)
Speech recognition requires solving many space and time problems that can have a critical effect on the overall system performance. We describe the use of two general new algorithms [5] that transform recognition networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. The new algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system.
1. INTRODUCTION The networks used in the search stage of speech recognition systems are often highly redundant. Many paths correspond to the same word contents (word lattices and language models), or to the same phonemes (pronunciation dictionaries) for instance, with distinct weights or probabilities. More generally, at a given state of a network there might be several thousand alternative outgoing arcs, ma...
The Suffix Tree of a Tree and Minimizing Sequential Transducers
, 1995
Cited by 20 (0 self)
This paper gives a linear-time algorithm for the construction of the suffix tree of a tree. The suffix tree of a tree is used to obtain an efficient algorithm for the minimization of sequential transducers.
1 Introduction The suffix tree of a string is a compact trie representing all suffixes of the string. The suffix tree has proven to be an extremely useful data structure in a wide variety of string processing algorithms [3, 6]. Kosaraju [10] defined the generalized suffix tree of all suffixes of a set of strings which are represented by a tree. Kosaraju mentions that Weiner's [14] suffix tree construction algorithm can be easily modified to construct the suffix tree of a tree in O(n log n) time, and that it might even be possible to do so in O(n) time. In this paper we give an O(n) time algorithm for the construction of the suffix tree of a tree, if the input symbols are drawn from a constant-size alphabet. We then use the new suffix tree construction algorithm in the minimizat...
Network Optimizations for Large Vocabulary Speech Recognition
 Speech Communication
, 1998
Cited by 20 (8 self)
The redundancy and the size of networks in large-vocabulary speech recognition systems can have a critical effect on their overall performance. We describe the use of two new algorithms: weighted determinization and minimization [12]. These algorithms transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. They are both optimal: weighted determinization reduces the number of alternatives at each state to the minimum, and weighted minimization reduces the size of deterministic networks to the smallest possible number of states and transitions. These algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system. We illustrate their use in several applications, and report the results of our experiments. Key words...
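As a rough illustration of how determinization can be extended to weights, the sketch below applies a subset construction in which each subset element carries a residual weight, using the tropical (min, +) semiring. It is a simplified sketch under stated assumptions, not the authors' implementation: the input is assumed to be an acyclic weighted acceptor (on cyclic inputs this loop may not terminate), and the names `determinize`, `weight_of`, `ARCS`, and `FINALS` are hypothetical.

```python
import math


def determinize(arcs, start, finals):
    """Weighted subset construction over the tropical semiring.

    arcs:   {state: [(label, weight, next_state), ...]}
    finals: {state: final weight}
    Returns (initial subset, {(subset, label): (weight, subset)}, finals).
    Each subset is a frozenset of (state, residual weight) pairs.
    """
    init = frozenset({(start, 0.0)})
    det_arcs, det_finals = {}, {}
    stack, seen = [init], {init}
    while stack:
        subset = stack.pop()
        final_weights = [v + finals[p] for p, v in subset if p in finals]
        if final_weights:
            det_finals[subset] = min(final_weights)
        # Group reachable states by label, keeping the best weight to each.
        by_label = {}
        for p, v in subset:
            for label, w, q in arcs.get(p, []):
                dests = by_label.setdefault(label, {})
                dests[q] = min(dests.get(q, math.inf), v + w)
        for label, dests in by_label.items():
            out_weight = min(dests.values())  # emitted on the new arc
            # Whatever is not emitted now is kept as a residual weight.
            target = frozenset((q, d - out_weight) for q, d in dests.items())
            det_arcs[(subset, label)] = (out_weight, target)
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return init, det_arcs, det_finals


def weight_of(det, word):
    """Follow the single path for `word` in a determinized machine."""
    state, det_arcs, det_finals = det
    total = 0.0
    for symbol in word:
        if (state, symbol) not in det_arcs:
            return None
        weight, state = det_arcs[(state, symbol)]
        total += weight
    return total + det_finals[state] if state in det_finals else None


# Toy input: two competing arcs labeled "a" leave state 0.
ARCS = {0: [("a", 1.0, 1), ("a", 3.0, 2)],
        1: [("b", 4.0, 3)],
        2: [("b", 1.0, 3), ("c", 2.0, 3)]}
FINALS = {3: 0.0}
DET = determinize(ARCS, 0, FINALS)
print(weight_of(DET, "ab"), weight_of(DET, "ac"))
```

In the toy input, "ab" has two paths with weights 5 and 4; the determinized machine has a single path for "ab" whose weight is the minimum of the two.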
Compact Representations By Finite-State Transducers
 In 32nd Meeting of the Association for Computational Linguistics (ACL 94), Proceedings of the Conference, Las Cruces
, 1994
Cited by 15 (7 self)
Finite-state transducers give efficient representations of many natural language phenomena. They make it possible to account for the complex lexicon restrictions encountered, without involving the use of a large set of complex rules that are difficult to analyze. We show here that these representations can be made very compact, indicate how to perform the corresponding minimization, and point out interesting linguistic side-effects of this operation.
Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction
 Lecture Notes in Computer Science
, 2001
Cited by 10 (0 self)
A method of error-tolerant lookup in a finite-state lexicon is described, as well as its application to automatic spelling correction. We compare our method to the algorithm by K. Oflazer [14]. While Oflazer's algorithm searches for all possible corrections of a misspelled word that are within a given similarity threshold, our approach is to retain only the most similar corrections (nearest neighbours), reducing dynamically the search space in the lexicon, and to reach the first correction as soon as possible.
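The idea of keeping only the nearest neighbours while shrinking the search space can be sketched with a trie walk that tightens its distance bound whenever a closer word is found. This is a generic illustration using plain Levenshtein distance, not the algorithm from the paper or Oflazer's; `build_trie`, `nearest_neighbours`, and the toy lexicon are hypothetical names chosen for the example.

```python
END = None  # sentinel key marking the end of a word in the trie


def build_trie(words):
    """Store the lexicon as nested dictionaries, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def nearest_neighbours(trie, query, max_dist):
    """Return (distance, words) for the lexicon words closest to `query`.

    Only words within `max_dist` edits are considered. The bound `best`
    shrinks as soon as a closer word is found, pruning later branches.
    """
    best = max_dist
    results = []

    def walk(node, prefix, prev_row):
        nonlocal best
        if END in node and prev_row[-1] <= best:
            if prev_row[-1] < best:   # strictly closer: drop earlier hits
                best = prev_row[-1]
                results.clear()
            results.append(prefix)
        for ch, child in node.items():
            if ch is END:
                continue
            # One more row of the Levenshtein table for prefix + ch.
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                row.append(min(row[j - 1] + 1,
                               prev_row[j] + 1,
                               prev_row[j - 1] + (query[j - 1] != ch)))
            # min(row) is a lower bound for every word below this node.
            if min(row) <= best:
                walk(child, prefix + ch, row)

    walk(trie, "", list(range(len(query) + 1)))
    return (best if results else None), sorted(results)


LEXICON = build_trie(["cat", "cart", "car", "dog", "coat"])
print(nearest_neighbours(LEXICON, "cst", 2))
print(nearest_neighbours(LEXICON, "cot", 2))
```

A threshold-only search would return every word within `max_dist`; here, once a word at distance 1 is found, branches that can only yield distance 2 are no longer explored.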