Finite-State Transducers in Language and Speech Processing
 Computational Linguistics
, 1997
Cited by 308 (41 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
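As a rough illustration of why sequential transducers support efficient programs: at most one transition leaves each state for a given input symbol, so applying the transducer is a single left-to-right pass. The sketch below is not taken from the paper; the class name `SequentialTransducer`, its fields, and the toy machine are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple


class SequentialTransducer:
    """Toy sequential (input-deterministic) string-to-string transducer.

    Hypothetical illustration: names and layout are assumptions,
    not the paper's notation.
    """

    def __init__(
        self,
        start: int,
        transitions: Dict[Tuple[int, str], Tuple[int, str]],
        final_outputs: Dict[int, str],
    ) -> None:
        self.start = start
        # (state, input symbol) -> (next state, emitted string)
        self.transitions = transitions
        # final state -> string appended when the input ends there
        self.final_outputs = final_outputs

    def apply(self, text: str) -> Optional[str]:
        """Return the output for `text`, or None if `text` is rejected."""
        state = self.start
        pieces = []
        for symbol in text:
            step = self.transitions.get((state, symbol))
            if step is None:  # no transition: input not accepted
                return None
            state, emitted = step
            pieces.append(emitted)
        if state not in self.final_outputs:
            return None
        pieces.append(self.final_outputs[state])
        return "".join(pieces)


# Toy machine: accepts (ab)*, rewriting a -> x and b -> y.
toy = SequentialTransducer(
    start=0,
    transitions={(0, "a"): (1, "x"), (1, "b"): (0, "y")},
    final_outputs={0: ""},
)
```

Each input symbol triggers one dictionary lookup, so the running time is linear in the input length and independent of the number of states.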
Weighted finite-state transducers in speech recognition
 COMPUTER SPEECH & LANGUAGE
, 2002
Cited by 143 (4 self)
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques that combines the HMMs, full cross-word triphones, a lexicon of 40 000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real-time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how general automata operations can be used to assemble lattices from different recognizers to improve recognition performance.
A Survey of Computational Complexity Results in Systems and Control
, 2000
Cited by 116 (21 self)
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fields. We begin with a brief introduction to models of computation, the concepts of undecidability, polynomial time algorithms, NP-completeness, and the implications of intractability results. We then survey a number of problems that arise in systems and control theory, some of them classical, some of them related to current research. We discuss them from the point of view of computational complexity and also point out many open problems. In particular, we consider problems related to stability or stabilizability of linear systems with parametric uncertainty, robust control, time-varying linear systems, nonlinear and hybrid systems, and stochastic optimal control.
The Design Principles of a Weighted Finite-State Transducer Library
 THEORETICAL COMPUTER SCIENCE
, 2000
Cited by 99 (23 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
Parsing Inside-Out
, 1998
Cited by 82 (2 self)
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. T...
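To make the semiring generalization concrete, here is a minimal sketch of a CKY-style inside computation for a grammar in Chomsky normal form, parameterized by the semiring operations. It is an illustration under stated assumptions rather than the thesis's own formulation: the function name `inside_value`, the dictionary encoding of rules, and the toy grammar are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def inside_value(
    words: List[str],
    lexical: Dict[Tuple[str, str], float],      # (A, word) -> weight of A -> word
    binary: Dict[Tuple[str, str, str], float],  # (A, B, C) -> weight of A -> B C
    start: str,
    plus: Callable[[float, float], float],
    times: Callable[[float, float], float],
    zero: float,
) -> float:
    """Semiring value of `start` spanning the whole sentence (hypothetical sketch)."""
    n = len(words)
    chart = defaultdict(lambda: zero)  # (i, j, A) -> accumulated value
    # Width-1 spans come from lexical rules.
    for i, word in enumerate(words):
        for (a, w), weight in lexical.items():
            if w == word:
                chart[i, i + 1, a] = plus(chart[i, i + 1, a], weight)
    # Wider spans combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), weight in binary.items():
                    value = times(times(weight, chart[i, k, b]), chart[k, j, c])
                    chart[i, j, a] = plus(chart[i, j, a], value)
    return chart[0, n, start]


# Toy ambiguous grammar: S -> S S (0.5), S -> 'a' (0.5).
LEXICAL = {("S", "a"): 0.5}
BINARY = {("S", "S", "S"): 0.5}
SENTENCE = ["a", "a", "a"]

# (+, *) sums over both parses: the inside probability.
inside_prob = inside_value(SENTENCE, LEXICAL, BINARY, "S",
                           lambda x, y: x + y, lambda x, y: x * y, 0.0)
# (max, *) keeps only the best parse: the Viterbi probability.
viterbi_prob = inside_value(SENTENCE, LEXICAL, BINARY, "S",
                            max, lambda x, y: x * y, 0.0)
```

The same chart-filling code yields different quantities purely by swapping the `plus` operation, which is the point of describing parsers over semirings.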
An Efficient Compiler for Weighted Rewrite Rules
 IN 34TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 1996
Cited by 74 (25 self)
Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs. We show the algorithm to be simpler and more efficient than existing algorithms. Further, many
Methods and Applications of (max,+) Linear Algebra
 STACS'97, NUMBER 1200 IN LNCS, LUBECK
, 1997
Cited by 73 (26 self)
Exotic semirings such as the "(max, +) semiring" (ℝ ∪ {−∞}, max, +), or the "tropical semiring" (ℕ ∪ {+∞}, min, +), have been invented and reinvented many times since the late fifties, in relation with various fields: performance evaluation of manufacturing systems and discrete event system theory; graph theory (path algebra) and Markov decision processes, Hamilton-Jacobi theory; asymptotic analysis (low temperature asymptotics in statistical physics, large deviations, WKB method); language theory (automata with multiplicities). Despite this apparent profusion, there is a small set of common, non-naive, basic results and problems, in general not known outside the (max, +) community, which seem to be useful in most applications. The aim of this short survey paper is to present what we believe to be the minimal core of (max, +) results, and to illustrate these results by typical applications, at the frontier of language theory, control, and operations research (performance evaluation of...
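For readers unfamiliar with the (max, +) semiring, a small sketch may help: "addition" is max, "multiplication" is ordinary +, and the neutral elements are −∞ and 0. The matrix product below follows directly from that definition; the function name `maxplus_matmul` and the example matrices are hypothetical, not drawn from the survey.

```python
import math
from typing import List

NEG_INF = -math.inf  # the semiring "zero": neutral for max, absorbing for +

Matrix = List[List[float]]


def maxplus_matmul(a: Matrix, b: Matrix) -> Matrix:
    """(max, +) matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    inner = len(b)
    cols = len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Reading a[i][j] as the weight of edge i -> j (NEG_INF = no edge),
# the (max, +) square gives the heaviest two-step path weights.
A = [[1.0, 2.0],
     [3.0, 4.0]]
A2 = maxplus_matmul(A, A)

# (max, +) identity matrix: 0 on the diagonal, -inf elsewhere.
IDENTITY = [[0.0, NEG_INF],
            [NEG_INF, 0.0]]
```

Replacing max with min and −∞ with +∞ gives the corresponding product over the tropical semiring.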
SEMIRING FRAMEWORKS AND ALGORITHMS FOR SHORTEST-DISTANCE PROBLEMS
, 2002
Cited by 72 (20 self)
We define general algebraic frameworks for shortest-distance problems based on the structure of semirings. We give a generic algorithm for finding single-source shortest distances in a weighted directed graph when the weights satisfy the conditions of our general semiring framework. The same algorithm can be used to solve efficiently classical shortest-paths problems or to find the k-shortest distances in a directed graph. It can be used to solve single-source shortest-distance problems in weighted directed acyclic graphs over any semiring. We examine several semirings and describe some specific instances of our generic algorithms to illustrate their use and compare them with existing methods and algorithms. The proof of the soundness of all algorithms is given in detail, including their pseudocode and a full analysis of their running time complexity.
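The acyclic case mentioned in the abstract can be sketched briefly. The code below is not the paper's generic algorithm (which handles more general graphs); it is a simplified single pass in topological order, with a hypothetical `Semiring` container and function name, shown only to illustrate how one routine serves several semirings.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, List, Tuple


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for semiring operations and neutral elements."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def dag_shortest_distance(
    order: List[Hashable],                              # states in topological order
    edges: Dict[Hashable, List[Tuple[Hashable, Any]]],  # state -> [(dest, weight)]
    source: Hashable,
    sr: Semiring,
) -> Dict[Hashable, Any]:
    """Semiring sum, over all paths from `source`, of the product of edge weights."""
    dist = {q: sr.zero for q in order}
    dist[source] = sr.one
    for q in order:
        # dist[q] is final here: all predecessors precede q in `order`.
        for dest, weight in edges.get(q, []):
            dist[dest] = sr.plus(dist[dest], sr.times(dist[q], weight))
    return dist


ORDER = ["s", "a", "b", "t"]
COSTS = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("t", 6.0)], "b": [("t", 1.0)]}
PROBS = {"s": [("a", 0.5), ("b", 0.5)], "a": [("t", 1.0)], "b": [("t", 0.5)]}

cheapest = dag_shortest_distance(ORDER, COSTS, "s", TROPICAL)
total_prob = dag_shortest_distance(ORDER, PROBS, "s", PROBABILITY)
```

Over the tropical semiring the result is the classical shortest-path cost; over the probability semiring it is the total probability mass of all paths reaching each state.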
OpenFst: A general and efficient weighted finite-state transducer library. Implementation and Application of Automata
, 2007
Cited by 72 (8 self)
We describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs that operate on them. OpenFst is designed to be both very efficient in time and space and to scale to very large problems. This library has key applications in speech, image, and natural language processing, pattern and string matching, and machine learning. We give an overview of the library, examples of its use, details of its design that allow customizing the labels, states, and weights and the lazy evaluation of many of its operations. Further information and a download of the OpenFst library can be obtained from