Results 1 - 10 of 69
The Design Principles of a Weighted Finite-State Transducer Library
- THEORETICAL COMPUTER SCIENCE
, 2000
Cited by 82 (19 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
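The idea of running one shortest-paths routine over "general semirings" can be illustrated with a minimal sketch. It is not the library's API: the names `Semiring`, `TROPICAL`, `PROBABILITY` and `shortest_distance` are hypothetical, and the sketch assumes an acyclic automaton whose states are numbered in topological order.

```python
from collections import namedtuple

# A semiring is described by its two operations and their identity elements.
# All names here are illustrative assumptions, not taken from the library.
Semiring = namedtuple("Semiring", "plus times zero one")

TROPICAL = Semiring(plus=min, times=lambda a, b: a + b,
                    zero=float("inf"), one=0.0)
PROBABILITY = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b,
                       zero=0.0, one=1.0)


def shortest_distance(num_states, arcs, source, semiring):
    """Semiring-sum of the weights of all paths from `source` to each state.

    `arcs` is a list of (src, dst, weight) triples. Assumption: src < dst for
    every arc, so the graph is acyclic and one pass in sorted order suffices.
    """
    dist = [semiring.zero] * num_states
    dist[source] = semiring.one
    for src, dst, weight in sorted(arcs):
        dist[dst] = semiring.plus(dist[dst], semiring.times(dist[src], weight))
    return dist


cost_arcs = [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0), (2, 3, 1.0)]
prob_arcs = [(0, 1, 0.5), (0, 2, 0.5), (1, 2, 0.5), (2, 3, 1.0)]
print(shortest_distance(4, cost_arcs, 0, TROPICAL))     # [0.0, 1.0, 3.0, 4.0]
print(shortest_distance(4, prob_arcs, 0, PROBABILITY))  # [1.0, 0.5, 0.75, 0.75]
```

The same loop yields cheapest-path costs under the tropical semiring and total path probabilities under the probability semiring; only the semiring argument changes.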
A Rational Design for a Weighted Finite-State Transducer Library
- LECTURE NOTES IN COMPUTER SCIENCE
, 1998
Minor Identities For Quasi-Determinants And Quantum Determinants
- COMM. MATH. PHYS
, 1994
Cited by 36 (4 self)
We present several identities involving quasi-minors of noncommutative generic matrices. These identities are specialized to quantum matrices, yielding q-analogues of various classical determinantal formulas.
The Ring of k-Regular Sequences
, 1992
Cited by 29 (7 self)
The automatic sequence is the central concept at the intersection of formal language theory and number theory. It was introduced by Cobham, and has been extensively studied by Christol, Kamae, Mendes France and Rauzy, and other writers. Since the range of automatic sequences is finite, however, their descriptive power is severely limited.
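As a small illustration of the finite-range limitation, the sketch below contrasts the Thue-Morse sequence, a standard 2-automatic sequence, with the sum of binary digits, a standard example of a 2-regular sequence that is unbounded and hence not automatic. Both examples and the function names are chosen here for illustration and are not taken from the abstract.

```python
def thue_morse(n):
    """n-th Thue-Morse term via a 2-state automaton reading n in base 2."""
    state = 0
    for digit in bin(n)[2:]:
        if digit == "1":
            state = 1 - state  # reading a 1 flips the state
    return state  # the output is the final state, so the range is {0, 1}


def binary_digit_sum(n):
    """Number of 1 bits of n; takes arbitrarily large values as n grows."""
    return bin(n).count("1")


print([thue_morse(n) for n in range(8)])        # [0, 1, 1, 0, 1, 0, 0, 1]
print([binary_digit_sum(n) for n in range(8)])  # [0, 1, 1, 2, 1, 2, 2, 3]
```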
Weighted automata and weighted logics
- In Automata, Languages and Programming – 32nd International Colloquium, ICALP 2005
, 2005
Cited by 28 (4 self)
Weighted automata are used to describe quantitative properties in various areas such as probabilistic systems, image compression, speech-to-text processing. The behaviour of such an automaton is a mapping, called a formal power series, assigning to each word a weight in some semiring. We generalize Büchi’s and Elgot’s fundamental theorems to this quantitative setting. We introduce a weighted version of MSO logic and prove that, for commutative semirings, the behaviours of weighted automata are precisely the formal power series definable with our weighted logic. We also consider weighted first-order logic and show that aperiodic series coincide with the first-order definable ones, if the semiring is locally finite, commutative and has some aperiodicity property.
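The notion of behaviour used here, a map from words to semiring weights, can be sketched as follows. The function `word_weight`, its dictionary-based encoding, and the example automaton are assumptions made for illustration: the weight of a run is the product of its transition weights, and the weight of a word is the sum over all of its runs.

```python
def word_weight(word, initial, transitions, final, plus, times, zero):
    """Weight assigned to `word` by a weighted automaton over a semiring.

    initial, final: dict mapping state -> weight.
    transitions: dict mapping (state, symbol) -> list of (next_state, weight).
    plus, times, zero: the semiring operations and the additive identity.
    """
    current = dict(initial)  # state -> summed weight of all runs reaching it
    for symbol in word:
        following = {}
        for state, weight in current.items():
            for target, arc_weight in transitions.get((state, symbol), []):
                following[target] = plus(following.get(target, zero),
                                         times(weight, arc_weight))
        current = following
    total = zero
    for state, weight in current.items():
        if state in final:
            total = plus(total, times(weight, final[state]))
    return total


# Example over the natural numbers (+, *): the series maps a word to its
# number of occurrences of "a", one accepting run per occurrence.
transitions = {
    ("p", "a"): [("p", 1), ("q", 1)],
    ("p", "b"): [("p", 1)],
    ("q", "a"): [("q", 1)],
    ("q", "b"): [("q", 1)],
}
count_a = word_weight("abaa", {"p": 1}, transitions, {"q": 1},
                      plus=lambda x, y: x + y, times=lambda x, y: x * y, zero=0)
print(count_a)  # 3
```

Passing `min` and `+` with `float("inf")` as `zero` would evaluate the same automaton structure over the tropical semiring instead.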
A Spectral Algorithm for Learning Hidden Markov Models
Cited by 27 (3 self)
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the Baum-Welch / EM algorithm) which suffer from the usual local optima issues. We prove that under a natural separation condition (roughly analogous to those considered for learning mixture models), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations—it implicitly depends on this number through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing where the space of observation is sometimes the words in a language. The algorithm is also simple: it employs only a singular value decomposition and matrix multiplications.
Generalized Algorithms for Constructing Statistical Language Models
- IN PROC. OF THE 41ST MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 2003
Cited by 22 (2 self)
Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in detail several new and efficient algorithms to address these more general problems and report experimental results demonstrating their usefulness. We give an algorithm for computing efficiently the expected counts of any sequence in a word lattice output by a speech recognizer or any arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use even for a vocabulary size of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructing class-based language models that allows each class to represent an arbitrary weighted automaton. An efficient implementation of our algorithms and techniques has been incorporated in a general software library for language modeling, the GRM Library, that includes many other text and grammar processing functionalities.
A Kleene theorem for weighted tree automata
- Theory of Computing Systems
, 2002
Cited by 16 (8 self)
In this paper we prove Kleene's result for tree series over a commutative and idempotent semiring A (which is not necessarily complete or continuous), i.e., the class of recognizable tree series over A and the class of rational tree series over A are equal. We show the result by direct automata-theoretic constructions and prove their correctness.
Compositional Analysis of Expected Delays in Networks of Probabilistic I/O Automata
, 1998
Cited by 16 (7 self)
Probabilistic I/O automata (PIOA) constitute a model for distributed or concurrent systems that incorporates a notion of probabilistic choice. The PIOA model provides a notion of composition, for constructing a PIOA for a composite system from a collection of PIOAs representing the components. We present a method for computing completion probability and expected completion time for PIOAs. Our method is compositional, in the sense that it can be applied to a system of PIOAs, one component at a time, without ever calculating the global state space of the system (i.e. the composite PIOA). The method is based on symbolic calculations with vectors and matrices of rational functions, and it draws upon a theory of observables, which are mappings from delayed traces to real numbers that generalize the classical "formal power series" from algebra and combinatorics. Central to the theory is a notion of representation for an observable, which generalizes the classical notion of "linear representation" for formal power series. As in the classical case, the representable observables coincide with an abstractly defined class of "rational" observables; this fact forms the foundation of our method.
Quantitative languages
Cited by 15 (7 self)
Quantitative generalizations of classical languages, which assign to each word a real number instead of a boolean value, have applications in modeling resource-constrained computation. We use weighted automata (finite automata with transition weights) to define several natural classes of quantitative languages over finite and infinite words; in particular, the real value of an infinite run is computed as the maximum, limsup, liminf, limit average, or discounted sum of the transition weights. We define the classical decision problems of automata theory (emptiness, universality, language inclusion, and language equivalence) in the quantitative setting and study their computational complexity. As the decidability of the language-inclusion problem remains open for some classes of weighted automata, we introduce a notion of quantitative simulation that is decidable and implies language inclusion. We also give a complete characterization of the expressive power of the various classes of weighted automata. In particular, we show that most classes of weighted
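The value functions named in this abstract (maximum, limsup, liminf, limit average, discounted sum) can be sketched for the special case of an ultimately periodic run, given as a finite prefix of transition weights followed by a cycle repeated forever. The function names, the lasso encoding, and the convention that the discounted sum is the sum of discount^i times the i-th weight with 0 < discount < 1 are assumptions made for this illustration.

```python
from fractions import Fraction


def sup_value(prefix, cycle):
    """Maximum weight occurring anywhere along the run."""
    return max(prefix + cycle)


def limsup_value(prefix, cycle):
    """Largest weight seen infinitely often: only the cycle matters."""
    return max(cycle)


def liminf_value(prefix, cycle):
    """Smallest weight seen infinitely often: only the cycle matters."""
    return min(cycle)


def limit_average(prefix, cycle):
    """Long-run average weight; the finite prefix does not affect it."""
    return Fraction(sum(cycle), len(cycle))


def discounted_sum(prefix, cycle, discount):
    """Sum of discount**i * w_i, using the geometric series for the cycle."""
    d = Fraction(discount)
    head = sum(d**i * w for i, w in enumerate(prefix))
    one_cycle = sum(d**i * w for i, w in enumerate(cycle))
    return head + d**len(prefix) * one_cycle / (1 - d**len(cycle))


prefix, cycle = [5], [1, 3]  # the run 5, 1, 3, 1, 3, ...
print(sup_value(prefix, cycle))                       # 5
print(limsup_value(prefix, cycle))                    # 3
print(liminf_value(prefix, cycle))                    # 1
print(limit_average(prefix, cycle))                   # 2
print(discounted_sum(prefix, cycle, Fraction(1, 2)))  # 20/3
```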

