Results 1–10 of 11
Finite-State Transducers in Language and Speech Processing
 Computational Linguistics
, 1997
"... Finitestate machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential stringtostring transducer ..."
Abstract

Cited by 308 (41 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
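The sequential (input-deterministic) transducers discussed in this abstract admit a very direct implementation: each (state, input symbol) pair determines at most one transition and output string, so transduction runs in time linear in the input. The following is a minimal illustrative sketch; the class name and the toy transducer are hypothetical, not taken from the paper.

```python
# Minimal sketch of a sequential (input-deterministic) string-to-string
# transducer. The transition table and example below are hypothetical.

class SequentialTransducer:
    def __init__(self, initial, transitions, final_outputs):
        self.initial = initial
        # transitions: (state, input symbol) -> (next state, output string)
        self.transitions = transitions
        # final_outputs: final state -> string emitted at end of input
        self.final_outputs = final_outputs

    def transduce(self, word):
        state, out = self.initial, []
        for sym in word:
            if (state, sym) not in self.transitions:
                raise ValueError(f"no transition from {state} on {sym!r}")
            state, piece = self.transitions[(state, sym)]
            out.append(piece)
        if state not in self.final_outputs:
            raise ValueError(f"state {state} is not final")
        return "".join(out) + self.final_outputs[state]

# Toy example: rewrite every 'a' as 'b', leave 'b' unchanged.
t = SequentialTransducer(
    initial=0,
    transitions={(0, "a"): (0, "b"), (0, "b"): (0, "b")},
    final_outputs={0: ""},
)
```

Determinization and minimization, the paper's main algorithmic contributions, operate on such machines to make this single-pass evaluation possible.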
Weighted finite-state transducers in speech recognition
 COMPUTER SPEECH & LANGUAGE
, 2002
"... We survey the use of weighted finitestate transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), contextdependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general tr ..."
Abstract

Cited by 143 (4 self)
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques that combines the HMMs, full cross-word triphones, a lexicon of 40,000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how general automata operations can be used to assemble lattices from different recognizers to improve recognition performance.
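The WFST cascades surveyed above are typically evaluated over the tropical semiring, where weights accumulate by addition along a path and the minimum selects the best path. As a hedged illustration of that semiring at work, here is a sketch of scoring a string against a small weighted acceptor; the `score` function and the toy automaton are invented for this example and are not from the paper.

```python
# Sketch: tropical-semiring (min, +) scoring of a string with a small
# weighted finite-state acceptor. The automaton below is hypothetical.

import math

def score(arcs, initial, finals, word):
    # arcs: state -> list of (label, arc weight, next state)
    # finals: final state -> final weight
    best = {initial: 0.0}  # tropical weight of the best path to each state
    for sym in word:
        nxt = {}
        for state, w in best.items():
            for label, aw, dst in arcs.get(state, []):
                if label == sym:
                    nxt[dst] = min(nxt.get(dst, math.inf), w + aw)
        best = nxt
    return min((w + finals[s] for s, w in best.items() if s in finals),
               default=math.inf)

# Toy acceptor: two ways to read 'a' from state 0; only state 1 is final.
arcs = {0: [("a", 1.0, 1), ("a", 0.5, 0)], 1: [("b", 0.2, 1)]}
best_cost = score(arcs, 0, {1: 0.0}, "ab")
```

The determinization and weight-pushing operations described in the abstract transform such machines so that this search touches far fewer states per input symbol.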
A Spectral Algorithm for Learning Hidden Markov Models
"... Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the BaumWelch / EM algorithm) which suffer from ..."
Abstract

Cited by 56 (3 self)
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the Baum-Welch / EM algorithm) which suffer from the usual local optima issues. We prove that under a natural separation condition (roughly analogous to those considered for learning mixture models), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations—it implicitly depends on this number through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing where the space of observations is sometimes the words in a language. The algorithm is also simple: it employs only a singular value decomposition and matrix multiplications.
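The "SVD plus matrix multiplications" recipe mentioned in the abstract can be sketched directly. The code below builds the exact low-order moments of a small HMM (in practice these are estimated from samples), forms observable operators from an SVD of the pair-correlation matrix, and checks sequence probabilities against the standard forward algorithm. The specific parameter values are made up for illustration; the moment formulas follow the standard observable-operator construction.

```python
import numpy as np

# Hypothetical small HMM: 2 hidden states, 3 observations.
T = np.array([[0.7, 0.2],   # T[i, j] = P(h'=i | h=j); columns sum to 1
              [0.3, 0.8]])
O = np.array([[0.5, 0.1],   # O[x, j] = P(obs=x | h=j); columns sum to 1
              [0.3, 0.3],
              [0.2, 0.6]])
pi = np.array([0.6, 0.4])
n_obs, m = O.shape

# Exact low-order moments (estimated from data in the real algorithm).
P1 = O @ pi                                  # P1[x]      = Pr[x1 = x]
P21 = O @ T @ np.diag(pi) @ O.T              # P21[i, j]  = Pr[x2 = i, x1 = j]
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
        for x in range(n_obs)]               # Pr[x3 = i, x2 = x, x1 = j]

# Spectral step: SVD of P21, then one observable operator per symbol.
U = np.linalg.svd(P21)[0][:, :m]
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n_obs)]

def spectral_prob(seq):
    v = b1
    for x in seq:
        v = B[x] @ v
    return float(binf @ v)

def forward_prob(seq):  # standard forward algorithm, for comparison
    v = pi.copy()
    for x in seq:
        v = T @ (O[x] * v)
    return float(v.sum())
```

With exact moments the two computations agree; with empirical moments the spectral estimate converges as the sample size grows, which is the content of the paper's guarantee.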
Learning Functions Represented as Multiplicity Automata
, 2000
"... We study the learnability of multiplicity automata in Angluin’s exact learning model, and we investigate its applications. Our starting point is a known theorem from automata theory relating the ..."
Abstract

Cited by 27 (2 self)
We study the learnability of multiplicity automata in Angluin’s exact learning model, and we investigate its applications. Our starting point is a known theorem from automata theory relating the number of states in a minimal multiplicity automaton for a function f to the rank of a certain matrix F.
On the Applications of Multiplicity Automata in Learning
 In Proc. of the 37th Annu. IEEE Symp. on Foundations of Computer Science
, 1996
"... Recently the learnability of multiplicity automata [8, 24] attracted a lot of attention, mainly because of its implications on the learnability of several classes of DNF formulae [7]. In this paper we further study the learnability of multiplicity automata. Our starting point is a known theorem from ..."
Abstract

Cited by 20 (9 self)
Recently the learnability of multiplicity automata [8, 24] attracted a lot of attention, mainly because of its implications on the learnability of several classes of DNF formulae [7]. In this paper we further study the learnability of multiplicity automata. Our starting point is a known theorem from automata theory relating the number of states in a minimal multiplicity automaton for a function f to the rank of a certain matrix F. With this theorem in hand we obtain the following results:
- A new simple algorithm for learning multiplicity automata in the spirit of [24] with a better query complexity. As a result, we improve the complexity for all classes that use the algorithms of [8, 24] and also obtain the best query complexity for several classes known to be learnable by other methods such as decision trees [13] and polynomials over GF(2) [26].
- We prove the learnability of some new classes that were not known to be learnable before. Most notably, the class of polynomials ov...
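A multiplicity automaton computes a function of the form f(w) = α · A_{w_1} ⋯ A_{w_n} · γ, assigning a matrix to each input symbol; the theorem referred to above relates the minimal number of states to the rank of the Hankel matrix of f. Evaluation is just a chain of matrix-vector products, sketched below; the two-state counting automaton is a standard small example, not taken from the paper.

```python
import numpy as np

def evaluate(alpha, mats, gamma, word):
    # f(word) = alpha . A_{w_1} ... A_{w_n} . gamma
    v = alpha
    for sym in word:
        v = v @ mats[sym]
    return float(v @ gamma)

# Two-state multiplicity automaton over the reals computing
# f(w) = number of 'a' symbols in w.
alpha = np.array([1.0, 0.0])
gamma = np.array([0.0, 1.0])
mats = {"a": np.array([[1.0, 1.0],
                       [0.0, 1.0]]),
        "b": np.eye(2)}
```

Note that f here takes values outside {0, 1}, which is exactly the extra expressiveness over ordinary finite automata that the learning results exploit.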
A Weight Pushing Algorithm for Large Vocabulary Speech Recognition
 IN EUROPEAN CONF. ON SPEECH COMMUNICATION AND TECHNOLOGY
, 2001
"... Weighted finitestate transducers provide a general framework for the representation of the components of speech recognition systems; language models, pronunciation dictionaries, contextdependent models, HMMlevel acoustic models, and the output word or phone lattices can all be represented by weigh ..."
Abstract

Cited by 16 (10 self)
Weighted finite-state transducers provide a general framework for the representation of the components of speech recognition systems; language models, pronunciation dictionaries, context-dependent models, HMM-level acoustic models, and the output word or phone lattices can all be represented by weighted automata and transducers. In general, a representation is not unique and there may be different weighted transducers realizing the same mapping. In particular, even when they have exactly the same topology with the same input and output labels, two equivalent transducers may differ by the way the weights are distributed along each path. We present
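The reweighting at the heart of weight pushing is simple once the shortest distance from each state to the final states is known: each arc weight becomes w'(e) = w(e) + d(dst) − d(src) in the tropical (min, +) semiring, which preserves every path's total weight up to the constant d(initial). Below is a hedged sketch; the function name and example are hypothetical, and the graph is assumed to have no negative-total cycles so Bellman-Ford relaxation converges.

```python
import math

def push_weights(arcs, finals):
    # arcs: list of (src, dst, weight); finals: set of final states
    # (final weights are taken to be 0 in this toy setting).
    states = {s for s, _, _ in arcs} | {t for _, t, _ in arcs} | set(finals)
    # d[q] = tropical shortest distance from q to a final state.
    d = {q: (0.0 if q in finals else math.inf) for q in states}
    for _ in range(len(states)):  # Bellman-Ford style relaxation
        for s, t, w in arcs:
            if w + d[t] < d[s]:
                d[s] = w + d[t]
    # Pushed arc weight: w'(e) = w(e) + d(dst) - d(src). The residual
    # constant d(initial) would be moved into the initial weight.
    return [(s, t, w + d[t] - d[s]) for s, t, w in arcs], d
```

After pushing, the minimum outgoing weight at every state is 0, which is what lets a Viterbi decoder prune early, the effect the paper measures on large-vocabulary tasks.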
Planning in POMDPs using multiplicity automata
 In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI)
, 2005
"... Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operation Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficientl ..."
Abstract

Cited by 9 (1 self)
Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operations Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR) — a representation which recently has been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata — showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient.
A Spectral Learning Algorithm for Finite State Transducers
"... Abstract. FiniteState Transducers (FSTs) are a popular tool for modeling paired inputoutput sequences, and have numerous applications in realworld problems. Most training algorithms for learning FSTs rely on gradientbased or EM optimizations which can be computationally expensive and suffer from ..."
Abstract

Cited by 9 (2 self)
Finite-State Transducers (FSTs) are a popular tool for modeling paired input-output sequences, and have numerous applications in real-world problems. Most training algorithms for learning FSTs rely on gradient-based or EM optimizations which can be computationally expensive and suffer from local optima issues. Recently, Hsu et al. [13] proposed a spectral method for learning Hidden Markov Models (HMMs) which is based on an Observable Operator Model (OOM) view of HMMs. Following this line of work we present a spectral algorithm to learn FSTs with strong PAC-style guarantees. To the best of our knowledge, ours is the first result of this type for FST learning. At its core, the algorithm is simple, and scalable to large data sets. We present experiments that validate the effectiveness of the algorithm on synthetic and real data.
Using multiplicity automata to identify transducer relations from membership and equivalence queries
 In Proc. 9th Int. Coll. Grammatical Inference, volume 5278 of LNCS
, 2008
"... Abstract. Multiplicity Automata are devices that implement functions from a string space to a field. Usually the real number’s field is used. From a learning point of view there exist some algorithms that are able to identify any multiplicity automaton from membership and equivalence queries. In thi ..."
Abstract

Cited by 2 (0 self)
Multiplicity Automata are devices that implement functions from a string space to a field. Usually the field of real numbers is used. From a learning point of view there exist some algorithms that are able to identify any multiplicity automaton from membership and equivalence queries. In this work we show that those algorithms can also be used if the algebraic structure of a field is relaxed to a division ring structure, that is, the commutativity of the product operation is dropped. Moreover, we define an algebraic structure, which is an extension of the string monoid, that allows the identification of any transduction that can be realized by finite state machines without empty transitions.
A spectral algorithm for learning Hidden Markov Models
 www.elsevier.com/locate/jcss
"... This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal noncommercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or sel ..."
Abstract
 Add to MetaCart
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal noncommercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are