## On the Learnability and Usage of Acyclic Probabilistic Finite Automata (1995)

### Download Links

- [www.cs.huji.ac.il]
- [www.eng.tau.ac.il]
- [portal.research.bell-labs.com]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Computer and System Sciences

Citations: 72 (3 self)

### BibTeX

@INPROCEEDINGS{Ron95onthe,
  author    = {Dana Ron and Yoram Singer and Naftali Tishby},
  title     = {On the Learnability and Usage of Acyclic Probabilistic Finite Automata},
  booktitle = {Journal of Computer and System Sciences},
  year      = {1995},
  pages     = {31--40},
  publisher = {ACM Press}
}

### Abstract

We propose and analyze a distribution learning algorithm for a subclass of Acyclic Probabilistic Finite Automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Though hardness results are known for learning distributions generated by general APFAs, we prove that our algorithm can efficiently learn distributions generated by the subclass of APFAs we consider. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made arbitrarily small with high confidence in polynomial time. We present two applications of our algorithm. In the first, we show how to model cursively written letters. The resulting models are part of a complete cursive handwriting recognition system. In the second application we demonstrate how APFAs can be used to build multiple-pronunciation models for spoken words. We evaluate the APFA-based pronunciation models...
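As a toy illustration of the quantity the abstract's guarantee bounds, the sketch below builds two small depth-2 APFAs, one playing the target and one the hypothesis, and computes the KL-divergence between the string distributions they generate. The transition tables, symbol set, and dictionary representation are invented for this example and are not the paper's notation.

```python
import math

def apfa_prob(trans, s):
    """Multiply transition probabilities along the unique path for s."""
    state, p = 0, 1.0
    for sym in s:
        nxt, pr = trans[(state, sym)]
        state, p = nxt, p * pr
    return p

# (state, symbol) -> (next_state, probability); levels 0 -> 1,2 -> 3 (final).
# All numbers are made up for illustration.
target = {
    (0, "a"): (1, 0.7), (0, "b"): (2, 0.3),
    (1, "a"): (3, 0.9), (1, "b"): (3, 0.1),
    (2, "a"): (3, 0.2), (2, "b"): (3, 0.8),
}
hyp = {
    (0, "a"): (1, 0.6), (0, "b"): (2, 0.4),
    (1, "a"): (3, 0.8), (1, "b"): (3, 0.2),
    (2, "a"): (3, 0.3), (2, "b"): (3, 0.7),
}

# KL(target || hyp) summed over the automaton's finite support.
strings = [x + y for x in "ab" for y in "ab"]
kl = sum(apfa_prob(target, s) * math.log(apfa_prob(target, s) / apfa_prob(hyp, s))
         for s in strings)
print(round(kl, 4))
```

Because the automata are acyclic, the support is finite (here, the four length-2 strings), so the divergence can be summed exactly rather than estimated from samples.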

### Citations

4742 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
- 1989

Citation Context: ...e modeling and recognition of sequences such as those studied in this paper are string matching algorithms (e.g. Dynamic Time Warping [15]) and Hidden Markov Models (in particular left-to-right HMMs) [11, 12]. The string matching approach usually assumes the existence of a sequence prototype (reference template) together with a local noise model, from which the probabilities of deletions, insertions, and ...
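The Dynamic Time Warping baseline mentioned in this context can be sketched with the textbook quadratic dynamic program. This is a generic illustration of the technique, not code from the paper, and the example sequences are arbitrary.

```python
def dtw(a, b):
    """Smallest cumulative |a[i]-b[j]| cost over monotone alignments."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # skip a step of a
                                 d[i][j - 1],      # skip a step of b
                                 d[i - 1][j - 1])  # match both
    return d[n][m]

# The second sequence repeats one sample; DTW absorbs the repeat at no cost.
print(dtw([0, 1, 2, 1], [0, 1, 1, 2, 1]))
```

Unlike the probabilistic models discussed above, this baseline has no learned noise model; the fixed local cost plays that role.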

1612 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963

933 | An introduction to hidden Markov models
- Rabiner, Juang
- 1986

Citation Context: ...e modeling and recognition of sequences such as those studied in this paper are string matching algorithms (e.g. Dynamic Time Warping [15]) and Hidden Markov Models (in particular left-to-right HMMs) [11, 12]. The string matching approach usually assumes the existence of a sequence prototype (reference template) together with a local noise model, from which the probabilities of deletions, insertions, and ...

431 | Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983

Citation Context: ...ognition [16]. Other Related Work The most common approaches to the modeling and recognition of sequences such as those studied in this paper are string matching algorithms (e.g. Dynamic Time Warping [15]) and Hidden Markov Models (in particular left-to-right HMMs) [11, 12]. The string matching approach usually assumes the existence of a sequence prototype (reference template) together with a local no...

413 | Statistical Inference for Probabilistic Functions of Finite State Markov Chains
- Baum, Petrie
- 1966

Citation Context: ...d have better ability than the string matching based techniques to capture context dependent variations. The commonly used training procedure for HMMs which is based on the forward-backward algorithm [2] is guaranteed to converge only to a local maximum of the likelihood function. Furthermore, there are theoretical results indicating that the problem of learning distributions generated by HMMs is har...

310 | Efficient noise-tolerant learning from statistical queries
- Kearns
- 1998

Citation Context: ... 2 The problem of learning parity with noise is closely related to the long-standing problem of decoding random linear codes. Additional evidence to the intractability of this problem is provided in [9, 3]. 3 They define typical DFAs to be DFAs in which the underlying graph is arbitrary, but the accept/reject labels on the states are chosen randomly. 1 sequence, named the sample tree, is constructed ba...

188 | The state of the art in online handwriting recognition - Tappert, Suen, et al. - 1990

145 | Learning stochastic regular grammars by means of a state merging method
- Carrasco, Oncina
- 1994

Citation Context: ... HMM training algorithms are neither on-line nor adaptive in the model's topology. A technique of merging states which is similar to the one used in this paper was also applied by Carrasco and Oncina [4], and by Stolcke and Omohundro [18]. Carrasco and Oncina give an algorithm which identifies distributions generated by PFAs in the limit of infinite examples. Stolcke and Omohundro describe a learning...

141 | Hidden Markov model induction by Bayesian model merging
- Stolcke, Omohundro
- 1993

Citation Context: ...er on-line nor adaptive in the model's topology. A technique of merging states which is similar to the one used in this paper was also applied by Carrasco and Oncina [4], and by Stolcke and Omohundro [18]. Carrasco and Oncina give an algorithm which identifies distributions generated by PFAs in the limit of infinite examples. Stolcke and Omohundro describe a learning algorithm for HMMs which merges st...

122 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963

Citation Context: ...for each q′, if P^{M′}(q′) ≥ ε₀, then necessarily m_{q′}/m ≥ P^{M′}(q′) − ε₀. There are at most 1/ε₀ states in each of the D levels for which P^{M′}(q′) ≥ ε₀, and hence, using Hoeffding's inequality [7] and the fact that m ≥ (1/(2ε₀²)) ln(2D/(δε₀)) · 2|Σ|n²D, with probability at least 1 − (δ/2)/(2|Σ|n²D), for each such q′, m_{q′}/m ≥ P^{M′}(q′) − ε₀. Since the size of M is bounded by...
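Hoeffding's inequality, invoked in this context, bounds the probability that the empirical mean of m bounded samples deviates from its expectation by more than ε by 2·exp(−2mε²). A quick numeric sanity check of that bound for Bernoulli samples; all parameters here are arbitrary:

```python
import math
import random

random.seed(0)
p, m, eps, trials = 0.3, 200, 0.1, 2000

# Hoeffding bound on P(|mean - p| > eps) for m samples in [0, 1].
bound = 2 * math.exp(-2 * m * eps ** 2)

# Empirical frequency of such deviations over repeated experiments.
deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(m)) / m
    deviations += abs(mean - p) > eps
print(deviations / trials, "<=", bound)
```

The empirical deviation rate comes out far below the bound, which is expected: Hoeffding is distribution-free and therefore loose for any particular Bernoulli parameter.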

96 | On the learnability of discrete distributions
- Kearns, Mansour, et al.
- 1994

Citation Context: ...fficient in the sense that its running time is polynomial in the parameters of the problem. Our result should be contrasted with the intractability result for learning PFAs described by Kearns et al. [8]. They show that PFAs are not efficiently learnable under the assumption that there is no efficient algorithm for learning parity functions in the presence of noise in the PAC model. 2 Furthermore, th...

91 | On the computational complexity of approximating distributions by probabilistic automata
- Abe, Warmuth
- 1990

Citation Context: ...s guaranteed to converge only to a local maximum of the likelihood function. Furthermore, there are theoretical results indicating that the problem of learning distributions generated by HMMs is hard [1, 8]. In addition, the successful applications of the HMM approach occur mostly in cases where its full power is not utilized, and the hypothesis constructed is essentially a PFA (or even an APFA). Anothe...

90 | Cryptographic primitives based on hard learning problems
- Blum, Furst, et al.
- 1993

Citation Context: ... 2 The problem of learning parity with noise is closely related to the long-standing problem of decoding random linear codes. Additional evidence to the intractability of this problem is provided in [9, 3]. 3 They define typical DFAs to be DFAs in which the underlying graph is arbitrary, but the accept/reject labels on the states are chosen randomly. 1 sequence, named the sample tree, is constructed ba...

47 | Efficient Learning of Typical Finite Automata from Random Walks
- Freund, Kearns, et al.
- 1997

Citation Context: ...his technique was presented in the pioneering work of Trakhtenbrot and Barzdin' [20] in the context of learning deterministic finite automata (DFAs). The same idea was later applied by Freund et al. [6] in their work on learning typical DFAs 3 . In the same work they proposed to apply the notion of statistical signatures to learning typical PFAs. The outline of our learning algorithm is roughly the ...
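The "sample tree" these contexts refer to is, in outline, a prefix tree whose nodes count how often each prefix occurs in the sample. The sketch below only builds such a tree; the node layout and names are my own, and the merging of statistically similar nodes that the paper's algorithm performs next is omitted.

```python
def sample_tree(strings):
    """Build a prefix tree with per-node occurrence counts from a sample."""
    tree = {"count": 0, "children": {}}
    for s in strings:
        node = tree
        node["count"] += 1          # every string passes through the root
        for sym in s:
            node = node["children"].setdefault(
                sym, {"count": 0, "children": {}})
            node["count"] += 1      # one more string with this prefix
    return tree

t = sample_tree(["ab", "ab", "aa", "b"])
print(t["children"]["a"]["count"])  # how many sample strings start with 'a'
```

Each node's count divided by its parent's count gives an empirical transition probability, which is the raw statistic a merging algorithm would compare between candidate states.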

43 | Learning probabilistic automata with variable memory length
- Ron, Singer, et al.
- 1994

Citation Context: ...gorithm only folds pairs of nodes that in fact correspond to the same state, and the nodes which are left unmerged can be shown to contribute little to the error of the hypothesis. In a previous work [14] we introduced an algorithm for learning distributions (on long strings) generated by ergodic Markovian sources that can be characterized by a different subclass of PFAs which we refer to as Variable ...

35 | Globally trained handwritten word recognizer using spatial representation, convolutional neural networks and hidden Markov models
- Bengio, LeCun, et al.
- 1994

Citation Context: ... to build pronunciation models for spoken words. Examples and reviews on practical models and algorithms for multiple-pronunciation can be found in [5, 13], and for cursive handwriting recognition in [10, 9, 19, 3]. Organization of the Paper The paper is organized as follows. In Sections 2 and 3 we give several definitions related to APFAs, and define our learning model. In Section 4 we present our learning alg...

29 | Dynamical encoding of cursive handwriting
- Singer, Tishby
- 1994

Citation Context: ...me for the following applications: (a) a part of a complete cursive handwriting recognition system; (b) pronunciation models for spoken words. 7.1 Building Stochastic Models for Cursive Handwriting In [17], a dynamic encoding scheme for cursive handwriting based on an oscillatory model of handwriting was proposed and analysed. The process described in [17] performs mapping from continuous pen trajector...

16 | Maximum likelihood hidden Markov modeling using a dominant sequence of states
- Merhav, Ephraim
- 1991

Citation Context: ...mely, there is one, most probable, state sequence (the Viterbi sequence) which captures most of the likelihood of the model given the observations, so that practically the states are not truly hidden [8]. Another drawback of HMMs is that the current HMM training algorithms are neither online nor adaptive in the model's topology. These weak aspects of the hidden Markov model motivate our present model...

6 | Identification of contextual factors for pronunciation networks
- Chen
- 1990

Citation Context: ...n approach, and apply their algorithm to build pronunciation models for spoken words. Examples of 2 alternative approaches for modeling multiple-pronunciation, such as decision trees, can be found in [5, 13]. For general reviews on cursive handwriting recognition see [10, 19]. Organization of the Paper The paper is organized as follows. In Sections 2 and 3 we give several definitions related to APFAs, an...

6 | Finite Automata: Behavior and Synthesis
- Trakhtenbrot, Barzdin'
- 1973

Citation Context: ...is that of using some form of signatures of states in order to distinguish between the states of the target automaton. This technique was presented in the pioneering work of Trakhtenbrot and Barzdin' [20] in the context of learning deterministic finite automata (DFAs). The same idea was later applied by Freund et al. [6] in their work on learning typical DFAs 1 . In the same work they proposed to app...

6 | Efficient learning of typical automata from random walks
- Freund, Kearns, et al.
- 1993

Citation Context: ... This technique was presented in the pioneering work of Trakhtenbrot and Barzdin' [20] in the context of learning deterministic finite automata (DFAs). The same idea was later applied by Freund et al. [6] in their work on learning typical DFAs 1 . In the same work they proposed to apply the notion of statistical signatures 1 They define typical DFAs to be DFAs in which the underlying graph is arbitrary...

3 | A statistical model for generating pronunciation networks
- Riley
- 1991

Citation Context: ...n approach, and apply their algorithm to build pronunciation models for spoken words. Examples of 2 alternative approaches for modeling multiple-pronunciation, such as decision trees, can be found in [5, 13]. For general reviews on cursive handwriting recognition see [10, 19]. Organization of the Paper The paper is organized as follows. In Sections 2 and 3 we give several definitions related to APFAs, an...

3 | An adaptive cursive handwriting recognition system
- Singer, Tishby
- 1995

Citation Context: ...ce, the APFAs capture the short sequence statistics. Together, these algorithms constitute a complete language modeling scheme, which we applied to cursive handwriting recognition and similar problems [18]. More formally, we present an algorithm for efficiently learning distributions on strings generated by a subclass of APFAs which have the following property. For every pair of states in an automaton ...

2 | Finite Automata: Behavior and Synthesis
- Trakhtenbrot, Barzdin'
- 1973

Citation Context: ...is that of using some form of signatures of states in order to distinguish between the states of the target automaton. This technique was presented in the pioneering work of Trakhtenbrot and Barzdin' [20] in the context of learning deterministic finite automata (DFAs). The same idea was later applied by Freund et al. [6] in their work on learning typical DFAs 3 . In the same work they proposed to app...

2 | Identification of contextual factors for pronunciation networks
- Chen
- 1990

Citation Context: ...ed on a Bayesian approach, and apply their algorithm to build pronunciation models for spoken words. Examples and reviews on practical models and algorithms for multiple-pronunciation can be found in [5, 13], and for cursive handwriting recognition in [10, 9, 19, 3]. Organization of the Paper The paper is organized as follows. In Sections 2 and 3 we give several definitions related to APFAs, and define o...

2 | "What has been will be again": A Machine Learning Approach to the Analysis of Natural Language
- Singer
- 1995

Citation Context: ...roperties of the source, the APFAs capture the short sequence statistics. Together, these algorithms constitute a complete language modeling scheme, which we applied to cursive handwriting recognition [16]. The algorithm described in this paper is an efficient algorithm for learning distributions on strings generated by all APFAs M which have the following property. For every pair of states in M, the di...

1 | "What has been will be again": A Machine Learning Approach to the Analysis of Natural Language
- Singer
- 1995

Citation Context: ...roperties of the source, the APFAs capture the short sequence statistics. Together, these algorithms constitute a complete language modeling scheme, which we applied to cursive handwriting recognition [16]. Other Related Work The most common approaches to the modeling and recognition of sequences such as those studied in this paper are string matching algorithms (e.g. Dynamic Time Warping [15]) and Hid...