Results 1-10 of 50
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
Machine Learning, 1996
Cited by 226 (18 self)
Abstract:
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
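The abstract above concerns predicting symbols from variable-length contexts. As a toy illustration of that idea only (not the paper's PSA learning algorithm, which grows contexts selectively and carries PAC-style guarantees), the following sketch counts context statistics up to a maximum order and backs off to the longest observed context when predicting. The class name `VLMM` and all parameters are invented for this example.

```python
from collections import defaultdict

class VLMM:
    """Toy variable-memory-length Markov predictor: predict the next
    symbol from the longest previously seen suffix of the history.
    An illustrative sketch, not the paper's algorithm."""

    def __init__(self, max_order=4):
        self.max_order = max_order
        # counts[context][next_symbol] -> frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        for i, sym in enumerate(text):
            # record sym under every suffix of the history, up to max_order
            for k in range(min(i, self.max_order) + 1):
                context = text[i - k:i]
                self.counts[context][sym] += 1

    def predict(self, history):
        # back off to the longest context that has actually been observed
        for k in range(min(len(history), self.max_order), -1, -1):
            context = history[len(history) - k:]
            if context in self.counts:
                dist = self.counts[context]
                return max(dist, key=dist.get)
        return None

model = VLMM(max_order=3)
model.train("abracadabra")
print(model.predict("abr"))  # 'a' -- the only symbol ever following "abr"
```

The paper's contribution is precisely in deciding *which* contexts to keep so that the model stays small while the KL-divergence to the target stays bounded; this sketch keeps every context up to `max_order`.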
The Power of a Pebble: Exploring and Mapping Directed Graphs
A preliminary version of this work appeared in STOC '98, 1998
Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm
1998
Cited by 114 (1 self)
Abstract:
This paper first describes the structure and results of the Abbadingo One DFA Learning Competition. The competition was designed to encourage work on algorithms that scale well, both to larger DFAs and to sparser training data. We then describe and discuss the winning algorithm of Rodney Price, which orders state merges according to the amount of evidence in their favor. A second winning algorithm, of Hugues Juille, will be described in a separate paper. Part I: Abbadingo 1 Introduction The Abbadingo One DFA Learning Competition was organized by two of the authors (Lang and Pearlmutter) and consisted of a set of challenge problems posted to the internet and token cash prizes of $1024. The organizers had the following goals: to promote the development of new and better algorithms, and to encourage learning theorists to implement some of their ideas and gather empirical data concerning their performance on concrete problems which lie beyond proven bounds, particularly in the direction o...
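The entry above describes ordering state merges by the amount of supporting evidence. The core of that idea can be sketched as a scoring function on a prefix-tree acceptor: a candidate merge earns one unit of evidence per pair of labeled states it forces together in agreement, and is rejected outright on a label conflict. This is a minimal illustration of the scoring step only, under an assumed nested-dict tree representation; it omits the full merge search of the actual evidence-driven algorithm.

```python
# Each node is {"label": +1 / -1 / None, "kids": {symbol: node}}.

def merge_score(a, b):
    """Evidence for merging nodes a and b, or -1 on a label conflict."""
    score = 0
    if a["label"] is not None and b["label"] is not None:
        if a["label"] != b["label"]:
            return -1          # accepting vs. rejecting state: impossible merge
        score += 1             # two labels agree: one unit of evidence
    for sym, child_b in b["kids"].items():
        if sym in a["kids"]:   # merging forces the children together too
            sub = merge_score(a["kids"][sym], child_b)
            if sub < 0:
                return -1
            score += sub
    return score

def node(label=None, **kids):
    return {"label": label, "kids": kids}

# Two small subtrees that agree on every string both of them label.
t1 = node(+1, a=node(-1), b=node(+1))
t2 = node(+1, a=node(-1))
print(merge_score(t1, t2))  # 2 agreeing labeled pairs -> evidence 2
```

In the competition setting, merges with the highest evidence are attempted first, which is what lets the method cope with sparse training data.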
The Power of Amnesia
Machine Learning, 1994
Cited by 90 (4 self)
Abstract:
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot effectively capture the complexity of such structures. On the other hand, uniformly using long memory models is not practical even for a memory as short as four. The algorithm we propose is based on minimizing the statistical prediction error by extending the memory, or state length, adaptively, until the total prediction error is sufficiently small. We demonstrate the algorithm by learning the structure of natural English text and applying the learned model to the correction of corrupted text. Using less than 3000 states the model's performance is far superior to that of fixe...
On the Learnability and Usage of Acyclic Probabilistic Finite Automata
Journal of Computer and System Sciences, 1995
Cited by 74 (3 self)
Abstract:
We propose and analyze a distribution learning algorithm for a subclass of Acyclic Probabilistic Finite Automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Though hardness results are known for learning distributions generated by general APFAs, we prove that our algorithm can efficiently learn distributions generated by the subclass of APFAs we consider. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made arbitrarily small with high confidence in polynomial time. We present two applications of our algorithm. In the first, we show how to model cursively written letters. The resulting models are part of a complete cursive handwriting recognition system. In the second application we demonstrate how APFAs can be used to build multiple-pronunciation models for spoken words. We evaluate the APFA-based pronunciation models...
Learning bias and phonological-rule induction
Computational Linguistics, 1996
Cited by 33 (0 self)
Abstract:
A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domain-independent learning rule (Error Back-Propagation, Instance-based Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology, and that are assumed explicitly or implicitly by every theory of phonology: faithfulness (underlying segments
Two state-based approaches to program-based anomaly detection
In Proceedings of the 16th Annual Computer Security Applications Conference, 2000
Cited by 29 (0 self)
Abstract:
This paper describes two recently developed intrusion detection algorithms, and gives experimental results on their performance. The algorithms detect anomalies in execution audit data. One is a simply constructed finite-state machine, and the other monitors statistical deviations from normal program behavior. The performance of these algorithms is evaluated as a function of the amount of available training data, and they are compared to the well-known intrusion detection technique of looking for novel n-grams in computer audit data.
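The novel-n-gram baseline mentioned at the end of this abstract can be sketched in a few lines: record every length-n window of events seen in normal training traces, then flag windows in a monitored trace that never occurred. The trace contents, function names, and the choice n=3 below are illustrative, not taken from the paper.

```python
def ngrams(trace, n):
    """All length-n windows of a trace, as a set of tuples."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def anomaly_rate(train_traces, test_trace, n=3):
    """Fraction of test windows never observed in the training traces."""
    normal = set()
    for trace in train_traces:
        normal |= ngrams(trace, n)
    windows = [tuple(test_trace[i:i + n])
               for i in range(len(test_trace) - n + 1)]
    novel = sum(1 for w in windows if w not in normal)
    return novel / len(windows)

normal_runs = [["open", "read", "read", "close"],
               ["open", "read", "close"]]
attack_run = ["open", "read", "exec", "close"]
print(anomaly_rate(normal_runs, attack_run))  # 1.0: every window contains "exec"
```

The paper's two state-based detectors are alternatives to exactly this kind of sliding-window lookup.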
Evaluation, Implementation, and Extension of Primitive Optimality Theory
1997
Cited by 29 (4 self)
Abstract:
Eisner's (1997a) Primitive Optimality Theory is a simple formal model of a subset of Optimality Theory (Prince and Smolensky 1993). The work presented here implements this model and extends it. The implementation is used to evaluate the Primitive Optimality Theory model, and is in itself a useful tool for linguistic analysis. The model is evaluated in terms of its success or failure as an attempt to formulate a cognitively plausible, computationally tractable, and mathematically formal model of the Optimality Theoretic framework of phonological theory. As part of this evaluation, a comprehensive, implemented analysis is given for the harmony and disharmony phenomena of Turkish. In addition to an evaluation of the Primitive Optimality Theory model, concrete proposals are suggested for possible extensions to the model, and for improved models that, unlike Primitive Optimality Theory, can model nonconcatenative morphology, Paradigm Uniformity, and reduplication.
Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results
2003
Cited by 28 (0 self)
Abstract:
This paper provides a comprehensive survey of the field of grammar induction applied to randomly generated languages using sparse example sets.
Looping suffix tree-based inference of partially observable hidden state
In Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006), 2006
Cited by 22 (0 self)
Abstract:
We present a solution for inferring hidden state from sensorimotor experience when the environment takes the form of a POMDP with deterministic transition and observation functions. Such environments can appear to be arbitrarily complex and nondeterministic on the surface, but are actually deterministic with respect to the unobserved underlying state. We show that there always exists a finite history-based representation that fully captures the unobserved world state, allowing for perfect prediction of action effects. This representation takes the form of a looping prediction suffix tree (PST). We derive a sound and complete algorithm for learning a looping PST from a sufficient sample of sensorimotor experience. We also give empirical illustrations of the advantages conferred by this approach, and characterize the approximations to the looping PST that are made by existing algorithms such as Variable