Results 1–10 of 36
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
 Machine Learning
, 1996
"... . We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions gene ..."
Abstract

Cited by 173 (16 self)
 Add to MetaCart
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
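The variable-memory idea behind PSAs can be illustrated with a short sketch. This is not the paper's algorithm; the function names `train_counts` and `predict` and the smoothing-free empirical estimates are illustrative assumptions. The point is only that prediction uses the longest context actually observed in training, rather than a fixed order.

```python
from collections import defaultdict

def train_counts(text, max_order=3):
    """Count next-symbol occurrences for every context (suffix of the
    history) up to length max_order. Illustrative sketch only."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text)):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            ctx = text[i - k:i]          # the k symbols preceding position i
            counts[ctx][text[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    """Return a next-symbol distribution using the longest context
    of the history for which training data exists."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in counts:
            total = sum(counts[ctx].values())
            return {s: c / total for s, c in counts[ctx].items()}
    return {}
```

A short history that never occurred in training simply falls back to a shorter suffix, down to the empty (order-0) context.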
The Power of a Pebble: Exploring and Mapping Directed Graphs
, 1998
"... Exploring and mapping an unknown environment is a fundamental problem, which is studied in various contexts. Many works have focused on finding efficient solutions to restricted versions of the problem. In this paper, we consider a model that makes very limited assumptions on the environment and ..."
Abstract

Cited by 107 (4 self)
 Add to MetaCart
Exploring and mapping an unknown environment is a fundamental problem, which is studied in various contexts. Many works have focused on finding efficient solutions to restricted versions of the problem. In this paper, we consider a model that makes very limited assumptions on the environment and solve the mapping problem in this general setting. We model
Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm
, 1998
"... . This paper first describes the structure and results of the Abbadingo One DFA Learning Competition. The competition was designed to encourage work on algorithms that scale wellboth to larger DFAs and to sparser training data. We then describe and discuss the winning algorithm of Rodney Price, w ..."
Abstract

Cited by 90 (1 self)
 Add to MetaCart
This paper first describes the structure and results of the Abbadingo One DFA Learning Competition. The competition was designed to encourage work on algorithms that scale well both to larger DFAs and to sparser training data. We then describe and discuss the winning algorithm of Rodney Price, which orders state merges according to the amount of evidence in their favor. A second winning algorithm, of Hugues Juille, will be described in a separate paper. Part I: Abbadingo. 1 Introduction. The Abbadingo One DFA Learning Competition was organized by two of the authors (Lang and Pearlmutter) and consisted of a set of challenge problems posted to the internet and token cash prizes of $1024. The organizers had the following goals: promote the development of new and better algorithms; encourage learning theorists to implement some of their ideas and gather empirical data concerning their performance on concrete problems which lie beyond proven bounds, particularly in the direction o...
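Evidence-driven state merging can be sketched in miniature. The toy functions below are an assumption for illustration, not Price's implementation: each state is represented by the accept/reject labels of the training strings that pass through it, a merge is rejected if any labels conflict, and otherwise its "evidence" is the number of agreeing labeled pairs, with the highest-evidence merge performed first.

```python
def merge_evidence(labels_a, labels_b):
    """Score a candidate merge of two states (toy EDSM-style version).

    labels_*: dict mapping suffix string -> True (accept) / False (reject).
    Returns None if the merge is inconsistent, else the number of
    agreeing labeled pairs (the evidence in favor of the merge)."""
    evidence = 0
    for suffix, label in labels_a.items():
        if suffix in labels_b:
            if labels_b[suffix] != label:
                return None  # conflicting labels: merge rejected
            evidence += 1
    return evidence

def best_merge(candidates):
    """Pick the candidate state pair with the most supporting evidence."""
    scored = [(merge_evidence(a, b), pair) for pair, (a, b) in candidates.items()]
    valid = [(e, p) for e, p in scored if e is not None]
    return max(valid)[1] if valid else None
```

Ordering merges by evidence, rather than breadth-first, is the competition-winning heuristic the abstract describes.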
The Power of Amnesia
 Machine Learning
, 1994
"... We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multi characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more s ..."
Abstract

Cited by 77 (4 self)
 Add to MetaCart
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot effectively capture the complexity of such structures. On the other hand, using long memory models uniformly is not practical even for memory lengths as short as four. The algorithm we propose is based on minimizing the statistical prediction error by extending the memory, or state length, adaptively, until the total prediction error is sufficiently small. We demonstrate the algorithm by learning the structure of natural English text and applying the learned model to the correction of corrupted text. Using fewer than 3000 states the model's performance is far superior to that of fixe...
On the Learnability and Usage of Acyclic Probabilistic Finite Automata
 Journal of Computer and System Sciences
, 1995
"... We propose and analyze a distribution learning algorithm for a subclass of Acyclic Probabilistic Finite Automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Though hardness results are known for learning distributions generated by general ..."
Abstract

Cited by 71 (3 self)
 Add to MetaCart
We propose and analyze a distribution learning algorithm for a subclass of Acyclic Probabilistic Finite Automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Though hardness results are known for learning distributions generated by general APFAs, we prove that our algorithm can efficiently learn distributions generated by the subclass of APFAs we consider. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made arbitrarily small with high confidence in polynomial time. We present two applications of our algorithm. In the first, we show how to model cursively written letters. The resulting models are part of a complete cursive handwriting recognition system. In the second application we demonstrate how APFAs can be used to build multiple-pronunciation models for spoken words. We evaluate the APFA-based pronunciation models...
Evaluation, Implementation, and Extension of Primitive Optimality Theory
, 1997
"... Eisner's (1997a) Primitive Optimality Theory is a simple formal model of a subset of Optimality Theory (Prince and Smolensky 1993). The work presented here implements this model and extends it. The implementation is used to evaluate the Primitive Optimality Theory model, and is in itself a useful to ..."
Abstract

Cited by 30 (4 self)
 Add to MetaCart
Eisner's (1997a) Primitive Optimality Theory is a simple formal model of a subset of Optimality Theory (Prince and Smolensky 1993). The work presented here implements this model and extends it. The implementation is used to evaluate the Primitive Optimality Theory model, and is in itself a useful tool for linguistic analysis. The model is evaluated in terms of its success or failure as an attempt to formulate a cognitively plausible, computationally tractable, and mathematically formal model of the Optimality Theoretic framework of phonological theory. As part of this evaluation, a comprehensive, implemented analysis is given for the harmony and disharmony phenomena of Turkish. In addition to an evaluation of the Primitive Optimality Theory model, concrete proposals are suggested for possible extensions to the model, and for improved models that, unlike Primitive Optimality Theory, can model nonconcatenative morphology, Paradigm Uniformity, and reduplication.
Learning bias and phonological-rule induction
 Computational Linguistics
, 1996
"... A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approache ..."
Abstract

Cited by 29 (0 self)
 Add to MetaCart
A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domain-independent learning rule (Error Back-Propagation, Instance-based Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology, and that are assumed explicitly or implicitly by every theory of phonology: faithfulness (underlying segments
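A phonological rule such as flapping can be written as a small left-to-right contextual rewrite over segments. The sketch below is a plain rewrite function rather than a compiled finite-state transducer, and the five-vowel inventory and the flap symbol `D` are simplifying assumptions (real flapping is also sensitive to stress, which this ignores).

```python
VOWELS = set("aeiou")

def flap(word):
    """Toy flapping rule: rewrite t -> D when it stands between two
    vowels, scanning left to right. Illustrative sketch only."""
    out = []
    for i, ch in enumerate(word):
        intervocalic = (ch == "t"
                        and i > 0 and word[i - 1] in VOWELS
                        and i + 1 < len(word) and word[i + 1] in VOWELS)
        out.append("D" if intervocalic else ch)
    return "".join(out)
```

The rule fires in "water" and "atom" but leaves clusters like "tt" and non-intervocalic t untouched, which is exactly the kind of context-dependence the OSTIA experiments probe.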
Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results
, 2003
"... This paper provides a comprehensive survey of the field of grammar induction applied to randomly generated languages using sparse example sets. ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
This paper provides a comprehensive survey of the field of grammar induction applied to randomly generated languages using sparse example sets.
Looping suffix tree-based inference of partially observable hidden state
 In Proceedings of the twenty-third international conference on Machine learning (ICML 2006)
, 2006
"... We present a solution for inferring hidden state from sensorimotor experience when the environment takes the form of a POMDP with deterministic transition and observation functions. Such environments can appear to be arbitrarily complex and nondeterministic on the surface, but are actually determin ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
We present a solution for inferring hidden state from sensorimotor experience when the environment takes the form of a POMDP with deterministic transition and observation functions. Such environments can appear to be arbitrarily complex and nondeterministic on the surface, but are actually deterministic with respect to the unobserved underlying state. We show that there always exists a finite history-based representation that fully captures the unobserved world state, allowing for perfect prediction of action effects. This representation takes the form of a looping prediction suffix tree (PST). We derive a sound and complete algorithm for learning a looping PST from a sufficient sample of sensorimotor experience. We also give empirical illustrations of the advantages conferred by this approach, and characterize the approximations to the looping PST that are made by existing algorithms such as Variable
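The history-based prediction idea can be sketched with a flat suffix lookup table, an illustrative assumption rather than the paper's looping-PST construction: in a deterministic environment, some finite history suffix plus the chosen action determines the next observation, so suffixes that stay ambiguous at one depth are simply deferred to a longer one.

```python
def learn_predictor(trace, max_depth=3):
    """From a trace of (action, observation) pairs in a deterministic
    environment, map each (history suffix, next action) to the next
    observation; suffixes with conflicting outcomes are marked ambiguous."""
    table = {}
    for i in range(1, len(trace)):
        for k in range(1, max_depth + 1):
            if i - k < 0:
                break
            key = tuple(trace[i - k:i]) + (trace[i][0],)  # suffix + next action
            obs = trace[i][1]
            if key in table and table[key] != obs:
                table[key] = None          # ambiguous at this depth
            elif key not in table:
                table[key] = obs
    return table

def predict_obs(table, history, action, max_depth=3):
    """Predict the next observation via the longest unambiguous suffix."""
    for k in range(min(max_depth, len(history)), 0, -1):
        key = tuple(history[len(history) - k:]) + (action,)
        if table.get(key) is not None:
            return table[key]
    return None
```

With enough experience, lookups resolve at the shortest sufficient depth; the looping structure in the paper compresses exactly this table into a finite tree.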
Exploration Strategies for Model-based Learning in Multi-agent Systems
 Autonomous Agents and Multi-Agent Systems
, 1997
"... . An agent that interacts with other agents in multiagent systems can benefit significantly from adapting to the others. When performing active learning, every agent's action affects the interaction process in two ways: The effect on the expected reward according to the current knowledge held by th ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
An agent that interacts with other agents in multi-agent systems can benefit significantly from adapting to the others. When performing active learning, every agent's action affects the interaction process in two ways: the effect on the expected reward according to the current knowledge held by the agent, and the effect on the acquired knowledge, and hence on future rewards expected to be received. The agent must therefore make a tradeoff between the wish to exploit its current knowledge and the wish to explore other alternatives, to improve its knowledge for better decisions in the future. The goal of this work is to develop exploration strategies for a model-based learning agent to handle its encounters with other agents in a common environment. We first show how to incorporate exploration methods usually used in reinforcement learning into model-based learning. We then demonstrate the risk involved in exploration: an exploratory action taken by the agent can yield a better mod...
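The exploit/explore tradeoff described above is commonly handled with an epsilon-greedy rule; the sketch below is a generic illustration of incorporating such a rule into action selection over model-derived values, not the specific strategies proposed in the paper.

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection: with probability 1 - epsilon exploit the
    action the current model rates best; otherwise explore at random.

    q_values: dict mapping action -> expected reward under the model."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)     # explore: gather more knowledge
    return max(actions, key=lambda a: q_values[a])  # exploit current model
```

Setting `epsilon = 0` recovers pure exploitation; raising it trades immediate expected reward for information about the other agents, which is precisely the risk the abstract goes on to demonstrate.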