The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
- Machine Learning
, 1996
Cited by 148 (15 self)
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
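For reference, the KL-divergence named in the abstract above has the following standard form for two discrete distributions. The symbols P (target distribution), Q (hypothesis distribution), and X (the set of strings both are defined over) are notation assumed here; the abstract does not state the exact normalization the paper uses, so this is only the textbook definition.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x \in X} P(x) \, \log \frac{P(x)}{Q(x)}
```

The quantity is non-negative and equals zero only when P and Q agree on every x, which is why driving it toward zero is a natural measure of how well a hypothesis matches the target.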
Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm
, 1998
Cited by 77 (1 self)
This paper first describes the structure and results of the Abbadingo One DFA Learning Competition. The competition was designed to encourage work on algorithms that scale well, both to larger DFAs and to sparser training data. We then describe and discuss the winning algorithm of Rodney Price, which orders state merges according to the amount of evidence in their favor. A second winning algorithm, of Hugues Juille, will be described in a separate paper. Part I: Abbadingo 1 Introduction The Abbadingo One DFA Learning Competition was organized by two of the authors (Lang and Pearlmutter) and consisted of a set of challenge problems posted to the internet and token cash prizes of $1024. The organizers had the following goals: -- Promote the development of new and better algorithms. -- Encourage learning theorists to implement some of their ideas and gather empirical data concerning their performance on concrete problems which lie beyond proven bounds, particularly in the direction o...
The Power of a Pebble: Exploring and Mapping Directed Graphs
, 1998
Cited by 76 (4 self)
Exploring and mapping an unknown environment is a fundamental problem, which is studied in various contexts. Many works have focused on finding efficient solutions to restricted versions of the problem. In this paper, we consider a model that makes very limited assumptions on the environment and solve the mapping problem in this general setting. We model
The Power of Amnesia
- Machine Learning
, 1994
Cited by 69 (4 self)
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot effectively capture the complexity of such structures. On the other hand, using long memory models uniformly is not practical even for a memory as short as four. The algorithm we propose is based on minimizing the statistical prediction error by extending the memory, or state length, adaptively, until the total prediction error is sufficiently small. We demonstrate the algorithm by learning the structure of natural English text and applying the learned model to the correction of corrupted text. Using fewer than 3000 states the model's performance is far superior to that of fixe...
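The idea of extending the memory only where it improves prediction can be illustrated with a small sketch. This is not the algorithm from the paper: the function names, the `min_count` and `kl_threshold` parameters, and the rule "keep a context only if its next-symbol distribution differs enough from that of its one-symbol-shorter suffix" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def train_variable_memory(text, max_len=3, min_count=2, kl_threshold=0.05):
    """Return {context: {symbol: probability}}.

    Hypothetical sketch: a context longer than the empty string is stored
    only if it was seen at least `min_count` times and its next-symbol
    distribution diverges (in KL terms) from that of its shorter suffix.
    """
    counts = defaultdict(Counter)
    for i, symbol in enumerate(text):
        # Record the symbol under every preceding context of length 0..max_len.
        for k in range(max_len + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][symbol] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {s: c / total for s, c in counter.items()}

    model = {"": normalize(counts[""])}
    for context in list(counts):
        if context == "" or sum(counts[context].values()) < min_count:
            continue
        probs = normalize(counts[context])
        # The shorter suffix drops the oldest symbol; it has seen every
        # symbol that followed the longer context, so no zero divisions occur.
        parent_probs = normalize(counts[context[1:]])
        kl = sum(p * math.log(p / parent_probs[s]) for s, p in probs.items())
        if kl > kl_threshold:
            model[context] = probs
    return model


def predict(model, history):
    """Next-symbol distribution from the longest stored suffix of history."""
    for start in range(len(history) + 1):
        suffix = history[start:]
        if suffix in model:
            return model[suffix]
    return model[""]


model = train_variable_memory("abababababab", max_len=2)
print(sorted(model))          # contexts that were worth remembering
print(predict(model, "ba"))   # prediction after seeing "...ba"
```

On the alternating string used here, one symbol of memory already determines the next symbol, so length-two contexts add nothing and are not stored; this is the sense in which memory length varies with the data.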
On the Learnability and Usage of Acyclic Probabilistic Finite Automata
- JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1995
Cited by 59 (3 self)
We propose and analyze a distribution learning algorithm for a subclass of Acyclic Probabilistic Finite Automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Though hardness results are known for learning distributions generated by general APFAs, we prove that our algorithm can efficiently learn distributions generated by the subclass of APFAs we consider. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made arbitrarily small with high confidence in polynomial time. We present two applications of our algorithm. In the first, we show how to model cursively written letters. The resulting models are part of a complete cursive handwriting recognition system. In the second application we demonstrate how APFAs can be used to build multiple-pronunciation models for spoken words. We evaluate the APFA-based pronunciation models...
Evaluation, Implementation, and Extension of Primitive Optimality Theory
, 1997
Cited by 25 (4 self)
Eisner's (1997a) Primitive Optimality Theory is a simple formal model of a subset of Optimality Theory (Prince and Smolensky 1993). The work presented here implements this model and extends it. The implementation is used to evaluate the Primitive Optimality Theory model, and is in itself a useful tool for linguistic analysis. The model is evaluated in terms of its success or failure as an attempt to formulate a cognitively plausible, computationally tractable, and mathematically formal model of the Optimality Theoretic framework of phonological theory. As part of this evaluation, a comprehensive, implemented analysis is given for the harmony and disharmony phenomena of Turkish. In addition to an evaluation of the Primitive Optimality Theory model, concrete proposals are suggested for possible extensions to the model, and for improved models that, unlike Primitive Optimality Theory, can model non-concatenative morphology, Paradigm Uniformity, and reduplication.
Learning bias and phonological-rule induction
- Computational Linguistics
, 1996
Cited by 25 (0 self)
A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domain-independent learning rule (Error Back-Propagation, Instance-based Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology, and that are assumed explicitly or implicitly by every theory of phonology: faithfulness (underlying segments
Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results
, 2003
Cited by 17 (0 self)
This paper provides a comprehensive survey of the field of grammar induction applied to randomly generated languages using sparse example sets.
Exploration Strategies for Model-based Learning in Multi-agent Systems
- Autonomous Agents and Multi-agent Systems
, 1997
Cited by 12 (1 self)
An agent that interacts with other agents in multi-agent systems can benefit significantly from adapting to the others. When performing active learning, every agent's action affects the interaction process in two ways: The effect on the expected reward according to the current knowledge held by the agent, and the effect on the acquired knowledge, and hence, on future rewards expected to be received. The agent must therefore make a tradeoff between the wish to exploit its current knowledge, and the wish to explore other alternatives, to improve its knowledge for better decisions in the future. The goal of this work is to develop exploration strategies for a model-based learning agent to handle its encounters with other agents in a common environment. We first show how to incorporate exploration methods usually used in reinforcement learning into model-based learning. We then demonstrate the risk involved in exploration -- an exploratory action taken by the agent can yield a better mod...
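The exploit-versus-explore tradeoff described above can be made concrete with one common reinforcement-learning exploration rule. The abstract does not say which methods the paper adapts, so Boltzmann (softmax) action selection is an assumption chosen here for illustration, and the names `boltzmann_probabilities`, `boltzmann_choice`, and `temperature` are hypothetical.

```python
import math
import random


def boltzmann_probabilities(expected_rewards, temperature):
    """Action probabilities proportional to exp(reward / temperature).

    A high temperature gives near-uniform choice (exploration); a low
    temperature concentrates on the best-looking action (exploitation).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(expected_rewards)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp((r - top) / temperature) for r in expected_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_choice(expected_rewards, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(expected_rewards, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Assumed example: expected rewards under the agent's current model.
rewards = [1.0, 2.0, 0.5]
print(boltzmann_probabilities(rewards, temperature=10.0))  # close to uniform
print(boltzmann_probabilities(rewards, temperature=0.1))   # mostly action 1
print(boltzmann_choice(rewards, temperature=1.0, rng=random.Random(0)))
```

Lowering the temperature over time is one simple way to shift from exploring alternatives toward exploiting the current model.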
Looping suffix tree-based inference of partially observable hidden state
- In Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006)
, 2006
Cited by 12 (0 self)
We present a solution for inferring hidden state from sensorimotor experience when the environment takes the form of a POMDP with deterministic transition and observation functions. Such environments can appear to be arbitrarily complex and non-deterministic on the surface, but are actually deterministic with respect to the unobserved underlying state. We show that there always exists a finite history-based representation that fully captures the unobserved world state, allowing for perfect prediction of action effects. This representation takes the form of a looping prediction suffix tree (PST). We derive a sound and complete algorithm for learning a looping PST from a sufficient sample of sensorimotor experience. We also give empirical illustrations of the advantages conferred by this approach, and characterize the approximations to the looping PST that are made by existing algorithms such as Variable

