Results 1 - 10
of
19
A Theory of Learning Classification Rules
, 1992
"... The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error whe ..."
Abstract
-
Cited by 77 (6 self)
- Add to MetaCart
The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error when learning logical rules. The thesis is motivated by considering some current research issues in machine learning such as bias, overfitting and search, and considering the requirements placed on a learning system when it is used for knowledge acquisition. Basic Bayesian decision theory relevant to the problem of learning classification rules is reviewed, then a Bayesian framework for such learning is presented. The framework has three components: the hypothesis space, the learning protocol, and criteria for successful learning. Several learning protocols are analysed in detail: queries, logical, noisy, uncertain and positive-only examples. The analysis is done by interpreting a protocol as a...
On the Computational Power of DNA Annealing and Ligation
- DNA Based Computers, volume 27 of DIMACS
, 1995
"... In [Winfree] it was shown that the DNA primitives of Separate, Merge, and Amplify were not sufficiently powerful to invert functions defined by circuits in linear time. Dan Boneh et al [Boneh] show that the addition of a ligation primitive, Append, provides the missing power. The question becomes, " ..."
Abstract
-
Cited by 62 (18 self)
- Add to MetaCart
In [Winfree] it was shown that the DNA primitives of Separate, Merge, and Amplify were not sufficiently powerful to invert functions defined by circuits in linear time. Dan Boneh et al [Boneh] show that the addition of a ligation primitive, Append, provides the missing power. The question becomes, "How powerful is ligation? Are Separate, Merge, and Amplify necessary at all?" This paper proposes to informally explore the power of annealing and ligation for DNA computation. We conclude, in fact, that annealing and ligation alone are theoretically capable of universal computation. 1 Introduction When Len Adleman introduced the paradigm of using DNA to solve combinatorial problems [Adleman], his computational scheme involved two distinct phases. To solve the directed Hamiltonian path problem, he first mixed together in a test tube a carefully designed set of DNA oligonucleotide "building blocks", which anneal to each other and are ligated to create long strands of DNA representing paths t...
Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement
- MACHINE LEARNING
, 1997
"... We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate long-term effects of bias shifts setting the stage for later bias shifts we use the "success-story algorithm" (SSA). SSA ..."
Abstract
-
Cited by 58 (27 self)
- Add to MetaCart
We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate long-term effects of bias shifts setting the stage for later bias shifts we use the "success-story algorithm" (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger longterm reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and build the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies...
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
- Neural Networks
, 1997
"... Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universali ..."
Abstract
-
Cited by 41 (23 self)
- Add to MetaCart
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov comple...
Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability
, 1995
"... Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (- Occam's razor). Most practi- cal implementations, however, use measures for "simplicity" that lack the power, univ ..."
Abstract
-
Cited by 34 (22 self)
- Add to MetaCart
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (- Occam's razor). Most practi- cal implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most pre- vious approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper ad- dresses both issues. It first reviews some ba- sic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The uni- versal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability.
REINFORCEMENT LEARNING WITH SELF-MODIFYING POLICIES
, 1997
"... A learner’s modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a self-modifying policy (SMP). SMPs can modify the way they modify themselve ..."
Abstract
-
Cited by 28 (20 self)
- Add to MetaCart
A learner’s modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a self-modifying policy (SMP). SMPs can modify the way they modify themselves etc. They are of interest in situations where the initial learning algorithm itself can be improved by experience — this is what we call “learning to learn”. How can we force some (stochastic) SMP to trigger better and better self-modifications? The success-story algorithm (SSA) addresses this question in a lifelong reinforcement learning context. During the learner’s life-time, SSA is occasionally called at times computed according to SMP itself. SSA uses backtracking to undo those SMP-generated SMP-modifications that have not been empirically observed to trigger lifelong reward accelerations (measured up until the current SSA call — this evaluates the long-term effects of SMP-modifications setting the stage for later SMP-modifications). SMP-modifications that survive SSA represent a lifelong success history. Until the next SSA call, they build the basis for additional SMP-modifications. Solely by self-modifications our SMP/SSA-based learners solve a complex task in a partially observable environment (POE) whose state space is far bigger than most reported in the POE literature.
The discovery of algorithmic probability
- Journal of Computer and System Sciences
, 1997
"... This paper will describe a voyage of discovery — the discovery of Algorithmic Probability. But before I describe that voyage —a few words about motivation. Motivation in science is roughly of two kinds: In one, the motivation is discovery itself — the joy of “going where no one has gone before ” — ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
This paper will describe a voyage of discovery — the discovery of Algorithmic Probability. But before I describe that voyage —a few words about motivation. Motivation in science is roughly of two kinds: In one, the motivation is discovery itself — the joy of “going where no one has gone before ” — the
A Formal Definition of Intelligence Based on an Intensional Variant of Algorithmic Complexity
- In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98
, 1998
"... Machine Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages, like l-calculus (or LISP). We have adapted the "toy RISC" machine of [Hernndez & Hernndez 1993] ..."
Abstract
-
Cited by 20 (10 self)
- Add to MetaCart
Machine Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages, like l-calculus (or LISP). We have adapted the "toy RISC" machine of [Hernndez & Hernndez 1993] with two remarkable features inherited from its object-oriented coding in C++: it is easily tunable for our needs, and it is efficient. We have made it even more reduced, removing any operand in the instruction set, even for the loop operations. We have only three registers which are AX (the accumulator), BX and CX. The operations Q b we have used for our experiment are in Table 1: LOOPTOP Decrements CX. If it is not equal to the first element jump to the program top.
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
- MACHINE LEARNING: PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE
, 1994
"... Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (! Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality ..."
Abstract
-
Cited by 17 (9 self)
- Add to MetaCart
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (! Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and...
Simple Principles Of Metalearning
- SEE
, 1996
"... The goal of metalearning is to generate useful shifts of inductive bias by adapting the current learning strategy in a "useful" way. Our learner leads a single life during which actions are continually executed according to the system's internal state and current policy (a modifiable, probabilistic ..."
Abstract
-
Cited by 14 (8 self)
- Add to MetaCart
The goal of metalearning is to generate useful shifts of inductive bias by adapting the current learning strategy in a "useful" way. Our learner leads a single life during which actions are continually executed according to the system's internal state and current policy (a modifiable, probabilistic algorithm mapping environmental inputs and internal states to outputs and new internal states). An action is considered a learning algorithm if it can modify the policy. Effects of learning processes on later learning processes are measured using reward/time ratios. Occasional backtracking enforces success histories of still valid policy modifications corresponding to histories of lifelong reward accelerations. The principle allows for plugging in a wide variety of learning algorithms. In particular, it allows for embedding the learner's policy modification strategy within the policy itself (self-reference). To demonstrate the principle's feasibility in cases where conventional reinforcemen...

