Results 1–10 of 14
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
Neural Networks, 1997
"... Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (! Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universali ..."
Abstract

Cited by 49 (30 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) ...
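The Levin-complexity idea behind this abstract, charging each candidate program for 2^length times its runtime, can be illustrated with a toy phase-wise search. The bitstring "interpreter" below is a hypothetical stand-in for illustration only, not the paper's actual method:

```python
from itertools import product

def levin_search(run, is_solution, max_phase=12):
    """Phase-wise Levin search sketch: in phase i, each program p with
    len(p) <= i gets a budget of 2**(i - len(p)) interpreter steps, so
    total effort per program is weighted by 2**len(p) * runtime, i.e.
    short, fast programs are tried first (Levin complexity)."""
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for bits in product("01", repeat=length):
                program = "".join(bits)
                out = run(program, budget)
                if out is not None and is_solution(out):
                    return program, phase
    return None

# Toy interpreter (an assumption for illustration): the program is a
# binary numeral; "running" it takes as many steps as its integer value.
def toy_run(program, budget):
    value = int(program, 2)
    return value if budget >= value else None

found = levin_search(toy_run, lambda out: out == 5)
```

Here the shortest program computing 5 is "101", and it is first allotted enough steps in phase 6, so the search returns it then.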
Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability
1995
"... Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data ( Occam's razor). Most practi cal implementations, however, use measures for "simplicity" that lack the power, univ ..."
Abstract

Cited by 37 (25 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability.
On Learning How to Learn Learning Strategies
1995
"... This paper introduces the "incremental selfimprovement paradigm". Unlike previous methods, incremental selfimprovement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations  ..."
Abstract

Cited by 36 (15 self)
This paper introduces the "incremental self-improvement paradigm". Unlike previous methods, incremental self-improvement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations: the system is able to "shift its inductive bias" in a universal way. Its major features are: (1) There is no explicit difference between "learning", "meta-learning", and other kinds of information processing. Using a Turing machine equivalent programming language, the system itself occasionally executes self-delimiting, initially highly random "self-modification programs" which modify the context-dependent probabilities of future action sequences (including future self-modification programs). (2) The system keeps only those probability modifications computed by "useful" self-modification programs: those which bring about more payoff (reward, reinforcement) per time than all previous self-modi...
Reinforcement Learning With Self-Modifying Policies
In S. Thrun, L. Pratt (eds.), Learning to Learn, 1997
"... A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a selfmodifying policy (SMP). SMPs can modify the way they modify themselve ..."
Abstract

Cited by 33 (22 self)
A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a self-modifying policy (SMP). SMPs can modify the way they modify themselves, etc. They are of interest in situations where the initial learning algorithm itself can be improved by experience; this is what we call "learning to learn". How can we force some (stochastic) SMP to trigger better and better self-modifications? The success-story algorithm (SSA) addresses this question in a lifelong reinforcement learning context. During the learner's lifetime, SSA is occasionally called at times computed according to the SMP itself. SSA uses backtracking to undo those SMP-generated SMP-modifications that have not been empirically observed to trigger lifelong reward accelerations (measured up until the current SSA call; this evaluates the long-term effects of SMP-modifi...
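SSA's backtracking criterion can be sketched as follows. This is a simplified reading with hypothetical names, not the paper's full implementation; the real algorithm operates on checkpoints saved during the learner's life:

```python
def ssa_backtrack(stack, now, cum_reward, undo):
    """Simplified success-story check. Each stack entry is a tuple
    (time_of_modification, cum_reward_at_that_time, modification);
    a sentinel (0, 0.0, None) marks the start of system life. Pop and
    undo the newest modification while the average reward per time
    since it is not strictly greater than that since the entry below,
    so the surviving entries form a "success story" of accelerating
    average reward."""
    def rate(entry):
        t, r, _ = entry
        return (cum_reward - r) / (now - t)
    while len(stack) >= 2 and rate(stack[-1]) <= rate(stack[-2]):
        _, _, mod = stack.pop()
        undo(mod)
    return stack

# Hypothetical history: average reward has slowed since "m1" was made.
undone = []
stack = [(0, 0.0, None), (10, 5.0, "m1"), (20, 8.0, "m2")]
ssa_backtrack(stack, now=30, cum_reward=9.0, undo=undone.append)
```

In this example the reward rate since "m2" (0.1 per step) is below the rate since "m1" (0.2), which is in turn below the lifelong average (0.3), so both modifications are undone, newest first.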
A General Method for Incremental Self-Improvement and Multi-Agent Learning in Unrestricted Environments
In Evolutionary Computation: Theory and Applications. Scientific Publ. Co., Singapore, 1996
"... I describe a novel paradigm for reinforcement learning (RL) with limited computational resources in realistic, nonresettable environments. The learner's policy is an arbitrary modifiable algorithm mapping environmental inputs and internal states to outputs and new internal states. Like in the re ..."
Abstract

Cited by 22 (8 self)
I describe a novel paradigm for reinforcement learning (RL) with limited computational resources in realistic, non-resettable environments. The learner's policy is an arbitrary modifiable algorithm mapping environmental inputs and internal states to outputs and new internal states. As in the real world, any event in system life and any learning process computing policy modifications may affect future performance and the preconditions of future learning processes. Unlike with most previous RL approaches, the expected reward for a certain behavior may change during successive "trials". At a given time in system life, there is only a single training example with which to evaluate the current long-term usefulness of any given previous policy modification, namely the average reinforcement per time since that modification occurred. At certain times in system life called checkpoints, such singular observations are used by a stack-based backtracking method which invalidates certain previous po...
The Use of a Bayesian Neural Network Model for Classification Tasks
1997
"... This thesis deals with a Bayesian neural network model. The focus is on how to use the model for automatic classification, i.e. on how to train the neural network to classify objects from some domain, given a database of labeled examples from the domain. The original Bayesian neural network is a one ..."
Abstract

Cited by 19 (1 self)
This thesis deals with a Bayesian neural network model. The focus is on how to use the model for automatic classification, i.e. on how to train the neural network to classify objects from some domain, given a database of labeled examples from the domain. The original Bayesian neural network is a one-layer network implementing a naive Bayesian classifier. It is based on the assumption that different attributes of the objects appear independently of each other. This work has been aimed at extending the original Bayesian neural network model, mainly focusing on three different aspects. First the model is extended to a multi-layer network, to relax the independence requirement. This is done by introducing a hidden layer of complex columns, groups of units which take input from the same set of input attributes. Two different types of complex column structures in the hidden layer are studied and compared. An information theoretic measure is used to decide which input attributes to consider toget...
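The one-layer starting point, a naive Bayesian classifier over discrete attributes, can be sketched generically. This is standard naive Bayes with add-one smoothing, not the thesis's exact network formulation, and the fruit data is made up:

```python
import math
from collections import defaultdict

def train_naive_bayes(examples):
    """Train a naive Bayes classifier: log class priors plus, per class
    and attribute position, log-likelihoods of each attribute value,
    assuming attributes are independent given the class."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(lambda: defaultdict(int))  # (cls, pos) -> value -> n
    for attrs, cls in examples:
        class_counts[cls] += 1
        for i, v in enumerate(attrs):
            value_counts[(cls, i)][v] += 1
    total = sum(class_counts.values())

    def classify(attrs):
        best, best_score = None, -math.inf
        for cls, n in class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(attrs):
                counts = value_counts[(cls, i)]
                # add-one smoothing over observed values plus one unseen bucket
                score += math.log((counts.get(v, 0) + 1) / (n + len(counts) + 1))
            if score > best_score:
                best, best_score = cls, score
        return best
    return classify

clf = train_naive_bayes([
    (("red", "round"), "apple"),
    (("yellow", "long"), "banana"),
    (("red", "round"), "apple"),
])
```

The thesis's complex columns would replace the per-attribute independence above with joint likelihoods over small groups of attributes.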
Low-Complexity Art
1994
"... Many artists try to depict "the essence" of objects to be represented. In an attempt to formalize certain aspects of the "the essence", I propose an art form called lowcomplexity art. Its goals are based on concepts from algorithmic information theory. Suppose the task is to draw a given object. Us ..."
Abstract

Cited by 6 (3 self)
Many artists try to depict "the essence" of the objects to be represented. In an attempt to formalize certain aspects of "the essence", I propose an art form called low-complexity art. Its goals are based on concepts from algorithmic information theory. Suppose the task is to draw a given object. Usually there are many ways of doing so. The goal of low-complexity art is to draw the object such that the drawing can be specified by a computer algorithm and two properties hold: (1) The drawing should "look right". (2) The Kolmogorov complexity of the drawing should be small (the algorithm should be short), and a typical observer should be able to see this. Examples of low-complexity art are given in the form of "algorithmically simple" cartoons of various objects. Relations to previous work are established. Attempts are made to relate the formalism of the theory of minimum description length to informal notions like "good artistic style" and "beauty". Keywords: Low-complexity art, fine arts, ...
Flat Minimum Search Finds Simple Nets
1994
"... We present a new algorithm for finding low complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weightspace where the error remains approximately constant. An MDLbased argume ..."
Abstract

Cited by 3 (2 self)
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based argument shows that flat minima correspond to low expected overfitting. Although our algorithm requires the computation of second-order derivatives, it has backprop's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms (1) conventional backprop, (2) weight decay, and (3) "optimal brain surgeon" / "optimal brain damage".
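The notion of flatness can be illustrated with a crude perturbation probe. This is a stand-in for the paper's second-order flat-minimum search, and the two quadratic losses are hypothetical:

```python
import random

def flatness_penalty(loss_fn, weights, radius=0.01, samples=20, seed=0):
    """Crude flatness estimate: average increase in loss when weights are
    perturbed uniformly inside a box of the given radius around the
    current point. Flat minima, where error stays nearly constant over a
    large region, score low; sharp minima score high."""
    rng = random.Random(seed)  # fixed seed: same perturbations each call
    base = loss_fn(weights)
    increases = []
    for _ in range(samples):
        perturbed = [w + rng.uniform(-radius, radius) for w in weights]
        increases.append(loss_fn(perturbed) - base)
    return sum(increases) / samples

# Hypothetical minima at the origin: a sharp bowl vs a flat one.
sharp = lambda w: 100.0 * sum(x * x for x in w)
flat = lambda w: 0.01 * sum(x * x for x in w)
```

With the same seed, both losses see identical perturbations, so the sharp bowl's penalty exceeds the flat one's by exactly their curvature ratio.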