The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence, 2002
Cited by 51 (20 self)
Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from a recursive prior distribution p(x). Instead of using the unknown p(), he predicts using the celebrated universal enumerable prior M(), which for all x exceeds any recursive p(), save for a constant factor independent of x. The simplicity measure M() naturally implements "Occam's razor" and is closely related to the Kolmogorov complexity of x. However, M assigns high probability to certain data that are extremely hard to compute. This does not match our intuitive notion of simplicity. Here we suggest a more plausible measure derived from the fastest way of computing data. In the absence of contrarian evidence, we assume that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (this assumption is more radical and stronger than Solomonoff's).
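The dominance property described in this abstract can be written out explicitly. The following is only a sketch in assumed notation: the constant name c_p is introduced here for illustration and does not appear in the listing.

```latex
% Dominance of the universal enumerable prior M over any recursive prior p.
% c_p is a hypothetical name for the constant factor; it depends on p, not on x.
\[
  \forall x:\quad M(x) \;\ge\; c_p \, p(x), \qquad c_p > 0 .
\]
```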
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
Neural Networks, 1997
Cited by 50 (31 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov comple...
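For orientation, Levin complexity is commonly stated as below. This is the textbook form, given as an assumption; the paper's exact variant may differ. The symbols U (a universal machine), \ell(q) (program length in bits) and t(q) (running time in steps) are notation chosen here, not taken from the listing.

```latex
% Levin complexity Kt: program length plus the logarithm of running time,
% minimized over all programs q that make the universal machine U output x.
\[
  Kt(x) \;=\; \min_{q \,:\, U(q) = x} \bigl\{ \ell(q) + \log_2 t(q) \bigr\}
\]
```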
Computational mechanics: Pattern and prediction, structure and simplicity
Journal of Statistical Physics, 1999
Cited by 43 (8 self)
Computational mechanics, an approach to structural complexity, defines a process's causal states and gives a procedure for finding them. We show that the causal-state representation, an ε-machine, is the minimal one consistent with ...
Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability
1995
Cited by 37 (25 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability.
On Learning How to Learn Learning Strategies
1995
Cited by 36 (15 self)
This paper introduces the "incremental self-improvement paradigm". Unlike previous methods, incremental self-improvement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations: the system is able to "shift its inductive bias" in a universal way. Its major features are: (1) There is no explicit difference between "learning", "meta-learning", and other kinds of information processing. Using a Turing machine equivalent programming language, the system itself occasionally executes self-delimiting, initially highly random "self-modification programs" which modify the context-dependent probabilities of future action sequences (including future self-modification programs). (2) The system keeps only those probability modifications computed by "useful" self-modification programs: those which bring about more payoff (reward, reinforcement) per time than all previous self-modi...
A Natural Law of Succession
1995
Cited by 35 (3 self)
Consider the following problem. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice.
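To make the problem statement concrete, here is a minimal sketch of one classical baseline, Laplace's add-one rule of succession. It is not the new solution announced in the abstract, which the listing does not describe. The function name `laplace_estimate` and the list-of-counts representation are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Sequence


def laplace_estimate(counts: Sequence[int], i: int) -> Fraction:
    """Laplace's add-one estimate that the next symbol is symbol i.

    counts[j] holds n_j, the number of past occurrences of symbol j,
    so k = len(counts) is the alphabet size. This is a classical
    baseline, NOT the method proposed in the paper.
    """
    k = len(counts)          # alphabet size
    n = sum(counts)          # total number of past observations
    return Fraction(counts[i] + 1, n + k)


# Example: alphabet of 3 symbols seen 3, 0 and 1 times respectively.
counts = [3, 0, 1]
estimates = [laplace_estimate(counts, i) for i in range(len(counts))]
```

Exact fractions are used so that the estimates can be checked to sum to one; note that the unseen symbol still receives non-zero probability under this rule.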
The Fastest And Shortest Algorithm For All Well-Defined Problems
2002
Cited by 35 (7 self)
An algorithm M is described that solves any well-defined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and low-order additive terms. M optimally distributes resources between the execution of provably correct p-solving programs and an enumeration of all proofs, including relevant proofs of program correctness and of time bounds on program runtimes. M avoids Blum's speed-up theorem by ignoring programs without correctness proof. M has broader applicability and can be faster than Levin's universal search, the fastest method for inverting functions save for a large multiplicative constant. An extension of Kolmogorov complexity and two novel natural measures of function complexity are used to show that the most efficient program computing some function f is also among the shortest programs provably computing f.
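As a point of comparison, the time-allocation idea behind Levin's universal search (the baseline mentioned in the abstract, not the algorithm M itself) can be sketched in a toy setting. Everything here is an illustrative assumption: "programs" are modeled as (length, run) pairs where `run(budget)` returns a candidate or `None` if it needs more steps, and `slow_constant` is a hypothetical stand-in for a real program.

```python
from typing import Callable, Optional, Sequence, Tuple

# A toy "program": (length in bits, run), where run(budget) returns a
# candidate answer, or None if the budget of steps is too small.
Program = Tuple[int, Callable[[int], Optional[int]]]


def slow_constant(value: int, steps_needed: int) -> Callable[[int], Optional[int]]:
    """Hypothetical program that outputs `value` after `steps_needed` steps."""
    def run(budget: int) -> Optional[int]:
        return value if budget >= steps_needed else None
    return run


def levin_search(
    programs: Sequence[Program],
    is_solution: Callable[[int], bool],
    max_phase: int = 30,
) -> Optional[Tuple[int, int]]:
    """Toy Levin-style search.

    In phase i, every program of length l <= i is run (from scratch) for
    2**(i - l) steps, so shorter programs get exponentially more time.
    Returns (solution, phase) for the first verified candidate, else None.
    """
    for phase in range(1, max_phase + 1):
        for length, run in programs:
            if length > phase:
                continue
            candidate = run(2 ** (phase - length))
            if candidate is not None and is_solution(candidate):
                return candidate, phase
    return None
```

For instance, with `programs = [(1, slow_constant(3, 1)), (2, slow_constant(7, 4)), (3, slow_constant(7, 1))]` and `is_solution = lambda x: x * x == 49`, the search verifies the candidate 7 in phase 3, where the length-3 program first receives one step. Restarting each program in every phase at most doubles the total work and keeps the sketch stateless.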
Reinforcement Learning With Self-Modifying Policies
In S. Thrun, L. Pratt (Eds.), Learning to Learn, 1997
Cited by 33 (22 self)
A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a self-modifying policy (SMP). SMPs can modify the way they modify themselves, etc. They are of interest in situations where the initial learning algorithm itself can be improved by experience; this is what we call "learning to learn". How can we force some (stochastic) SMP to trigger better and better self-modifications? The success-story algorithm (SSA) addresses this question in a lifelong reinforcement learning context. During the learner's lifetime, SSA is occasionally called at times computed according to SMP itself. SSA uses backtracking to undo those SMP-generated SMP-modifications that have not been empirically observed to trigger lifelong reward accelerations (measured up until the current SSA call; this evaluates the long-term effects of SMP-modifi...
On initial segment complexity and degrees of randomness
Trans. Amer. Math. Soc
Cited by 32 (6 self)
One approach to understanding the fine structure of initial segment complexity was introduced by Downey, Hirschfeldt and LaForte. They define X ≤_K Y to mean that (∀n) K(X ↾ n) ≤ K(Y ↾ n) + O(1). The equivalence classes under this relation are the K-degrees. We prove that if X ⊕ Y is 1-random, then X and Y have no upper bound in the K-degrees (hence, no join). We also prove that n-randomness is closed upward in the K-degrees. Our main tool is another structure intended to measure the degree of randomness of real numbers: the vL-degrees. Unlike the K-degrees, many basic properties of the vL-degrees are easy to prove. We show that X ≤_K Y implies X ≤_vL Y, so some results can be transferred. The reverse implication is proved to fail. The same analysis is also done for ≤_C, the analogue of ≤_K for plain Kolmogorov complexity. Two other interesting results are included. First, we prove that for any Z ∈ 2^ω, a 1-random real computable from a 1-Z-random real is automatically 1-Z-random. Second, we give a plain Kolmogorov complexity characterization of 1-randomness. This characterization is related to our proof that X ≤_C Y implies X ≤_vL Y.