Results 1-10 of 47
Inductive Inference, DFAs and Computational Complexity
2nd Int. Workshop on Analogical and Inductive Inference (AII), 1989
Cited by 78 (1 self)
This paper surveys recent results concerning the inference of deterministic finite automata (DFAs). The results discussed determine the extent to which DFAs can be feasibly inferred, and highlight a number of interesting approaches in computational learning theory.
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
IEEE Transactions on Information Theory, 1998
Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
Learning Simple Concepts Under Simple Distributions
SIAM Journal on Computing, 1991
Cited by 56 (3 self)
We aim at developing a learning theory where 'simple' concepts are easily learnable. In Valiant's learning model, many concepts turn out to be too hard (e.g., NP-hard) to learn. Relatively few concept classes have been shown to be polynomially learnable. In daily life, it seems that the things we care to learn are usually learnable. To model the intuitive notion of learning more closely, we do not require that the learning algorithm learn (polynomially) under all distributions, but only under all simple distributions. A distribution is simple if it is dominated by an enumerable distrib...
Compression, Significance and Accuracy
1992
Cited by 43 (5 self)
Inductive Logic Programming (ILP) involves learning relational concepts from examples and background knowledge. To date all ILP learning systems make use of tests inherited from propositional and decision tree learning for evaluating the significance of hypotheses. None of these significance tests takes account of the relevance or utility of the background knowledge. In this paper we describe a method, called HP-compression, of evaluating the significance of a hypothesis based on the degree to which it allows compression of the observed data with respect to the background knowledge. This can be measured by comparing the lengths of the input and output tapes of a reference Turing machine which will generate the examples from the hypothesis and a set of derivational proofs. The model extends an earlier approach of Muggleton by allowing for noise. The truth values of noisy instances are switched by making use of correction codes. The utility of compression as a significance measure is evaluated empirically in three independent domains. In particular, the results show that the existence of positive compression distinguishes a larger number of significant clauses than other significance tests. The method is also shown to reliably distinguish artificially introduced noise as incompressible data.
Towards a universal theory of artificial intelligence based on algorithmic probability and sequential decisions
Proceedings of the 12th European Conference on Machine Learning (ECML 2001), 2001
Cited by 26 (10 self)
Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff’s theory of universal induction formally solves the problem of sequence prediction for unknown distributions. We unify both theories and give strong arguments that the resulting universal AIξ model behaves optimally in any computable environment. The major drawback of the AIξ model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIξ^tl, which is still superior to any other time t and length l bounded agent. The computation time of AIξ^tl is of the order t·2^l.
Distinguishing Exceptions from Noise in Non-Monotonic Learning
1996
Cited by 24 (4 self)
It is important for a learning program to have a reliable method of deciding whether to treat errors as noise or to include them as exceptions within a growing first-order theory. We explore the use of an information-theoretic measure to decide this problem within the non-monotonic learning framework defined by Closed-World-Specialisation. The approach adopted uses a model that consists of a reference Turing machine which accepts an encoding of a theory and proofs on its input tape and generates the observed data on the output tape. Within this model, the theory is said to "compress" data if the length of the input tape is shorter than that of the output tape. Data found to be incompressible are deemed to be "noise".
New Error Bounds for Solomonoff Prediction
Journal of Computer and System Sciences, 1999
Cited by 23 (16 self)
Several new relations between universal Solomonoff sequence prediction and informed prediction and general probabilistic prediction schemes will be proved. Among others, they show that the number of errors in Solomonoff prediction is finite for computable prior probability, if finite in the informed case, where the prior is known. Deterministic variants will also be studied. The most interesting result is that the deterministic variant of Solomonoff prediction is optimal compared to any other probabilistic or deterministic prediction scheme apart from additive square root corrections only. This makes it well suited even for difficult prediction problems, where it does not suffice when the number of errors is minimal to within some factor greater than one. Solomonoff's original bound and the ones presented here complement each other in a useful way.
Universal Algorithmic Intelligence: A mathematical top-down approach
Artificial General Intelligence, 2005
Cited by 22 (6 self)
Artificial intelligence; algorithmic probability; sequential decision theory; rational
Convergence and Error Bounds for Universal Prediction of Nonbinary Sequences
Proceedings of the 12th European Conference on Machine Learning (ECML 2001), 2001
Cited by 21 (15 self)
Solomonoff's uncomputable universal prediction scheme ξ makes it possible to predict the next symbol x_k of a sequence x_1...x_{k-1} for any Turing-computable, but otherwise unknown, probabilistic environment µ. This scheme will be generalized to arbitrary environmental classes, which, among other things, allows the construction of computable universal prediction schemes ξ. Convergence of ξ to µ in a conditional mean squared sense and with µ-probability 1 is proven. It is shown that the average number of prediction errors made by the universal ξ scheme rapidly converges to that made by the best possible informed µ scheme. The schemes, theorems and proofs are given for a general finite alphabet, which results in additional complications compared to the binary case. Several extensions of the presented theory and results are outlined. They include general loss functions and bounds, games of chance, infinite alphabet, partial and delayed prediction, classification, and more active systems.
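The idea of a universal predictor restricted to a small environmental class can be illustrated with a Bayes mixture. The sketch below is not taken from the paper. It assumes a finite class of Bernoulli environments over a binary alphabet, and the names `mixture_predict`, `thetas` and `weights` are hypothetical. It shows the predictor moving toward the true environment µ as the observed sequence grows.

```python
def mixture_predict(thetas, weights, history):
    """Probability that the next bit is 1 under a Bayes mixture.

    thetas  -- assumed class of environments: each theta is P(bit = 1)
    weights -- prior weight of each environment (positive, summing to 1)
    history -- bits observed so far, as a list of 0s and 1s
    """
    posterior = []
    for theta, weight in zip(thetas, weights):
        # Likelihood of the observed history under this environment.
        likelihood = 1.0
        for bit in history:
            likelihood *= theta if bit == 1 else 1.0 - theta
        posterior.append(weight * likelihood)

    # Mixture probability of the history; it is zero only if every
    # environment in the class rules the history out.
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("history has zero probability under the class")

    # Predictive probability: posterior-weighted average of the thetas.
    return sum(p * theta for p, theta in zip(posterior, thetas)) / total


if __name__ == "__main__":
    thetas = [0.2, 0.8]
    weights = [0.5, 0.5]
    print(mixture_predict(thetas, weights, []))        # prior prediction
    print(mixture_predict(thetas, weights, [1] * 20))  # after twenty 1s
```

With an empty history the prediction is the prior average of the two environments. After a long run of 1s the posterior weight concentrates on the environment with theta = 0.8, so the prediction approaches 0.8.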
Applying MDL to Learning Best Model Granularity
1994
Cited by 20 (8 self)
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor." In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Base...