Results 1–5 of 5
A gentle introduction to the universal algorithmic agent AIXI
 Real AI: New Approaches to Artificial General Intelligence
, 2003
"... Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas an ..."
Abstract

Cited by 3 (0 self)
Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameterless theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline for a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning, how the AIXI model can formally solve them. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl, which is still effectively more intelligent than any other time t and space l bounded agent. The computation time of AIXItl is of the order t·2^l. Other discussed topics are formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches.
The Kolmogorov lecture: The universal distribution and machine learning
 Computer Journal
, 2003
"... I will discuss two main topics in this lecture. Firstly, the Universal Distribution and some of its properties: its accuracy, its incomputability, its subjectivity. Secondly, I’m going to tell how to use this distribution to create very intelligent machines. Many years ago—in 1960—I discovered what ..."
Abstract

Cited by 2 (0 self)
I will discuss two main topics in this lecture. Firstly, the Universal Distribution and some of its properties: its accuracy, its incomputability, its subjectivity. Secondly, I’m going to tell how to use this distribution to create very intelligent machines. Many years ago—in 1960—I discovered what we now call the Universal Probability Distribution [1]. It is the probability distribution on all possible output strings of a universal computer with random input. It seemed to solve all kinds of prediction problems and resolve serious difficulties in the foundations of Bayesian statistics. Suppose we have a string, x, and we want to know its universal probability with respect to machine M. There will be many inputs to machine M that will give x as output. Say s_i is the i-th such input. If s_i is of length L(s_i) bits, the
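The construction sketched in the abstract — summing 2^(-L(s_i)) over all programs s_i that make a machine M output x — can be illustrated with a deliberately tiny machine. This is our own toy sketch, not Solomonoff's universal machine: `toy_machine` is not universal, and the names `toy_machine` and `universal_probability` are illustrative assumptions; the toy exists only to show that one output string can have many programs of different lengths, each contributing weight 2^(-length).

```python
from itertools import product

def toy_machine(program):
    # Toy "machine" M (NOT universal): a program '0'+s outputs s;
    # a program '1'+s outputs s twice. Just enough structure so that
    # one output string has several programs of different lengths.
    if len(program) < 1:
        return None
    op, body = program[0], program[1:]
    return body if op == '0' else body + body

def universal_probability(x, max_len=10):
    # Approximate P_M(x) = sum over inputs s_i with M(s_i) = x of 2^(-L(s_i)),
    # enumerating every program of up to max_len bits.
    total = 0.0
    for length in range(1, max_len + 1):
        for bits in product('01', repeat=length):
            if toy_machine(''.join(bits)) == x:
                total += 2.0 ** (-length)
    return total
```

For x = '0101' the toy machine has two programs: '00101' (5 bits, weight 2^-5) and '101' (3 bits, weight 2^-3), so the sum is 0.15625; shorter programs dominate the probability, which is the core intuition behind the distribution's preference for simple explanations.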
Machine Learning — Past and Future
, 2009
"... I will first discuss current work in machine learning – in particular, feedforeword Artificial Neural Nets (ANN), Boolean Belief Nets (BBN), Support Vector Machines (SVM), Radial Basis Functions (RBF) and Prediction by Partial Matching (PPM). While they work quite well for the types of problems for ..."
Abstract

Cited by 1 (0 self)
I will first discuss current work in machine learning – in particular, feedforward Artificial Neural Nets (ANN), Boolean Belief Nets (BBN), Support Vector Machines (SVM), Radial Basis Functions (RBF) and Prediction by Partial Matching (PPM). While they work quite well for the types of problems for which they have been designed, they do not use recursion at all and this severely limits their power. Among techniques employing recursion, Recurrent Neural Nets, Context Free Grammar Discovery, Genetic Algorithms, and Genetic Programming have been prominent. I will describe the Universal Distribution, a method of induction that is guaranteed to discover any describable regularities in a body of data, using a relatively small sample of the data. While the incomputability of this distribution has sharply limited its adoption by the machine learning community, I will show that, paradoxically, this incomputability imposes no limitation at all on its application to practical prediction. My recent work has centered mainly on two systems for machine learning. The first might be called “The Baby Machine”. We start out with the machine having little problem-specific knowledge, but a very good learning algorithm. At first we give it very simple problems. It uses its solutions to these problems to devise a probability distribution over function space to help search for solutions to harder problems. We give it harder problems and it updates its probability distribution on their solutions. This continues recursively, solving more and more difficult problems. The task of writing a suitable training sequence has been made much easier by Moore’s Law, which gives us enough computer speed to enable large conceptual jumps between problems in the sequence.
An Observation-Centric Analysis on the Modeling of Anomaly-based Intrusion Detection
, 2005
"... It is generally agreed that two key points always attract special concerns during the modelling of anomalybased intrusion detection. One is the techniques about discerning two classes with different features, another is the construction/selection of the observed sample of normally occurring pattern ..."
Abstract

Cited by 1 (0 self)
It is generally agreed that two key points attract special concern in the modelling of anomaly-based intrusion detection. One is the technique for discerning two classes with different features; the other is the construction/selection of the observed sample of normally occurring patterns used to characterize system normality. In this paper, instead of focusing on the design of specific anomaly detection models, we restrict our attention to the analysis of anomaly detectors’ operating environments, which gives insight into their operational capabilities, including their detection coverage and blind spots, and thus allows them to be evaluated in a convincing manner. Taking the similarity with the induction problem as our starting point, we cast anomaly detection in a statistical framework, which gives a formal, high-level analysis of an anomaly detector’s anticipated behavior. Existing problems, and possible solutions, concerning the normality characterization of observable subjects from hosts and networks are addressed respectively. As case studies, several typical anomaly detectors are analyzed and compared from the perspective of their operating environments, especially those factors causing their particular detection coverage or blind spots. Moreover, the evaluation of anomaly detectors is briefly discussed on the basis of some existing benchmarks. Careful analysis shows that a fundamental understanding of the operating environment (i.e., the properties of the observable subjects) is an elementary but essential stage in establishing an effective anomaly detection model, and is therefore worth insightful exploration, especially when we face the dilemma between anomaly detection performance and computational cost.
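The statistical framing described in this abstract — a normality model trained on an observed sample, with detection coverage and blind spots determined by how well that sample characterizes true normal behavior — can be illustrated with a deliberately minimal detector. The function names and the mean/standard-deviation threshold below are our own illustrative choices, not the paper's model:

```python
import statistics

def train_normality_model(normal_sample):
    # Characterize "system normality" by the mean and standard deviation
    # of an observed sample of normally occurring patterns (here, scalars).
    mu = statistics.mean(normal_sample)
    sigma = statistics.stdev(normal_sample)
    return mu, sigma

def is_anomalous(x, model, k=3.0):
    # Flag any observation farther than k standard deviations from the mean.
    # Coverage and blind spots depend entirely on how representative the
    # training sample was: anything normal-but-unseen risks a false alarm,
    # and any attack that stays within k sigma is a blind spot.
    mu, sigma = model
    return abs(x - mu) > k * sigma
```

Even this toy makes the paper's point concrete: the detector's operating environment (what the observable subject looks like, and what sample of it was collected) fixes its behavior before any modeling sophistication enters the picture.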
On sample complexity for computational pattern recognition
, 2008
"... In this work we consider the task of pattern recognition in which the target (labelling) function is known to be computable on some Turing machine. It is easy to show that there exist a pattern recognition method for which the number of examples needed to approximate the target function with certain ..."
Abstract
In this work we consider the task of pattern recognition in which the target (labelling) function is known to be computable on some Turing machine. It is easy to show that there exists a pattern recognition method for which the number of examples needed to approximate the target function with certain accuracy is linear in the length of the (unknown) program computing the target function. We investigate the question of whether any bounds of this kind exist if we consider only computable pattern recognition methods. We find that the number of examples required for a computable method to approximate an unknown computable function is not only not linear, but grows faster (in the length of the target function) than any computable function. No time or space constraints are put on the predictors or target functions; the only resource we consider is the training examples. The task of pattern recognition is considered in conjunction with another learning problem — data compression. An impossibility result for the task of data compression allows us to estimate the sample complexity for pattern recognition.
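Read literally, the abstract's contrast can be put in symbols as follows. The notation here is ours, not the paper's: t is a computable target function, L(t) the length of a shortest program computing it, and N_ρ(t) the number of examples predictor ρ needs to approximate t to the stated accuracy.

```latex
% Without computability restrictions: sample complexity linear in program length.
\exists\, \rho \;\; \forall t: \quad N_{\rho}(t) = O\!\left(L(t)\right)

% For computable predictors: no computable bound in L(t) suffices.
\forall \text{ computable } \rho \;\; \forall \text{ computable } g \;\;
\exists\, t: \quad N_{\rho}(t) > g\!\left(L(t)\right)
```

The second line is the abstract's "grows faster than any computable function" claim: whatever computable budget g one fixes in advance, some computable target forces a computable learner past it.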