Results 1-10 of 25
A Theory of Program Size Formally Identical to Information Theory
, 1975
Cited by 332 (16 self)
A new definition of program-size complexity is made. H(A,B/C,D) is defined to be the size in bits of the shortest self-delimiting program for calculating strings A and B if one is given a minimal-size self-delimiting program for calculating strings C and D. This differs from previous definitions: (1) programs are required to be self-delimiting, i.e. no program is a prefix of another, and (2) instead of being given C and D directly, one is given a program for calculating them that is minimal in size. Unlike previous definitions, this one has precisely the formal properties of the entropy concept of information theory. For example, H(A,B) = H(A) + H(B/A) + O(1). Also, if a program of length k is assigned measure $2^{-k}$, then H(A) = $-\log_2$ (the probability that the standard universal computer will calculate A) + O(1). Key Words and Phrases: computational complexity, entropy, information theory, instantaneous code, Kraft inequality, minimal program, probab...
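
The self-delimiting condition ("no program is a prefix of another") and the measure 2^{-k} assigned to a program of length k can be illustrated with a small sketch. The function names and the example program set below are hypothetical and are not taken from the paper; the sketch only checks the prefix-free property and sums the measure, which the Kraft inequality bounds by 1.

```python
from fractions import Fraction

def is_prefix_free(programs):
    """Return True if no program is a prefix of another program in the set."""
    ordered = sorted(programs)
    # After lexicographic sorting, if any string is a prefix of another,
    # it is also a prefix of its immediate successor, so checking
    # neighbors is enough.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

def kraft_sum(programs):
    """Total measure when a program of length k is assigned measure 2^-k."""
    return sum(Fraction(1, 2 ** len(p)) for p in programs)

# Hypothetical set of binary "programs" (illustration only).
example = {"0", "10", "110", "111"}
print(is_prefix_free(example), kraft_sum(example))
```

For a prefix-free set the sum never exceeds 1, which is what lets program lengths be read as negative log-probabilities.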
Algorithmic information theory
 IBM JOURNAL OF RESEARCH AND DEVELOPMENT
, 1977
Cited by 325 (19 self)
This paper reviews algorithmic information theory, which is an attempt to apply information-theoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that a program whose bits are chosen by coin flipping produces a given output. During the past few years the definitions of algorithmic information theory have been reformulated. The basic features of the new formalism are presented here and certain results of R. M. Solovay are reported.
Universal prediction of individual sequences
 IEEE Transactions on Information Theory
, 1992
Cited by 158 (13 self)
The problem of predicting the next outcome of an individual binary sequence using finite memory is considered. The finite-state predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finite-state (FS) predictor. It is proved that this FS predictability can be attained by universal sequential prediction schemes. Specifically, an efficient prediction procedure based on the incremental parsing procedure of the Lempel-Ziv data compression algorithm is shown to achieve asymptotically the FS predictability. Finally, some relations between compressibility and predictability are pointed out, and the predictability is proposed as an additional measure of the complexity of a sequence. Index Terms: Predictability, compressibility, complexity, finite-state machines, Lempel-Ziv algorithm.
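
To make the idea of prediction driven by incremental parsing concrete, here is a rough sketch. It is a simplified, deterministic majority-vote variant written for illustration, and it should not be read as the procedure analyzed in the paper. The function names and the tie-breaking rule are assumptions.

```python
def lz_parse(bits):
    """Incremental (LZ78-style) parsing: each phrase is the shortest
    string that has not appeared as a phrase before."""
    phrases, seen, current = [], set(), ""
    for b in bits:
        current += b
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases  # a trailing incomplete phrase is dropped

def lz_predict_errors(bits):
    """Count the errors of a majority-vote predictor that walks the parse tree.

    Assumption: ties are broken toward "0".
    """
    counts = {"": [0, 0]}  # tree node -> [zeros seen after it, ones seen after it]
    node, errors = "", 0
    for b in bits:
        zeros, ones = counts[node]
        guess = "1" if ones > zeros else "0"
        errors += guess != b
        counts[node][int(b)] += 1
        child = node + b
        if child in counts:
            node = child              # continue down the current phrase
        else:
            counts[child] = [0, 0]    # phrase complete: add a leaf, restart at root
            node = ""
    return errors

print(lz_parse("1011010100010"))
print(lz_predict_errors("1" * 8))
```

On a constant sequence the only errors occur when the walk reaches a freshly created leaf, which happens less and less often as the phrases grow.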
Universal prediction
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1998
Cited by 136 (11 self)
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
A Convergent Gambling Estimate of the Entropy of English
 IEEE Transactions on Information Theory
, 1978
Cited by 54 (1 self)
In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text. If $S_n$ denotes the subject's capital after $n$ bets at 27 for 1 odds, and if it is assumed that the subject knows the underlying probability distribution for the process $X$, then the entropy estimate is $\hat{H}_n(X) = (1 - (1/n) \log_{27} S_n) \log_2 27$ bits/symbol. If the subject does not know the true probability distribution for the stochastic process, then $\hat{H}_n(X)$ is an asymptotic upper bound for the true entropy. If $X$ is stationary, $E\hat{H}_n(X) \to H(X)$, $H(X)$ being the true
... which follow using the boundedness and continuity of $h(p) = -p \log p - (1-p) \log (1-p)$. In addition, if English is an ergodic process, then the Shannon-McMillan-Breiman theorem states that $-(1/n) \log p(X_1, \ldots, X_n) \to H(X)$ a.e. (3) If printed English is indeed an ergodic process, then for sufficiently large $n$ a good estimate of $H(X)$ can be obtained from knowledge of $p(\cdot)$ on a randomly drawn
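
The estimate (1 - (1/n) log_27 S_n) log_2 27 can be worked through numerically. The sketch below assumes the subject spreads all current capital over the 27 symbols on every bet, so that after each bet the capital is multiplied by 27 times the fraction placed on the symbol that actually occurred. The function name and input format are hypothetical.

```python
import math

def gambling_entropy_estimate(fractions_on_truth, alphabet_size=27):
    """Entropy estimate in bits/symbol from a sequence of proportional bets.

    fractions_on_truth[i] is the fraction of capital the subject placed on
    the symbol that actually occurred at step i (assumed to be > 0).
    """
    n = len(fractions_on_truth)
    # log_27 of the final capital S_n, starting from capital 1 at
    # alphabet_size-for-1 odds; accumulated in logs to avoid overflow.
    log_capital = sum(math.log(alphabet_size * q, alphabet_size)
                      for q in fractions_on_truth)
    return (1 - log_capital / n) * math.log2(alphabet_size)

# A subject who always puts half the capital on the correct symbol:
print(gambling_entropy_estimate([0.5] * 100))
```

Under this betting assumption the estimate reduces to the average of -log_2 of the fractions placed on the true symbols, so uniform betting yields log_2 27 bits/symbol and perfect betting yields 0.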
Fractal Dimension and Logarithmic Loss Unpredictability
Cited by 35 (10 self)
We show that the Hausdorff dimension equals the logarithmic loss unpredictability for any set of infinite sequences over a finite alphabet. Using computable, feasible, and finite-state predictors, this equivalence also holds for the recently introduced computable, feasible, and finite-state dimensions [Lutz (2000) and Dai, Lathrop, Lutz, and Mayordomo (2001)]. Combining this with recent results of Fortnow and Lutz (2002), we have a tight relationship between prediction with respect to logarithmic loss and absolute loss.
The Application Of Algorithmic Probability to Problems in Artificial Intelligence
 in Uncertainty in Artificial Intelligence, Kanal, L.N. and Lemmer, J.F. (Eds), Elsevier Science Publishers B.V
, 1986
Cited by 30 (5 self)
INTRODUCTION: We will cover two topics. First, Algorithmic Probability: the motivation for defining it, how it overcomes difficulties in other formulations of probability, some of its characteristic properties and successful applications. Second, we will apply it to problems in A.I., where it promises to give near optimum search procedures for two very broad classes of problems. A strong motivation for revising classical concepts of probability has come from the analysis of human problem solving. When working on a difficult problem, a person is in a maze in which he must make choices of possible courses of action. If the problem is a familiar one, the choices will all be easy. If it is not familiar, there can be much uncertainty in each choice, but choices must somehow be made. One basis for choice might be the probability of each choice leading to a quick solution, this probability being based on experience in this problem and in problems like it. A good reason for using proba
Universal schemes for sequential decision from individual data sequences
, 1993
Cited by 28 (11 self)
Sequential decision algorithms are investigated, under a family of additive performance criteria, for individual data sequences, with various application areas in information theory and signal processing. Simple universal sequential schemes are known, under certain conditions, to approach optimality uniformly as fast as $n^{-1} \log n$, where $n$ is the sample size. For the case of finite-alphabet observations, the class of schemes that can be implemented by finite-state machines (FSM's) is studied. It is shown that Markovian machines with sufficiently long memory exist that are asymptotically nearly as good as any given FSM (deterministic or randomized) for the purpose of sequential decision. For the continuous-valued observation case, a useful class of parametric schemes is discussed with special attention to the recursive least squares (RLS) algorithm.