Results 1–10 of 403
Learning Stochastic Logic Programs
, 2000
"... Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic contextfree grammars, and directed Bayes' nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a firstorder r ..."
Abstract

Cited by 1057 (69 self)
Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic context-free grammars, and directed Bayes' nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a first-order range-restricted definite clause. This paper summarises the syntax, distributional semantics and proof techniques for SLPs and then discusses how a standard Inductive Logic Programming (ILP) system, Progol, has been modified to support learning of SLPs. The resulting system 1) finds an SLP with uniform probability labels on each definition and near-maximal Bayes posterior probability and then 2) alters the probability labels to further increase the posterior probability. Stage 1) is implemented within CProgol4.5, which differs from previous versions of Progol by allowing user-defined evaluation functions written in Prolog. It is shown that maximising the Bayesian posterior function involves finding SLPs with short derivations of the examples. Search pruning with the Bayesian evaluation function is carried out in the same way as in previous versions of CProgol. The system is demonstrated with worked examples involving the learning of probability distributions over sequences as well as the learning of simple forms of uncertain knowledge.
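As a hedged illustration of the labelled-clause idea (the rules, names and probabilities below are invented for this sketch, not taken from the paper), a toy stochastic program over coin-flip sequences can be sampled as follows, in Python:

import random

# Toy SLP-style model: each rule carries a probability label p in [0,1];
# together the rules define a distribution over coin-flip sequences
# (cf. the paper's worked examples of distributions over sequences).
RULES = {
    "seq":  [(0.2, []), (0.8, ["coin", "seq"])],    # seq -> [] | [Coin|Rest]
    "coin": [(0.5, ["heads"]), (0.5, ["tails"])],   # coin -> heads | tails
}

def sample(symbol):
    """Sample a ground sequence by stochastically choosing labelled rules."""
    if symbol not in RULES:                 # terminal symbol
        return [symbol]
    r, acc = random.random(), 0.0
    for p, body in RULES[symbol]:
        acc += p
        if r <= acc:
            return [tok for part in body for tok in sample(part)]
    return []

print(sample("seq"))    # e.g. ['heads', 'tails', 'heads']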
A Theory of Program Size Formally Identical to Information Theory
, 1975
"... A new definition of programsize complexity is made. H(A;B=C;D) is defined to be the size in bits of the shortest selfdelimiting program for calculating strings A and B if one is given a minimalsize selfdelimiting program for calculating strings C and D. This differs from previous definitions: (1) ..."
Abstract

Cited by 330 (16 self)
A new definition of program-size complexity is made. H(A,B/C,D) is defined to be the size in bits of the shortest self-delimiting program for calculating strings A and B if one is given a minimal-size self-delimiting program for calculating strings C and D. This differs from previous definitions: (1) programs are required to be self-delimiting, i.e. no program is a prefix of another, and (2) instead of being given C and D directly, one is given a program for calculating them that is minimal in size. Unlike previous definitions, this one has precisely the formal properties of the entropy concept of information theory. For example, H(A,B) = H(A) + H(B/A) + O(1). Also, if a program of length k is assigned measure 2^{-k}, then H(A) = -log_2 (the probability that the standard universal computer will calculate A) + O(1). Key Words and Phrases: computational complexity, entropy, information theory, instantaneous code, Kraft inequality, minimal program, probab...
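Spelled out in standard notation (a sketch of the quoted identities plus the Kraft inequality the key words point to; the self-delimiting requirement is what makes the last line meaningful):

H(A,B) = H(A) + H(B/A) + O(1)
\sum_x 2^{-H(x)} \le 1    (Kraft inequality, since minimal self-delimiting programs form a prefix-free set)
H(A) = -\log_2 \Pr[\text{the standard universal computer prints } A] + O(1)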
Algorithmic information theory
 IBM JOURNAL OF RESEARCH AND DEVELOPMENT
, 1977
"... This paper reviews algorithmic information theory, which is an attempt to apply informationtheoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that ..."
Abstract

Cited by 321 (19 self)
This paper reviews algorithmic information theory, which is an attempt to apply information-theoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that a program whose bits are chosen by coin flipping produces a given output. During the past few years the definitions of algorithmic information theory have been reformulated. The basic features of the new formalism are presented here and certain results of R. M. Solovay are reported.
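In the standard formulation such a review covers, the "coin-flipping" probability of an output x and the halting probability are defined roughly as follows (notation assumed here, not quoted from the paper):

P(x) = \sum_{p : U(p) = x} 2^{-|p|}          (probability that randomly chosen program bits make the universal computer U print x)
\Omega = \sum_{p : U(p)\ \text{halts}} 2^{-|p|}    (halting probability of U)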
The minimum description length principle in coding and modeling
 IEEE Trans. Inform. Theory
, 1998
"... Abstract — We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized m ..."
Abstract

Cited by 306 (12 self)
Abstract — We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples. Index Terms—Complexity, compression, estimation, inference, universal modeling.
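For reference, the normalized maximized likelihood code mentioned above has the standard form (notation assumed here; a sketch rather than the paper's statement):

\hat p(x^n) = \frac{p\left(x^n \mid \hat\theta(x^n)\right)}{\sum_{y^n} p\left(y^n \mid \hat\theta(y^n)\right)},
\qquad \mathrm{SC}(x^n) = -\log \hat p(x^n),

where \hat\theta(x^n) is the maximum-likelihood estimate and the normalizer ranges over all data sequences of the same length; the stochastic complexity SC(x^n) is the resulting code length.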
Almost Everywhere High Nonuniform Complexity
, 1992
"... . We investigate the distribution of nonuniform complexities in uniform complexity classes. We prove that almost every problem decidable in exponential space has essentially maximum circuitsize and spacebounded Kolmogorov complexity almost everywhere. (The circuitsize lower bound actually exceeds ..."
Abstract

Cited by 173 (36 self)
We investigate the distribution of nonuniform complexities in uniform complexity classes. We prove that almost every problem decidable in exponential space has essentially maximum circuit-size and space-bounded Kolmogorov complexity almost everywhere. (The circuit-size lower bound actually exceeds, and thereby strengthens, the Shannon 2^n/n lower bound for almost every problem, with no computability constraint.) In exponential time complexity classes, we prove that the strongest relativizable lower bounds hold almost everywhere for almost all problems. Finally, we show that infinite pseudorandom sequences have high nonuniform complexity almost everywhere. The results are unified by a new, more powerful formulation of the underlying measure theory, based on uniform systems of density functions, and by the introduction of a new nonuniform complexity measure, the selective Kolmogorov complexity. This research was supported in part by NSF Grants CCR-8809238 and CCR-9157382 and in ...
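The Shannon bound referred to above is the classical counting lower bound on circuit size, stated here in its usual asymptotic form for context: for almost every Boolean function f : \{0,1\}^n \to \{0,1\},

\mathrm{size}(f) \ge (1 - o(1)) \, 2^n / n.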
Probabilistic Models for Information Retrieval based on Divergence from Randomness
 ACM Transactions on Information Systems
, 2002
"... We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive termweighting models by measuring the divergence of the actual term distribution from that obtained under a ra ..."
Abstract

Cited by 152 (5 self)
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose–Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
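A minimal sketch of the divergence-from-randomness idea using the binomial model mentioned above; this is a simplified illustration with assumed parameter names, not the paper's actual weighting formulae or normalizations (Python):

from math import comb, log2

def binomial_surprise(tf, term_coll_freq, n_docs):
    """Return -log2 of the probability of seeing `tf` of a term's
    `term_coll_freq` collection occurrences in one document when occurrences
    are scattered uniformly at random over `n_docs` documents (binomial model).
    Larger values mean greater divergence from randomness, i.e. the term is a
    more informative descriptor of that document."""
    p = 1.0 / n_docs
    prob = comb(term_coll_freq, tf) * p**tf * (1 - p)**(term_coll_freq - tf)
    return -log2(prob)

# Seeing 5 of a term's 10 collection occurrences in one document is far more
# surprising (hence more informative) than seeing just 1 of them.
print(binomial_surprise(1, 10, 1000))   # ~6.7 bits
print(binomial_surprise(5, 10, 1000))   # ~41.9 bits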
Geometric Motion Segmentation and Model Selection
 Phil. Trans. Royal Society of London A
, 1998
"... this paper we place the three problems into a common statistical framework; investigating the use of information criteria and robust mixture models as a principled way for motion segmentation of images. The final result is a general fully automatic algorithm for clustering that works in the presence ..."
Abstract

Cited by 105 (2 self)
In this paper we place the three problems into a common statistical framework, investigating the use of information criteria and robust mixture models as a principled way for motion segmentation of images. The final result is a general fully automatic algorithm for clustering that works in the presence of noise and outliers.
Minimum Message Length and Kolmogorov Complexity
 Computer Journal
, 1999
"... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 10381039], [2, sections 5.2, 5.5] and [3, p. 465] ..."
Abstract

Cited by 104 (25 self)
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038–1039], [2, sections 5.2, 5.5] and [3, p. 465]
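As background for the comparison the paper draws, the basic two-part MML message length (a standard textbook form, not quoted from this abstract) is

\mathrm{MessLen}(H, D) = -\log \Pr(H) \; - \; \log \Pr(D \mid H),

i.e. the length of an encoding of the hypothesis H followed by an encoding of the data D given H; minimizing it over hypotheses parallels choosing the shortest two-part description in the Kolmogorov-complexity setting.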
The Dimensions of Individual Strings and Sequences
 INFORMATION AND COMPUTATION
, 2003
"... A constructive version of Hausdorff dimension is developed using constructive supergales, which are betting strategies that generalize the constructive supermartingales used in the theory of individual random sequences. This constructive dimension is used to assign every individual (infinite, binary ..."
Abstract

Cited by 93 (10 self)
A constructive version of Hausdorff dimension is developed using constructive supergales, which are betting strategies that generalize the constructive supermartingales used in the theory of individual random sequences. This constructive dimension is used to assign every individual (infinite, binary) sequence S a dimension, which is a real number dim(S) in the interval [0, 1]. Sequences that ...
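In the standard formulation (sketched here from the usual definitions, since the abstract is truncated), an s-supergale is a function d : \{0,1\}^* \to [0,\infty) with

d(w) \ge 2^{-s}\,[\, d(w0) + d(w1) \,],

it succeeds on a sequence S if its values along the prefixes of S are unbounded, and

\dim(S) = \inf\{\, s \ge 0 : \text{some constructive } s\text{-supergale succeeds on } S \,\}.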