Results 1  10
of
508
Learning Stochastic Logic Programs
, 2000
"... Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic contextfree grammars, and directed Bayes' nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a firstorder r ..."
Abstract

Cited by 1163 (77 self)
 Add to MetaCart
(Show Context)
Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic contextfree grammars, and directed Bayes' nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a firstorder rangerestricted definite clause. This paper summarises the syntax, distributional semantics and proof techniques for SLPs and then discusses how a standard Inductive Logic Programming (ILP) system, Progol, has been modied to support learning of SLPs. The resulting system 1) nds an SLP with uniform probability labels on each definition and nearmaximal Bayes posterior probability and then 2) alters the probability labels to further increase the posterior probability. Stage 1) is implemented within CProgol4.5, which differs from previous versions of Progol by allowing userdefined evaluation functions written in Prolog. It is shown that maximising the Bayesian posterior function involves nding SLPs with short derivations of the examples. Search pruning with the Bayesian evaluation function is carried out in the same way as in previous versions of CProgol. The system is demonstrated with worked examples involving the learning of probability distributions over sequences as well as the learning of simple forms of uncertain knowledge.
A Theory of Program Size Formally Identical to Information Theory
, 1975
"... A new definition of programsize complexity is made. H(A;B=C;D) is defined to be the size in bits of the shortest selfdelimiting program for calculating strings A and B if one is given a minimalsize selfdelimiting program for calculating strings C and D. This differs from previous definitions: (1) ..."
Abstract

Cited by 397 (17 self)
 Add to MetaCart
(Show Context)
A new definition of programsize complexity is made. H(A;B=C;D) is defined to be the size in bits of the shortest selfdelimiting program for calculating strings A and B if one is given a minimalsize selfdelimiting program for calculating strings C and D. This differs from previous definitions: (1) programs are required to be selfdelimiting, i.e. no program is a prefix of another, and (2) instead of being given C and D directly, one is given a program for calculating them that is minimal in size. Unlike previous definitions, this one has precisely the formal 2 G. J. Chaitin properties of the entropy concept of information theory. For example, H(A;B) = H(A) + H(B=A) + O(1). Also, if a program of length k is assigned measure 2 \Gammak , then H(A) = \Gamma log 2 (the probability that the standard universal computer will calculate A) +O(1). Key Words and Phrases: computational complexity, entropy, information theory, instantaneous code, Kraft inequality, minimal program, probab...
Algorithmic information theory
 IBM JOURNAL OF RESEARCH AND DEVELOPMENT
, 1977
"... This paper reviews algorithmic information theory, which is an attempt to apply informationtheoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that ..."
Abstract

Cited by 394 (20 self)
 Add to MetaCart
This paper reviews algorithmic information theory, which is an attempt to apply informationtheoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that a program whose bits are chosen by coin flipping produces a given output. During the past few years the definitions of algorithmic information theory have been reformulated. The basic features of the new formalism are presented here and certain results of R. M. Solovay are reported.
The minimum description length principle in coding and modeling
 IEEE TRANS. INFORM. THEORY
, 1998
"... We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized ..."
Abstract

Cited by 382 (17 self)
 Add to MetaCart
(Show Context)
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
Minimum complexity density estimation
 IEEE TRANS. INF. THEORY
, 1991
"... The minimum complexity or minimum descriptionlength criterion developed by Kolmogorov, Rissanen, Wallace, So&in, and others leads to consistent probability density estimators. These density estimators are defined to achieve the best compromise between likelihood and simplicity. A related issue ..."
Abstract

Cited by 237 (7 self)
 Add to MetaCart
The minimum complexity or minimum descriptionlength criterion developed by Kolmogorov, Rissanen, Wallace, So&in, and others leads to consistent probability density estimators. These density estimators are defined to achieve the best compromise between likelihood and simplicity. A related issue is the compromise between accuracy of approximations and complexity relative to the sample size. An index of resolvability is studied which is shown to bound the statistical accuracy of the density estimators, as well as the informationtheoretic redundancy.
Probabilistic Models for Information Retrieval based on Divergence from Randomness
 ACM Transactions on Information Systems
, 2002
"... We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive termweighting models by measuring the divergence of the actual term distribution from that obtained under a ra ..."
Abstract

Cited by 224 (5 self)
 Add to MetaCart
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive termweighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose–Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tfidf model.
Almost Everywhere High Nonuniform Complexity
, 1992
"... . We investigate the distribution of nonuniform complexities in uniform complexity classes. We prove that almost every problem decidable in exponential space has essentially maximum circuitsize and spacebounded Kolmogorov complexity almost everywhere. (The circuitsize lower bound actually exceeds ..."
Abstract

Cited by 176 (39 self)
 Add to MetaCart
. We investigate the distribution of nonuniform complexities in uniform complexity classes. We prove that almost every problem decidable in exponential space has essentially maximum circuitsize and spacebounded Kolmogorov complexity almost everywhere. (The circuitsize lower bound actually exceeds, and thereby strengthens, the Shannon 2 n n lower bound for almost every problem, with no computability constraint.) In exponential time complexity classes, we prove that the strongest relativizable lower bounds hold almost everywhere for almost all problems. Finally, we show that infinite pseudorandom sequences have high nonuniform complexity almost everywhere. The results are unified by a new, more powerful formulation of the underlying measure theory, based on uniform systems of density functions, and by the introduction of a new nonuniform complexity measure, the selective Kolmogorov complexity. This research was supported in part by NSF Grants CCR8809238 and CCR9157382 and in ...
Geometric Motion Segmentation and Model Selection
 Phil. Trans. Royal Society of London A
, 1998
"... this paper we place the three problems into a common statistical framework; investigating the use of information criteria and robust mixture models as a principled way for motion segmentation of images. The final result is a general fully automatic algorithm for clustering that works in the presence ..."
Abstract

Cited by 130 (2 self)
 Add to MetaCart
this paper we place the three problems into a common statistical framework; investigating the use of information criteria and robust mixture models as a principled way for motion segmentation of images. The final result is a general fully automatic algorithm for clustering that works in the presence of noise and outliers. 1. Introduction
Minimum Message Length and Kolmogorov Complexity
 Computer Journal
, 1999
"... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 10381039], [2, sections 5.2, 5.5] and [3, p. 465] ..."
Abstract

Cited by 124 (27 self)
 Add to MetaCart
(Show Context)
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 10381039], [2, sections 5.2, 5.5] and [3, p. 465]