Learning Stochastic Logic Programs
, 2000
Cited by 962 (56 self)
Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic context-free grammars, and directed Bayes' nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a first-order range-restricted definite clause. This paper summarises the syntax, distributional semantics and proof techniques for SLPs and then discusses how a standard Inductive Logic Programming (ILP) system, Progol, has been modified to support learning of SLPs. The resulting system 1) finds an SLP with uniform probability labels on each definition and near-maximal Bayes posterior probability and then 2) alters the probability labels to further increase the posterior probability. Stage 1) is implemented within CProgol4.5, which differs from previous versions of Progol by allowing user-defined evaluation functions written in Prolog. It is shown that maximising the Bayesian posterior function involves finding SLPs with short derivations of the examples. Search pruning with the Bayesian evaluation function is carried out in the same way as in previous versions of CProgol. The system is demonstrated with worked examples involving the learning of probability distributions over sequences as well as the learning of simple forms of uncertain knowledge.
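The idea of labelled clauses p:C defining a distribution over sequences can be illustrated with a minimal Python sketch. It is not taken from the paper: the two-clause program, the names `CLAUSES` and `derivation_probability`, and the tuple encoding of clauses are all assumptions made for illustration. The probability of a sequence is taken to be the product of the labels of the clauses used in its derivation.

```python
from fractions import Fraction

# Hypothetical toy program (an assumption, not from the paper).
# Each entry is (label p, emitted symbol, whether the body calls s again):
#   2/5 : s([a|T]) :- s(T).
#   3/5 : s([b]).
CLAUSES = [
    (Fraction(2, 5), "a", True),
    (Fraction(3, 5), "b", False),
]

# The labels of one definition are assumed to sum to 1.
assert sum(p for p, _, _ in CLAUSES) == 1


def derivation_probability(sequence):
    """Product of clause labels along the derivation of `sequence`.

    Returns 0 if the toy program cannot derive the sequence.
    """
    prob = Fraction(1)
    remaining = sequence
    active = True  # True while a call to s is still pending
    while active:
        if not remaining:
            return Fraction(0)  # pending call but no symbols left
        head, remaining = remaining[0], remaining[1:]
        for p, symbol, calls_s_again in CLAUSES:
            if symbol == head:
                prob *= p
                active = calls_s_again
                break
        else:
            return Fraction(0)  # no clause emits this symbol
    # The derivation ended; any leftover symbols mean failure.
    return prob if not remaining else Fraction(0)


print(derivation_probability("aab"))  # 2/5 * 2/5 * 3/5
```

In this toy program each symbol matches exactly one clause, so every derivable sequence has a single derivation; a general SLP would need to sum over alternative derivations.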
Prior Probabilities
- IEEE Transactions on Systems Science and Cybernetics
, 1968
Cited by 135 (3 self)
e case of location and scale parameters, rate constants, and in Bernoulli trials with unknown probability of success. In realistic problems, both the transformation group analysis and the principle of maximum entropy are needed to determine the prior. The distributions thus found are uniquely determined by the prior information, independently of the choice of parameters. In a certain class of problems, therefore, the prior distributions may now be claimed to be fully as "objective" as the sampling distributions. I. Background of the problem Since the time of Laplace, applications of probability theory have been hampered by difficulties in the treatment of prior information. In realistic problems of decision or inference, we often have prior information which is highly relevant to the question being asked; to fail to take it into account is to commit the most obvious inconsistency of reasoning and may lead to absurd or dangerously misleading results. As an extreme examp
Random Worlds and Maximum Entropy
- In Proc. 7th IEEE Symp. on Logic in Computer Science
, 1994
Cited by 44 (12 self)
Given a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula φ holds given KB. If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, with domain {1, ..., N} that satisfy KB, and compute the fraction of them in which φ is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying φ and KB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger, there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are...
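The counting step described in this abstract can be sketched by brute force for a small, fixed N. The sketch below is an illustration under stated assumptions, not the paper's method: the vocabulary (unary predicates Bird and Fly, one constant c), the knowledge base ("c is a bird, and at least 80% of birds fly"), and the function name `degree_of_belief` are all hypothetical, and it computes the fraction at a finite N rather than the asymptotic value the abstract defines.

```python
from fractions import Fraction
from itertools import product

# Each individual gets one of four (is_bird, flies) truth assignments.
ASSIGNMENTS = [(b, f) for b in (False, True) for f in (False, True)]


def degree_of_belief(n):
    """Fraction of worlds over a domain of n individuals satisfying the
    hypothetical KB in which Fly(c) holds.

    KB (an assumption for illustration):
      Bird(c), and at least 80% of the birds fly.
    A world fixes the extension of both predicates and the denotation of c.
    """
    satisfying = 0
    favourable = 0
    for world in product(ASSIGNMENTS, repeat=n):
        birds = sum(1 for b, _ in world if b)
        flying_birds = sum(1 for b, f in world if b and f)
        if 10 * flying_birds < 8 * birds:
            continue  # the statistical fact fails in this world
        for c in range(n):  # every possible denotation of the constant c
            is_bird, flies = world[c]
            if not is_bird:
                continue  # KB requires Bird(c)
            satisfying += 1
            if flies:
                favourable += 1
    return Fraction(favourable, satisfying)


print(float(degree_of_belief(5)))
```

Enumeration grows as 4^N, so this only works for very small domains; the abstract's point is that a maximum-entropy computation replaces the enumeration in the limit.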
Statistical Foundations for Default Reasoning
, 1993
Cited by 43 (8 self)
We describe a new approach to default reasoning, based on a principle of indifference among possible worlds. We interpret default rules as extreme statistical statements, thus obtaining a knowledge base KB comprised of statistical and first-order statements. We then assign equal probability to all worlds consistent with KB in order to assign a degree of belief to a statement φ. The degree of belief can be used to decide whether to defeasibly conclude φ. Various natural patterns of reasoning, such as a preference for more specific defaults, indifference to irrelevant information, and the ability to combine independent pieces of evidence, turn out to follow naturally from this technique. Furthermore, our approach is not restricted to default reasoning; it supports a spectrum of reasoning, from quantitative to qualitative. It is also related to other systems for default reasoning. In particular, we show that the work of [Goldszmidt et al., 1990], which applies maximum entropy ideas t...
A Natural Law of Succession
, 1995
Cited by 33 (4 self)
We present a new solution to multinomial estimation and demonstrate that our solution outperforms standard solutions both in theory and in practice. The novelty of our approach lies in our use of combinatorial priors on strings. I. Natural Strings An alphabet represents the set of logically possible events. In this world, all strings are finite and most are very short. For this basic reason, natural strings do not include all the symbols in the alphabet. This claim is tautological for short strings, but it is also true for long strings. To model this phenomenon, we propose a uniform prior on the cardinalities of all nonempty subsets of the alphabet. Such a prior on an alphabet of size k entails the probability $p_N(x^n \mid n) = \left[ \min(k, n) \binom{k}{q} \binom{n-1}{q-1} \binom{n}{\{n_i\}} \right]^{-1}$ for strings $x^n$ of length n with cardinality q. This probability is not Kolmogorov compatible. To obtain a conditional probability, we must use $p(i \mid x^n, n+1)$ instead of the more o...
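A short Python sketch can make the string probability in this abstract concrete. It reads the formula as the reciprocal of min(k, n) times the number of q-symbol subsets of the alphabet, times the number of ways to split n into q positive symbol counts, times the multinomial coefficient for those counts; that reading, and the function name `natural_string_probability`, are assumptions for illustration rather than the author's code.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial


def natural_string_probability(string, k):
    """Probability of `string` over an alphabet of size k, assuming
    p = 1 / (min(k, n) * C(k, q) * C(n-1, q-1) * n! / prod(n_i!)),
    where n is the length, q the number of distinct symbols used,
    and n_i the count of each symbol.
    """
    n = len(string)
    counts = Counter(string)
    q = len(counts)
    if n == 0 or q > k:
        raise ValueError("need a nonempty string using at most k symbols")
    # Multinomial coefficient n! / (n_1! * ... * n_q!), kept integral.
    multinomial = factorial(n)
    for c in counts.values():
        multinomial //= factorial(c)
    denominator = min(k, n) * comb(k, q) * comb(n - 1, q - 1) * multinomial
    return Fraction(1, denominator)


# Sanity check: probabilities of all length-4 strings over {a, b, c} sum to 1.
total = sum(natural_string_probability("".join(s), 3)
            for s in product("abc", repeat=4))
print(total)
```

The sum-to-one check works because each factor in the denominator counts one uniform choice: the cardinality q, the subset of symbols, the symbol counts, and the arrangement of symbols.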
From inheritance relation to nonaxiomatic logic
- International Journal of Approximate Reasoning
, 1994
Cited by 31 (24 self)
Non-Axiomatic Reasoning System is an adaptive system that works with insufficient knowledge and resources. At the beginning of the paper, three binary term logics are defined. The first is based only on an inheritance relation. The second and the third suggest a novel way to process extension and intension, and they also have interesting relations with Aristotle's syllogistic logic. Based on the three simple systems, a Non-Axiomatic Logic is defined. It has a term-oriented language and an experience-grounded semantics. It can uniformly represent and process randomness, fuzziness, and ignorance. It can also uniformly carry out deduction, abduction, induction, and revision.
Distinguishing Exceptions from Noise in Non-Monotonic Learning
, 1996
Cited by 21 (4 self)
It is important for a learning program to have a reliable method of deciding whether to treat errors as noise or to include them as exceptions within a growing first-order theory. We explore the use of an information-theoretic measure to decide this problem within the non-monotonic learning framework defined by Closed-World-Specialisation. The approach adopted uses a model that consists of a reference Turing machine which accepts an encoding of a theory and proofs on its input tape and generates the observed data on the output tape. Within this model, the theory is said to "compress" data if the length of the input tape is shorter than that of the output tape. Data found to be incompressible are deemed to be "noise".
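The compression criterion in this abstract reduces to a length comparison, sketched below in Python. The function name `compresses` and the idea of passing tape lengths as bit counts are assumptions for illustration; the abstract does not say how the theory, proofs, or data are encoded.

```python
def compresses(theory_bits, proof_bits, data_bits):
    """True if the input tape (encoded theory plus proofs) is shorter
    than the output tape (the observed data). Lengths are in bits and
    are assumed to come from some fixed encoding not specified here.
    """
    return theory_bits + proof_bits < data_bits


# Hypothetical lengths: a 120-bit theory with 40 bits of proofs
# reproduces 300 bits of data, so it compresses the data.
print(compresses(120, 40, 300))
# The same theory with 200 bits of proofs does not.
print(compresses(120, 200, 300))
```

Under this reading, data that no candidate theory compresses would be treated as noise rather than as exceptions.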
On Universal Prediction and Bayesian Confirmation
- Theoretical Computer Science
, 2007
Cited by 20 (10 self)
The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or can fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. I discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. I show that Solomonoff’s model possesses many desirable properties: Strong total and future bounds, and weak instantaneous bounds, and in contrast to most classical continuous prior densities has no zero p(oste)rior problem, i.e. can confirm universal hypotheses, is reparametrization and regrouping invariant, and avoids the old-evidence and updating problem. It even performs well

