## Applying MML to ILP ∗

### BibTeX

@MISC{Ferri_applyingmml,

author = {D. L. Dowe and Cèsar Ferri and José Hernández-Orallo and María José Ramírez Quintana},

title = {Applying MML to ILP ∗},

year = {}

}

### Abstract

In Inductive Logic Programming (ILP), since logic is a complete (universal) language, infinitely many possible hypotheses are compatible (hence plausible) given the evidence. An intrinsic way of selecting the most convenient hypothesis from the set of possible theories is useful not only for model selection but also for guiding the search in the hypothesis space, as some ILP systems have done in the past. One selection/search criterion is to apply Occam's razor, i.e. to first select/try the simplest hypotheses which cover the evidence. In order to do this, it is necessary to measure how simple a theory is. The Minimum Message Length (MML) principle is based on information theory and reflects Occam's razor philosophy. In this paper we present an MML method for costing both logic programs and sets of facts according to the theory. Our scheme has a solid foundation and avoids the drawbacks of previous coding schemes in ILP.

### Citations

1239 | Modeling by Shortest Data Description
- Rissanen
- 1978
Citation Context: ...ence the name "minimum message length" (principle) for thus choosing a theory, T, to fit observed data, E. For a comparison with the related subsequent Minimum Description Length (MDL) work of Rissanen [10, 11], see, e.g., [15] and other papers in that special issue of the Computer Journal and Chapter 10 of [13]. 3 Evidence Representation and Coding Given the MML philosophy it seems straightforward how to ...

433 | A universal prior for integers and estimation by minimum description length, The Annals of Statistics 11(2)
- Rissanen
- 1983
Citation Context: ...ence the name "minimum message length" (principle) for thus choosing a theory, T, to fit observed data, E. For a comparison with the related subsequent Minimum Description Length (MDL) work of Rissanen [10, 11], see, e.g., [15] and other papers in that special issue of the Computer Journal and Chapter 10 of [13]. 3 Evidence Representation and Coding Given the MML philosophy it seems straightforward how to ...

107 | Minimum message length and Kolmogorov complexity
- Wallace, Dowe
- 1999
Citation Context: ...tains two examples of how our scheme can be applied. The paper finishes with some conclusions and future work. 2 Minimum Message Length The Minimum Message Length (MML) principle of inductive inference [16, 13, 15] is based on information theory, and hence lies on the interface of computer science and statistics. A Bayesian interpretation of the MML principle is that it variously states that the best conclusio...

98 | Learning from positive data
- Muggleton
- 1996
Citation Context: ...ring in light of that theory. We discuss this immediately below. Letting E be the data and T be a theory with prior probability Pr(T), we can write the posterior probability Pr(T|E) = Pr(T ∧ E)/Pr(E) = Pr(T) · Pr(E|T)/Pr(E), by repeated application of Bayes's Theorem. (Footnote 1: Logic programs are learnable from positive examples only, as [5] shows.) Since E and Pr(E) are given and we wish to infer T, we can regard...
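The Bayesian identity in this snippet maps directly onto code lengths: since an event of probability p costs -log2 p bits to transmit, maximizing Pr(T) · Pr(E|T) is the same as minimizing the two-part message length -log2 Pr(T) - log2 Pr(E|T). A minimal sketch of that comparison (the candidate theories and their probabilities are invented for illustration, not taken from the paper):

```python
import math

def message_length(prior_t, likelihood_e_given_t):
    """Two-part MML cost in bits: -log2 Pr(T) - log2 Pr(E|T)."""
    return -math.log2(prior_t) - math.log2(likelihood_e_given_t)

# Hypothetical candidate theories: (prior Pr(T), likelihood Pr(E|T)).
theories = {
    "T1": (0.50, 0.10),   # simpler theory, fits the evidence loosely
    "T2": (0.25, 0.40),   # more complex theory, fits the evidence better
}

costs = {name: message_length(p, l) for name, (p, l) in theories.items()}
best = min(costs, key=costs.get)
# T1 costs 1 + log2(10) ≈ 4.32 bits; T2 costs 2 + log2(2.5) ≈ 3.32 bits,
# so MML prefers T2 despite its lower prior.
```

The point the snippet makes then falls out for free: maximizing the posterior Pr(T|E) over theories is exactly minimizing this total message length, since Pr(E) is a constant shared by all candidates.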

93 | Statistical and Inductive Inference by Minimum Message Length
- Wallace
- 2005
Citation Context: ...with arity greater than or equal to 1. We discuss these problems in the following sections. In this paper, we present an alternative coding scheme based on the Minimum Message Length (MML) principle. MML [13] is a formal information-theory restatement of Occam's Razor: even when models are not equal in explaining the observed data, the model generating the shortest overall message (data and model) is more...

83 | Learning logical definitions from relations
- Quinlan
- 1990
Citation Context: ...es In this section, we show how we can apply our approach to select the best hypothesis when we have some available. 5.1 Network The first example was also employed in [1], although it is originally from [9]. The goal of this problem is to learn the predicate "reach", which expresses the binary "reachability" relation in a directed graph. One vertex can reach another if there is a path between them in the...

70 | Parameter estimation in stochastic logic programs - Cussens

31 | Inductive logic programming: issues, results and the challenge of learning language in logic
- Muggleton
- 1999
Citation Context: ...TAMAT. † C. Ferri was supported by grant 2765 of UPV during a stay at Monash University. namely the model complexity and proof complexity approaches. 1 Introduction Inductive Logic Programming (ILP) [7] is currently a very important area of research as an appropriate framework for the inductive inference of first-order clausal theories from facts. ILP has provided an outstanding advantage in the induc...

30 | Learning Structure and Parameters of Stochastic Logic Programs
- Muggleton
- 2003
Citation Context: ...am T. The previous formula does not solve the problem by itself, since there can be many different ways to estimate p(E|T). This approach is closely related to Stochastic Inductive Logic Programming [4, 2, 8]. The idea is to use the program as a stochastic example generator. This is highly related to the PC approach, but we have to derive the probabilities with some conditions. For instance, let us consid...

17 | Complexity-based induction
- Conklin, Witten
- 1994
Citation Context: ...azor and prefer simple hypotheses. For this purpose we need to measure the complexity of the learnt program with respect to the evidence. Several coding methods for ILP have been previously presented [1], [6]. In these approaches, the evidence is composed of positive examples only. However, there are important drawbacks in these schemes: the coding in [6] can be counter-intuitive for programs that p...

15 | An information measure for classification
- Wallace, Boulton
- 1968
Citation Context: ...tains two examples of how our scheme can be applied. The paper finishes with some conclusions and future work. 2 Minimum Message Length The Minimum Message Length (MML) principle of inductive inference [16, 13, 15] is based on information theory, and hence lies on the interface of computer science and statistics. A Bayesian interpretation of the MML principle is that it variously states that the best conclusio...

7 | Stochastic inductive logic programming
- Kovacic
- 1994
Citation Context: ...The measure of [1] is based on the size of Q(T) (that is, the size of a subset of the least Herbrand model). Hence, this approach is called Model Complexity (MC). A similar approach can be found in [4]. Using the above notation, three possible situations are distinguished in [1]: 1. E = Q(T). The theory covers all and only the examples. 2. E ⊄ Q(T). There are examples not covered by the theo...
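The case analysis in this snippet reduces to plain set comparisons between the evidence E and Q(T), the relevant subset of the least Herbrand model. The snippet is truncated after case 2, so the third case shown below (E a strict subset of Q(T), i.e. an over-general theory) is an assumption; the atoms are invented toy data:

```python
# Toy evidence and toy covered set Q(T); the atom names are illustrative.
E = {"reach(a,b)", "reach(b,c)"}
Q_T = {"reach(a,b)", "reach(b,c)", "reach(a,c)"}

if E == Q_T:
    case = "exact cover"    # 1. the theory covers all and only the examples
elif not E <= Q_T:
    case = "incomplete"     # 2. some examples are not covered by the theory
else:
    case = "overgeneral"    # 3. (assumed) E ⊂ Q(T): the theory derives extra atoms
```

Here E is a strict subset of Q_T, so the sketch lands in the third, over-general case.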

7 | Compression, significance and accuracy
- Muggleton, Srinivasan, et al.
- 1992
Citation Context: ...and prefer simple hypotheses. For this purpose we need to measure the complexity of the learnt program with respect to the evidence. Several coding methods for ILP have been previously presented [1], [6]. In these approaches, the evidence is composed of positive examples only. However, there are important drawbacks in these schemes: the coding in [6] can be counter-intuitive for programs that perfe...

3 | An invariant Bayes method for point estimation
- Wallace, Boulton
- 1975
Citation Context: ...ly subtract from the message length. Another way to think of this is in terms of the equivalence of the paradigms of probability and message length via pi = 2^(-li) and li = -log2 pi, as emphasised in [14]. If we have several syntactically different ways of encoding something semantically equivalent, then the probability of the event increases as a result of the summation and the message length correspo...
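The equivalence li = -log2 pi cited here can be checked numerically: if the same semantic object has several syntactically different encodings, their probabilities add, so the effective message length drops below that of even the shortest single encoding. A small sketch with made-up encoding lengths:

```python
import math

# Hypothetical lengths (in bits) of three syntactically different
# encodings of the same semantically equivalent program.
lengths = [10.0, 11.0, 12.0]

# Each encoding of length l corresponds to probability 2**-l.
probs = [2.0 ** -l for l in lengths]

# Summing the probabilities of the equivalent encodings...
total_prob = sum(probs)

# ...gives an effective message length shorter than the best single one:
# -log2(7/4096) = 12 - log2(7) ≈ 9.19 bits, versus 10 bits for the shortest.
effective_length = -math.log2(total_prob)
```

This is exactly the effect the snippet describes: probability increases under the summation, and the corresponding message length decreases.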

2 | MML for Inductive Logic Programming
- Dowe, Ferri, et al.
- 2007
Citation Context: ...rt. the theory, we can then tackle the issue of assigning probabilities to the evidence, i.e., to sets of examples. In our proposal, we just consider that the evidence doesn't contain repeated examples. [3] introduces a different costing method based on the premise that the evidence contains repeated examples. 3.2.1 Coding with no repeated examples Let us consider a set of examples E and a program T. Th...

1 | The justification of logical theories based on data compression
- Srinivasan, Muggleton, et al.
- 1994
Citation Context: ...asy to compute. In this case pnorep(E|T) = 1/C(5,3) = 3!/P(5,3) = 1/10. 4 Costing Logic Programs In this section we present our coding scheme for logic programs. The presented coding is similar to [12]. We consider four steps in our scheme: first, we encode information about predicates and functions, then rule heads, rule bodies and, finally, we encode the links among repeated variables. Therefore, gi...
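The worked value in this snippet, as far as it can be reconstructed, is a uniform code over subsets: an evidence set of 3 distinct examples drawn from 5 derivable atoms gets probability 1/C(5,3) = 1/10, equivalently 3!/P(5,3) via ordered selections. A sketch of that count (the set sizes 5 and 3 are the snippet's example, not general):

```python
from math import comb, factorial, perm

n, k = 5, 3                      # 5 atoms derivable from T; evidence of 3 distinct examples

# Uniform probability over the C(5,3) = 10 possible 3-element subsets.
p_norep = 1 / comb(n, k)

# Equivalently, via ordered selections: 3!/P(5,3) = 6/60 = 1/10.
p_ordered = factorial(k) / perm(n, k)
```

Both routes give 1/10, matching the snippet's reconstructed value; the corresponding message length for the evidence would be -log2(1/10) ≈ 3.32 bits.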