Results 11–20 of 30
MDL and MML: Similarities and Differences (Introduction to Minimum Encoding Inference, Part III), 1994
Cited by 6 (0 self)
Abstract:
This paper continues the introduction to minimum encoding inductive inference given by Oliver and Hand. This series of papers was written with the objective of providing an introduction to this area for statisticians. We describe the message length estimates used in Wallace's Minimum Message Length (MML) inference and Rissanen's Minimum Description Length (MDL) inference. The differences in the message length estimates of the two approaches are explained. The implications of these differences for applications are discussed.
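The message length estimates contrasted above both rest on the same two-part idea: the length of stating a model plus the length of the data encoded under it. As a loose illustration only (a crude stand-in, not Wallace's MML87 formula or Rissanen's NML), the following sketch computes a toy two-part message length for a Bernoulli sample over a quantized parameter grid; the grid size and uniform parameter code are our own assumptions.

```python
import math

def two_part_length(data, grid_size=16):
    """Toy two-part message length (nats) for a Bernoulli sample:
    cost of stating a quantized parameter plus cost of the data
    encoded under it. An illustration of the two-part idea only,
    not Wallace's MML87 estimate or Rissanen's MDL/NML estimate."""
    n, k = len(data), sum(data)
    best = float("inf")
    for i in range(1, grid_size):          # candidate parameters i/grid_size
        theta = i / grid_size
        l_param = math.log(grid_size - 1)  # uniform code over the grid
        l_data = -(k * math.log(theta) + (n - k) * math.log(1 - theta))
        best = min(best, l_param + l_data)
    return best

print(two_part_length([1, 1, 0, 1, 0, 1, 1, 1]))
```

The minimizing grid point plays the role of the point estimate; MML and MDL differ precisely in how the parameter precision (here, the crude fixed grid) is chosen.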
Minimum Message Length Clustering of Spatially-Correlated Data with Varying Inter-Class Penalties
6TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2007), 2007
Cited by 5 (3 self)
Abstract:
We present here some applications of the Minimum Message Length (MML) principle to spatially correlated data. Discrete-valued Markov Random Fields are used to model spatial correlation. The models for spatial correlation used here are a generalisation of the model used in (Wallace 1998) [14] for unsupervised classification of spatially correlated data (such as image segmentation), and we discuss how our work can be applied to that type of unsupervised classification. We make three new contributions. First, the rectangular grid used in (Wallace 1998) [14] is generalised to an arbitrary graph with arbitrary edge distances. Secondly, we refine (Wallace 1998) [14] slightly by including a discarded message length term that is important for small data sets and for a simpler problem presented here. Finally, we show how the MML principle can be used to test for the presence of spatial correlation, and how it can be used to choose between models of varying complexity to infer details of the nature of the spatial correlation.
CIRCULAR CLUSTERING BY MINIMUM MESSAGE LENGTH OF PROTEIN DIHEDRAL ANGLES, 1995
Cited by 4 (4 self)
Abstract:
Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) intrinsic classification, we are able to take the protein dihedral angles as determined by X-ray crystallography and cluster sets of dihedral angles into groups. Previous work by Hunter and States had applied a similar Bayesian classification method, AutoClass, to protein data with each site position represented by 3 Cartesian coordinates for each of the α-Carbon, β-Carbon and Nitrogen atoms, totalling 9 coordinates. By using the von Mises circular distribution in the Snob program rather than the Normal distribution of the Hunter and States model, we are instead able to represent local site properties by the two dihedral angles, φ and ψ. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly correlated Cartesian coordinates. By the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying process generating the data. We report on the results of our classification, plotting the classes in (φ, ψ)-space and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region φ ≈ −1.09 rad and ψ ≈ −0.75 rad which are close on the spanning tree and have high inter-transition probabilities. These properties give rise to a tight, abundant, self-perpetuating, α-helical structure.
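The reason an angular model such as the von Mises distribution is needed is that dihedral angles wrap around at ±π, so ordinary arithmetic averaging fails. As a minimal sketch (not part of the Snob program), the mean direction of a set of angles can be computed from the average sine and cosine; the sample values below are illustrative only.

```python
import math

def circular_mean(angles):
    """Mean direction of angles (radians): the atan2 of the average
    sine and cosine. Unlike the arithmetic mean, this handles the
    wrap-around at +/-pi correctly, as a von Mises model requires."""
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    return math.atan2(s, c)

# Illustrative phi angles near -1.09 rad (cf. the alpha-helical region above)
print(circular_mean([-1.2, -1.0, -1.1, -1.05]))
# Angles straddling the +/-pi cut average to ~pi, not to ~0
print(circular_mean([3.1, -3.1]))
```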
Introducing the Minimum Description Length Principle
Cited by 3 (0 self)
Abstract:
This chapter provides a conceptual, entirely non-technical introduction and overview of Rissanen’s minimum description length (MDL) principle. It serves as a basis for the technical introduction given in Chapter 2, in which all the ideas discussed here are made mathematically precise.
Potential Properties of Turing Machines, 2012
Cited by 2 (1 self)
Abstract:
In this paper we investigate the notion of potential properties for Turing machines, focussing especially on universality and intelligence. We consider several machine characterisations (non-interactive and interactive) and give definitions for each case, considering permanent and transitory potentials. From these definitions, we analyse the relations between some potential abilities, bring out the dependency on the environment distribution, and suggest some ideas on how potential abilities can be measured.
MMLD Inference of the Poisson and Geometric Models
Cited by 1 (0 self)
Abstract:
This paper examines MMLD-based approximations for the inference of two univariate probability distributions: the geometric distribution, parameterised in terms of a mean parameter, and the Poisson distribution. The focus is on both the parameter estimation and hypothesis testing properties of the approximation. The new parameter estimators are compared to the MML87 estimators in terms of bias, squared error risk and KL divergence risk. Empirical experiments demonstrate that the MMLD parameter estimates are more biased, and feature higher squared error risk, than the corresponding MML87 estimators. In contrast, the two criteria are virtually indistinguishable in the hypothesis testing experiment.
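The KL divergence risk used in such comparisons is the expected KL divergence from the true distribution to the fitted one. As a hedged sketch of the methodology only (the two estimators below are the maximum-likelihood estimator and a hypothetical shrinkage estimator of our own choosing, not the paper's MMLD or MML87 estimators), a Monte Carlo estimate of the KL risk for the Poisson mean can be computed as follows; KL(Poisson(λ) ‖ Poisson(λ̂)) = λ log(λ/λ̂) + λ̂ − λ.

```python
import math, random

def kl_poisson(lam_true, lam_est):
    """KL divergence KL(Poisson(lam_true) || Poisson(lam_est))."""
    return lam_true * math.log(lam_true / lam_est) + lam_est - lam_true

def sample_poisson(lam, rng):
    """Knuth's multiplication method for Poisson sampling."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def kl_risk(estimator, lam, n, trials, seed=0):
    """Monte Carlo estimate of the KL divergence risk of an estimator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sample_poisson(lam, rng) for _ in range(n)]
        total += kl_poisson(lam, estimator(xs))
    return total / trials

mle = lambda xs: sum(xs) / len(xs)                   # maximum likelihood
shrunk = lambda xs: (sum(xs) + 0.5) / (len(xs) + 1)  # hypothetical shrinkage estimator
print(kl_risk(mle, 5.0, 20, 2000), kl_risk(shrunk, 5.0, 20, 2000))
```

Squared error risk and bias are estimated the same way, swapping the loss inside the loop.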
Minimum Message Length Shrinkage Estimation
Cited by 1 (1 self)
Abstract:
This note considers estimation of the mean of a multivariate Gaussian distribution with known variance within the Minimum Message Length (MML) framework. Interestingly, the resulting MML estimator exactly coincides with the positive-part James-Stein estimator under the choice of an uninformative prior. A new approach for estimating parameters and hyperparameters in general hierarchical Bayes models is also presented.
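The positive-part James-Stein estimator named above is standard: shrink the observed vector toward the origin by a factor 1 − (p − 2)σ²/‖x‖², truncated at zero. A minimal sketch, with a simulation at a zero true mean (our choice of scenario, where shrinkage helps most) comparing its squared error risk to the raw observation:

```python
import random

def js_positive_part(x, sigma2=1.0):
    """Positive-part James-Stein estimate of a Gaussian mean vector
    with known variance: shrink x toward the origin, truncating the
    shrinkage factor at zero."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [factor * v for v in x]

# Compare squared-error risk against the raw observation (the MLE)
# when the true mean is the zero vector.
rng = random.Random(1)
p, trials = 10, 2000
mle_risk = js_risk = 0.0
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]
    mle_risk += sum(v * v for v in x)                     # estimate is x itself
    js_risk += sum(v * v for v in js_positive_part(x))    # shrunken estimate
print(mle_risk / trials, js_risk / trials)
```

With p = 10 the raw observation has risk about p = 10, while the shrunken estimate's risk is far smaller at the origin; the James-Stein estimator dominates the MLE for p ≥ 3.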
Luckiness and Regret in Minimum Description Length Inference, 2009
Cited by 1 (1 self)
Abstract:
Minimum Description Length (MDL) inference is based on the intuition that understanding the available data can be defined in terms of the ability to compress the data, i.e. to describe it in full using a shorter representation. This brief introduction discusses the design of the various codes used to implement MDL, focusing on the philosophically intriguing concepts of luckiness and regret: a good MDL code exhibits good performance in the worst case over all possible data sets, but achieves even better performance when the data turn out to be simple (although we suggest making no a priori assumptions to that effect). We then discuss how data compression relates to performance in various learning tasks, including parameter estimation, parametric and nonparametric model selection, and sequential prediction of outcomes from an unknown source. Last, we briefly outline the history of MDL and its technical and philosophical relationship to other approaches to learning such as Bayesian, frequentist and prequential statistics.
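Regret, in this sense, is the codelength a code spends on the data minus the best codelength achievable in hindsight within the model class. A small sketch (our own toy setting, feasible only for tiny sequence lengths) computes this for the Bernoulli NML code by brute-force enumeration, and shows the equalizing property that motivates NML: every sequence pays the same worst-case price.

```python
import math
from itertools import product

def nll_ml(seq):
    """Best Bernoulli codelength in hindsight (nats): -log of the
    sequence's probability under its own ML parameter k/n."""
    n, k = len(seq), sum(seq)
    if k in (0, n):
        return 0.0
    p = k / n
    return -(k * math.log(p) + (n - k) * math.log(1 - p))

def log_normalizer(n):
    """Log of the NML normalizer: the sum of ML probabilities over
    all binary sequences of length n (brute force, tiny n only)."""
    return math.log(sum(math.exp(-nll_ml(s)) for s in product((0, 1), repeat=n)))

def nml_regret(seq):
    """Regret = NML codelength minus the hindsight-optimal codelength."""
    nml_len = nll_ml(seq) + log_normalizer(len(seq))
    return nml_len - nll_ml(seq)

# NML equalizes regret: every length-6 sequence pays the same price.
print(nml_regret([1, 0, 1, 1, 0, 1]), nml_regret([1] * 6))
```

A "lucky" code would instead trade extra regret on some sequences for smaller regret on the simple ones it bets on.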
Universal Models for the Exponential Distribution
Cited by 1 (0 self)
Abstract:
This note considers the problem of constructing information-theoretic universal models for data distributed according to the exponential distribution. The universal models examined include the sequential Normalised Maximum Likelihood (SNML) code, the conditional Normalised Maximum Likelihood (CNML) code, the Minimum Message Length (MML) code and the Bayes mixture code (BMC). The CNML code yields a codelength identical to that of the Bayesian mixture code, and within O(1) of the MML codelength, with suitable data-driven priors.
Index Terms: MDL, MML, Universal Models
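Of the universal models listed, the Bayes mixture code is the easiest to sketch for exponential data, since a conjugate Gamma(a, b) prior on the rate gives the marginal in closed form: bᵃ Γ(n + a) / (Γ(a) (S + b)ⁿ⁺ᵃ), where S is the sum of the observations. The prior choice below is our own illustrative assumption, not taken from the paper; the mixture codelength always exceeds the hindsight-optimal (maximum-likelihood) codelength, the difference being the code's regret.

```python
import math

def bayes_mixture_codelength(xs, a=1.0, b=1.0):
    """Codelength (nats) of exponential data under a Bayes mixture
    code with a conjugate Gamma(a, b) prior on the rate; the marginal
    is b^a * Gamma(n + a) / (Gamma(a) * (S + b)^(n + a))."""
    n, s = len(xs), sum(xs)
    log_marginal = (a * math.log(b) + math.lgamma(n + a)
                    - math.lgamma(a) - (n + a) * math.log(s + b))
    return -log_marginal

def ml_codelength(xs):
    """Hindsight-optimal codelength: -log likelihood at rate n/S."""
    n, s = len(xs), sum(xs)
    return n - n * math.log(n / s)

xs = [0.4, 1.3, 0.2, 2.1, 0.8]   # illustrative data
print(bayes_mixture_codelength(xs), ml_codelength(xs))
```

The NML-type codes normalize out exactly this regret; comparing their codelengths to the mixture's is what the O(1) statements above are about.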
unknown title
Abstract:
In Inductive Logic Programming (ILP), since logic is a complete (universal) language, infinitely many possible hypotheses are compatible (hence plausible) given the evidence. An intrinsic way of selecting the most convenient hypothesis from the set of possible theories is useful not only for model selection but also for guiding the search in the hypothesis space, as some ILP systems have done in the past. One selection/search criterion is to apply Occam’s razor, i.e. to first select/try the simplest hypotheses which cover the evidence. In order to do this, it is necessary to measure how simple a theory is. The Minimum Message Length (MML) principle is based on information theory and reflects Occam’s razor philosophy. In this paper we present an MML method for costing both logic programs and sets of facts according to the theory. Our scheme has a solid foundation and avoids the drawbacks of previous coding schemes in ILP,
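The Occam's razor trade-off described here can be caricatured with a two-part message length: the cost of stating a rule plus the cost of listing the examples it gets wrong. Everything in the sketch below is a toy assumption of ours (a flat 1024-symbol vocabulary, one symbol per literal), not the paper's actual costing scheme; it only illustrates why a compact rule with a few exceptions beats enumerating every fact.

```python
import math

def two_part_bits(rule_size, exceptions, universe=1024):
    """Toy two-part message length (bits) for an ILP-style hypothesis:
    cost of stating the rule (rule_size symbols over a hypothetical
    1024-symbol vocabulary) plus cost of listing the examples the
    rule gets wrong. A caricature of MML costing, not the paper's scheme."""
    theory_bits = rule_size * math.log2(universe)
    data_bits = exceptions * math.log2(universe)
    return theory_bits + data_bits

# A 3-symbol rule with 2 exceptions vs. simply enumerating 50 facts.
print(two_part_bits(rule_size=3, exceptions=2))
print(two_part_bits(rule_size=0, exceptions=50))
```

The shorter total message wins, which is the sense in which the MML principle operationalizes Occam's razor for theory selection.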