Results 1-10 of 14
Minimum Message Length and Kolmogorov Complexity
Computer Journal, 1999
Cited by 104 (25 self)
... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038–1039], [2, sections 5.2, 5.5] and [3, p. 465] ...
MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions
Statistics and Computing, 2000
Cited by 32 (10 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference ...
Bayes not Bust! Why Simplicity is no Problem for Bayesians
2007
Cited by 13 (10 self)
The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike’s Information Criterion (AIC), a non-Bayesian formalisation of the notion of simplicity. This forms an important part of their wider attack on Bayesianism in the philosophy of science. We defend a Bayesian alternative: the simplicity of a theory is to be characterised in terms of Wallace’s Minimum Message Length (MML). We show that AIC is inadequate for many statistical problems where MML performs well. Whereas MML is always defined, AIC can be undefined. Whereas MML is not known ever to be statistically inconsistent, AIC can be. Even when defined and consistent, AIC performs worse than MML on small sample sizes. MML is statistically invariant under 1-to-1 reparametrisation, thus avoiding a common criticism of Bayesian approaches. We also show that MML provides answers to many of Forster’s objections to Bayesianism. Hence an important part of the attack on ...
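For reference, AIC in its usual form is given below, where k is the number of free parameters and θ̂ is the maximum-likelihood estimate; among candidate models, the one with the smallest AIC is preferred.

```latex
\mathrm{AIC} = 2k - 2\ln L(\hat{\theta})
```

Because the criterion needs the maximised likelihood L(θ̂), it has no value whenever that maximum does not exist, for example when the likelihood is unbounded. This is offered as one general way AIC can be undefined, not necessarily as the example used in the paper.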
MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions
In Proc. 6th Int. Workshop on Artif. Intelligence and Statistics, 1997
Cited by 8 (5 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also consistent and efficient. We provide a brief overview of MML inductive inference (Wallace and Boulton (1968), Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mixture modelling program, Snob (Wallace and Boulton (1968), Wallace (1986), Wallace and Dowe (1994)), uses the message lengths from various parameter estimates to enable it to combine parameter estimation with selection of the number of components. The message length is (to within a constant) the negative logarithm of the posterior probability of the theory. So, the MML theory can also be regarded as the theory with the highest posterior probability. Snob currently assumes that variables are uncorrelated, and permits multivariate data from Gaussian, discrete multi-state, Poisson and von Mises circular dist...
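The link between message length and posterior probability can be checked on a toy problem. The sketch below is a hypothetical example, not Snob: three discrete coin biases with assumed prior probabilities, and assumed data of 7 heads and 3 tails. Because the hypotheses are discrete, no parameter-precision term is needed, and the two-part length -log2 P(H) - log2 P(D|H) equals -log2 P(H|D) - log2 P(D), so minimising it selects the same hypothesis as maximising the posterior.

```python
import math

# Hypothetical toy problem: candidate coin biases and their assumed prior probabilities.
PRIORS = {0.25: 0.3, 0.5: 0.4, 0.75: 0.3}
HEADS, TAILS = 7, 3  # assumed data: one particular sequence of 10 tosses


def likelihood(p):
    """P(D | H) for one particular sequence with these counts."""
    return p ** HEADS * (1 - p) ** TAILS


def message_length_bits(p):
    """Two-part message: assert H (-log2 P(H)), then encode D given H (-log2 P(D | H))."""
    return -math.log2(PRIORS[p]) - math.log2(likelihood(p))


evidence = sum(PRIORS[p] * likelihood(p) for p in PRIORS)  # P(D)
posteriors = {p: PRIORS[p] * likelihood(p) / evidence for p in PRIORS}
lengths = {p: message_length_bits(p) for p in PRIORS}

best_by_length = min(lengths, key=lengths.get)
best_by_posterior = max(posteriors, key=posteriors.get)
print(best_by_length, best_by_posterior)  # the same hypothesis is chosen by both
```

Here both criteria pick the bias 0.75, and for every hypothesis the length exceeds -log2 of its posterior by the same constant, -log2 P(D).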
MDL and MML: Similarities and Differences (Introduction to Minimum Encoding Inference – Part III)
1994
Cited by 6 (0 self)
This paper continues the introduction to minimum encoding inductive inference given by Oliver and Hand. This series of papers was written with the objective of providing an introduction to this area for statisticians. We describe the message length estimates used in Wallace's Minimum Message Length (MML) inference and Rissanen's Minimum Description Length (MDL) inference. The differences in the message length estimates of the two approaches are explained. The implications of these differences for applications are discussed.
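To make the contrast concrete, the sketch below compares two message-length estimates for binomial data (h successes in n trials). Both forms are assumptions chosen for illustration and are not necessarily the exact estimates analysed in the paper: a crude two-part MDL length in the style of Rissanen (1978), -ln L(p_ML) + (1/2) ln n, and the Wallace-Freeman (MML87) length with a uniform prior h(p) = 1, -ln h(p) + (1/2) ln F(p) - ln L(p) + (1/2)(1 + ln κ1), with Fisher information F(p) = n / (p(1 - p)) and κ1 = 1/12. Lengths are in nats, and the MDL function assumes 0 < h < n.

```python
import math

KAPPA_1 = 1.0 / 12.0  # optimal quantising-lattice constant in one dimension


def neg_log_lik(p, h, n):
    """-ln P(D | p) for one particular sequence with h successes in n trials."""
    return -(h * math.log(p) + (n - h) * math.log(1 - p))


def mdl_1978(h, n):
    """Crude two-part MDL: -ln L at the ML estimate plus (k/2) ln n, with k = 1."""
    p_ml = h / n
    return p_ml, neg_log_lik(p_ml, h, n) + 0.5 * math.log(n)


def mml87_objective(p, h, n):
    """MML87 length (nats) for a binomial with a uniform prior on (0, 1)."""
    prior_density = 1.0              # assumed uniform prior h(p) = 1
    fisher = n / (p * (1 - p))       # expected Fisher information of n Bernoulli trials
    return (-math.log(prior_density) + 0.5 * math.log(fisher)
            + neg_log_lik(p, h, n) + 0.5 * (1 + math.log(KAPPA_1)))


def mml87(h, n):
    """MML87 estimate and length; the minimiser has the closed form (h + 1/2) / (n + 1)."""
    p_hat = (h + 0.5) / (n + 1)
    return p_hat, mml87_objective(p_hat, h, n)


for h, n in [(3, 10), (30, 100)]:
    p_ml, l_mdl = mdl_1978(h, n)
    p_mml, l_mml = mml87(h, n)
    print(f"h={h:3d} n={n:3d}  MDL: p={p_ml:.4f} len={l_mdl:.3f}  "
          f"MML87: p={p_mml:.4f} len={l_mml:.3f}")
```

The two differ chiefly in how the cost of stating the parameter is charged: a flat (1/2) ln n in this MDL form, versus prior, Fisher-information and lattice terms in MML87, which also shifts the point estimate from h/n to (h + 1/2)/(n + 1).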
MML and Bayesianism: Similarities and Differences (Introduction to Minimum Encoding Inference – Part II)
1994
Cited by 6 (0 self)
This paper continues the introduction to minimum encoding inference given by Oliver and Hand. This series of papers was written with the objective of providing an introduction to this area for statisticians. We examine the relationship between Bayesianism and Minimum Message Length (MML) inference. We argue that MML augments Bayesian methods by providing a sound Bayesian method for point estimation which is invariant under nonlinear transformations. We explore the issues of invariance of estimators under nonlinear transformations, the role of the Fisher Information matrix in MML inference, and the apparent similarity between MML and the adoption of a Jeffreys prior. We then compare MML to an approximate method of Bayesian Model Class Selection. Despite apparent similarities in their expressions, the properties of the two approaches can be different.
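The invariance point can be illustrated numerically for a binomial parameter. The sketch below is an illustration under stated assumptions, not the paper's own example: it assumes a uniform prior on p and compares two estimators in two parameterisations, p and θ = logit(p). The estimators are the posterior mode (MAP) and the minimiser of the MML87 expression -ln h + (1/2) ln F - ln L. Under reparameterisation the prior density picks up a Jacobian factor |dp/dθ| and the Fisher information picks up its square, so the two extra terms cancel in the MML87 expression but not in the MAP objective.

```python
import math


def sigmoid(theta):
    return 1.0 / (1.0 + math.exp(-theta))


def ternary_min(f, lo, hi, iters=200):
    """Minimise a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def estimates(h, n):
    t = n - h

    def nll(p):  # -ln likelihood of one sequence with h successes
        return -(h * math.log(p) + t * math.log(1 - p))

    # p-parameterisation: prior density 1, Fisher information n / (p(1 - p)).
    def map_p(p):
        return nll(p)

    def mml_p(p):
        return nll(p) + 0.5 * math.log(n / (p * (1 - p)))

    # theta = logit(p): induced prior density p(1 - p), Fisher information n p(1 - p).
    def map_theta(th):
        p = sigmoid(th)
        return nll(p) - math.log(p * (1 - p))

    def mml_theta(th):
        p = sigmoid(th)
        return nll(p) - math.log(p * (1 - p)) + 0.5 * math.log(n * p * (1 - p))

    eps = 1e-9
    return {
        "MAP in p": ternary_min(map_p, eps, 1 - eps),
        "MAP in logit(p)": sigmoid(ternary_min(map_theta, -20.0, 20.0)),
        "MML87 in p": ternary_min(mml_p, eps, 1 - eps),
        "MML87 in logit(p)": sigmoid(ternary_min(mml_theta, -20.0, 20.0)),
    }


for name, p_hat in estimates(h=3, n=10).items():
    print(f"{name:18s} p = {p_hat:.4f}")
```

With h = 3 and n = 10, MAP gives p = 0.3 in the p-parameterisation but p = 4/12 ≈ 0.3333 in the logit parameterisation, while MML87 gives 3.5/11 ≈ 0.3182 in both. Ternary search is used only because every objective here is unimodal; it keeps the sketch free of external dependencies.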
Circular Clustering by Minimum Message Length of Protein Dihedral Angles
1995
Cited by 4 (4 self)
Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) intrinsic classification, we are able to take the protein dihedral angles as determined by X-ray crystallography and cluster sets of dihedral angles into groups. Previous work by Hunter and States had applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by 3 Cartesian coordinates for each of the α-Carbon, β-Carbon and Nitrogen, totalling 9 coordinates. By using the von Mises circular distribution in the Snob program rather than the Normal distribution in the Hunter and States model, we are instead able to represent local site properties by the two dihedral angles, φ and ψ. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly correlated Cartesian coordinates. By the information-theoretic message length concepts discussed in the paper, a more concise model of this kind is more likely to represent the underlying process that generated the data. We report on the results of our classification, plotting the classes in (φ, ψ)-space and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region φ ≈ −1.09 rad and ψ ≈ −0.75 rad which are close on the spanning tree and have high inter-transition probabilities. These properties give rise to a tight, abundant, self-perpetuating, α-helical structure.
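For readers unfamiliar with the von Mises distribution, the sketch below gives its density, f(x | μ, κ) = exp(κ cos(x − μ)) / (2π I0(κ)), and the maximum-likelihood estimate of μ, which is the direction of the mean resultant vector. It also shows why a circular model matters for dihedral angles: the arithmetic mean of two angles either side of ±π points to the opposite side of the circle. The function names are hypothetical, I0 is computed from its power series (adequate for moderate κ), and site_density assumes φ and ψ are independent within a class; Snob's actual message-length calculations are not reproduced.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I0 from its power series sum_m (kappa/2)^(2m) / (m!)^2."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (kappa / 2) ** 2 / (m + 1) ** 2
    return total


def von_mises_pdf(x, mu, kappa):
    """von Mises density on the circle; angles in radians."""
    return math.exp(kappa * math.cos(x - mu)) / (2 * math.pi * bessel_i0(kappa))


def circular_mean(angles):
    """Maximum-likelihood estimate of mu: the direction of the mean resultant vector."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def site_density(phi, psi, mu_phi, kappa_phi, mu_psi, kappa_psi):
    """Hypothetical class density for one (phi, psi) pair, assuming independent angles."""
    return von_mises_pdf(phi, mu_phi, kappa_phi) * von_mises_pdf(psi, mu_psi, kappa_psi)


angles = [3.1, -3.1]                 # just either side of the +/-pi wrap-around
print(sum(angles) / len(angles))     # arithmetic mean 0.0: the opposite side of the circle
print(circular_mean(angles))         # close to pi, as a circular model expects
```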
Fast Full-Search Equivalent Nearest-Neighbour Search Algorithms
1999
Cited by 3 (2 self)
A fundamental activity common to many image processing, pattern classification, and clustering algorithms involves searching a set of n k-dimensional data items for the one nearest to a given target item with respect to a distance function. Our goal is to find fast search algorithms which are full-search equivalent, that is, the resulting match is as good as what we could obtain if we were to search the set exhaustively. We propose a framework made up of three components, namely (i) a technique for obtaining a good initial match, (ii) an inexpensive method for determining whether the current match is a full-search equivalent match, and (iii) an effective technique for improving the current match. Our approach is to consider good solutions for each component in order to find an algorithm which balances the overall complexity of the search. We also propose a technique for hierarchical ordering and cluster elimination using a minimal-cost spanning tree. Our experiments on vector quantisation coding of images show that the framework and techniques we propose can be used to construct suitable algorithms for most of our data sets which require full-search equivalent matches at an average arithmetic cost of less than O(k log n) while using only O(n) space.
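The three components can be illustrated with one common elimination rule, which is an assumption here and not necessarily a technique from the paper. Each item's distance to a reference point r (here the centroid) is precomputed, using O(n) extra space. For a query q, the triangle inequality gives |d(q, r) − d(x, r)| ≤ d(q, x), so once that lower bound reaches the best distance found so far, x and every item further out in sorted order can be skipped. Stopping only when no remaining item can pass the test keeps the search full-search equivalent. The class name is hypothetical, and no claim is made that this simple rule reaches the costs reported above.

```python
import bisect
import math
import random


class RefPointIndex:
    """Hypothetical full-search-equivalent nearest-neighbour index.

    Items are sorted by their distance to one reference point r (the centroid).
    By the triangle inequality, |d(q, r) - d(x, r)| is a lower bound on d(q, x).
    """

    def __init__(self, points):
        dim = len(points[0])
        self.ref = tuple(sum(p[j] for p in points) / len(points) for j in range(dim))
        self.points = sorted(points, key=lambda p: math.dist(p, self.ref))
        self.keys = [math.dist(p, self.ref) for p in self.points]

    def nearest(self, q):
        """Return (item, distance) for an item at minimum distance from q."""
        dq = math.dist(q, self.ref)
        # (i) Initial match: start where the reference distance is closest to dq.
        hi = bisect.bisect_left(self.keys, dq)
        lo = hi - 1
        best, best_d = None, math.inf
        while True:
            # (ii) Elimination test: the lower bound only grows as lo moves down
            # or hi moves up, so a failed test prunes that whole direction.
            lo_ok = lo >= 0 and dq - self.keys[lo] < best_d
            hi_ok = hi < len(self.points) and self.keys[hi] - dq < best_d
            if not (lo_ok or hi_ok):
                return best, best_d  # no remaining item can beat best_d
            # (iii) Improve the current match using the surviving candidates.
            if lo_ok:
                d = math.dist(q, self.points[lo])
                if d < best_d:
                    best, best_d = self.points[lo], d
                lo -= 1
            if hi_ok:
                d = math.dist(q, self.points[hi])
                if d < best_d:
                    best, best_d = self.points[hi], d
                hi += 1


random.seed(0)
data = [tuple(random.random() for _ in range(4)) for _ in range(500)]
index = RefPointIndex(data)
query = (0.5, 0.5, 0.5, 0.5)
item, d = index.nearest(query)
# The result agrees with an exhaustive search over all items.
assert math.isclose(d, min(math.dist(query, x) for x in data))
```

A single reference point is the simplest choice; the pruning becomes tighter when the query's distance to r differs strongly from that of most items, which is one reason schemes of this kind often use several reference points or a hierarchy.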
A Preliminary MML Linear Classifier using Principal Components for Multiple Classes
Cited by 2 (2 self)
In this paper we improve on the supervised classification method developed in Kornienko et al. (2002) by introducing Principal Components Analysis into the inference process. We also extend the classifier from dealing only with binomial (two-class) problems to multinomial (multi-class) problems. Here the MML criterion is applied to the classification of objects via a linear hyperplane, where the objects may come from any multi-class distribution. The inclusion of Principal Components Analysis in the original inference scheme reduces the bias present in the classifier’s search technique. These improvements lead to a method which, when compared against three commercial Support Vector Machine (SVM) classifiers on binary data, was found to be as good as the most successful SVM tested. Furthermore, the new scheme is able to classify objects of a multi-class distribution with just one hyperplane, whereas SVMs require several hyperplanes.
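As a minimal illustration of the PCA step only, the sketch below computes the principal axes of 2-D data in closed form and evaluates a linear decision rule w · z + b in the rotated coordinates. The helper names, the data and the example hyperplane are hypothetical; how the paper combines principal components with the MML criterion, and how it separates several classes with one hyperplane, is not shown.

```python
import math


def principal_axes_2d(points):
    """Closed-form PCA for 2-D data.

    Returns the mean and [(variance, unit direction), ...], largest variance first.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / n          # var(x)
    c = sum((y - my) ** 2 for _, y in points) / n          # var(y)
    b = sum((x - mx) * (y - my) for x, y in points) / n    # cov(x, y)
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    angle = 0.5 * math.atan2(2 * b, a - c)                 # direction of maximum variance
    first = (math.cos(angle), math.sin(angle))
    second = (-math.sin(angle), math.cos(angle))
    return (mx, my), [(half_trace + radius, first), (half_trace - radius, second)]


def project(point, mean, axes):
    """Coordinates of a point in the principal-component basis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return tuple(dx * v[0] + dy * v[1] for _, v in axes)


def classify(z, w, b):
    """Which side of the hyperplane w . z + b = 0 the point z lies on (1 or 0)."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0


data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8)]
mean, axes = principal_axes_2d(data)
# Hypothetical hyperplane: perpendicular to the first principal axis, through the mean.
w, b = (1.0, 0.0), 0.0
labels = [classify(project(p, mean, axes), w, b) for p in data]
```

Working in principal-component coordinates decorrelates the features, so a hyperplane's orientation can be described relative to the directions of greatest spread rather than the original axes.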