Minimum Message Length and Kolmogorov Complexity
Computer Journal, 1999
Cited by 86 (20 self)
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038--1039], [2, sections 5.2, 5.5] and [3, p. 465]
MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions
Statistics and Computing, 2000
Cited by 29 (8 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference
Bayes not Bust! Why Simplicity is no Problem for Bayesians
2007
Cited by 10 (9 self)
The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike’s Information Criterion (AIC), a non-Bayesian formalisation of the notion of simplicity. This forms an important part of their wider attack on Bayesianism in the philosophy of science. We defend a Bayesian alternative: the simplicity of a theory is to be characterised in terms of Wallace’s Minimum Message Length (MML). We show that AIC is inadequate for many statistical problems where MML performs well. Whereas MML is always defined, AIC can be undefined. Whereas MML is not known ever to be statistically inconsistent, AIC can be. Even when defined and consistent, AIC performs worse than MML on small sample sizes. MML is statistically invariant under 1-to-1 re-parametrisation, thus avoiding a common criticism of Bayesian approaches. We also show that MML provides answers to many of Forster’s objections to Bayesianism. Hence an important part of the attack on
MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions
In Proc. 6th Int. Workshop on Artif. Intelligence and Statistics, 1997
Cited by 8 (5 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also consistent and efficient. We provide a brief overview of MML inductive inference (Wallace and Boulton (1968), Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mixture modelling program, Snob (Wallace and Boulton (1968), Wallace (1986), Wallace and Dowe (1994)), uses the message lengths from various parameter estimates to enable it to combine parameter estimation with selection of the number of components. The message length is (to within a constant) the negative logarithm of the posterior probability of the theory. So, the MML theory can also be regarded as the theory with the highest posterior probability. Snob currently assumes that variables are uncorrelated, and permits multi-variate data from Gaussian, discrete multi-state, Poisson and von Mises circular dist...
MDL and MML: Similarities and Differences (Introduction to Minimum Encoding Inference -- Part III)
1994
Cited by 6 (0 self)
This paper continues the introduction to minimum encoding inductive inference given by Oliver and Hand. This series of papers was written with the objective of providing an introduction to this area for statisticians. We describe the message length estimates used in Wallace's Minimum Message Length (MML) inference and Rissanen's Minimum Description Length (MDL) inference. The differences in the message length estimates of the two approaches are explained. The implications of these differences for applications are discussed.
MML and Bayesianism: Similarities and Differences (Introduction to Minimum Encoding Inference -- Part II)
1994
Cited by 6 (0 self)
This paper continues the introduction to minimum encoding inference given by Oliver and Hand. This series of papers was written with the objective of providing an introduction to this area for statisticians. We examine the relationship between Bayesianism and Minimum Message Length (MML) inference. We argue that MML augments Bayesian methods by providing a sound Bayesian method for point estimation which is invariant under non-linear transformations. We explore the issues of invariance of estimators under non-linear transformations, the role of the Fisher Information matrix in MML inference, and the apparent similarity between MML and the adoption of a Jeffreys' Prior. We then compare MML to an approximate method of Bayesian Model Class Selection. Despite apparent similarities in their expressions, the properties of the two approaches can be different.
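To make the role of the Fisher information in MML point estimation concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes the Wallace and Freeman (1987) approximation for a single parameter, a binomial likelihood with s successes in n trials, and a uniform prior on (0, 1). The function names and the grid search are hypothetical choices made for illustration only.

```python
import math


def message_length(theta, s, n):
    """Approximate two-part message length (in nats) for s successes in n trials.

    Assumptions (not from the listed abstract): Wallace-Freeman approximation,
    uniform prior h(theta) = 1 on (0, 1), binomial coefficient omitted because
    it does not depend on theta.
    """
    neg_log_prior = 0.0  # -log h(theta) with h(theta) = 1
    neg_log_lik = -(s * math.log(theta) + (n - s) * math.log(1.0 - theta))
    fisher = n / (theta * (1.0 - theta))  # expected Fisher information
    # Quantisation term for one parameter: (1/2) * (1 + log(1/12))
    lattice = 0.5 * (1.0 + math.log(1.0 / 12.0))
    return neg_log_prior + neg_log_lik + 0.5 * math.log(fisher) + lattice


def mml_estimate(s, n, steps=100_000):
    """Grid search for the theta that minimises the message length."""
    grid = (k / steps for k in range(1, steps))
    return min(grid, key=lambda t: message_length(t, s, n))


if __name__ == "__main__":
    s, n = 3, 10
    print("grid minimum   :", mml_estimate(s, n))
    print("(s + 1/2)/(n+1):", (s + 0.5) / (n + 1))
```

Under these assumptions the Fisher information term pulls the estimate slightly away from the maximum likelihood value s/n, and the grid minimum should agree with the closed form (s + 1/2)/(n + 1) up to the grid resolution.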
Fast Full-Search Equivalent Nearest-Neighbour Search Algorithms
1999
Cited by 3 (2 self)
A fundamental activity common to many image processing, pattern classification, and clustering algorithms involves searching a set of n, k-dimensional data for the one which is nearest to a given target item with respect to a distance function. Our goal is to find fast search algorithms which are full-search equivalent---that is, the resulting match is as good as what we could obtain if we were to search the set exhaustively. We propose a framework made up of three components, namely (i) a technique for obtaining a good initial match, (ii) an inexpensive method for determining whether the current match is a full-search equivalent match, and (iii) an effective technique for improving the current match. Our approach is to consider good solutions for each component in order to find an algorithm which balances the overall complexity of the search. We also propose a technique for hierarchical ordering and cluster elimination using a minimal cost spanning tree. Our experiments on vector quantisation coding of images show that the framework and techniques we proposed can be used to construct suitable algorithms for most of our data sets which require full-search equivalent matches at an average arithmetic cost of less than O(k log n) while using only O(n) space.
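The three-component framework described above (initial match, a cheap test for full-search equivalence, and an improvement step) can be illustrated with a minimal Python sketch. It does not reproduce the authors' algorithms: it swaps in a standard pivot-based elimination using the triangle inequality, and the names `build_index` and `nearest`, as well as the choice of pivot, are hypothetical.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, pivot):
    """Sort point indices by their distance to a fixed pivot (O(n) extra space)."""
    keyed = sorted((dist(p, pivot), i) for i, p in enumerate(points))
    return [k for k, _ in keyed], [i for _, i in keyed]


def nearest(points, pivot, index, query):
    """Return (index, distance) of a nearest point, equivalent to a full search."""
    keys, order = index
    dq = dist(query, pivot)
    # (i) Initial match: start at the points whose pivot distance is closest to dq.
    start = bisect.bisect_left(keys, dq)
    lo, hi = start - 1, start
    best_i, best_d = -1, math.inf
    while lo >= 0 or hi < len(keys):
        gap_lo = dq - keys[lo] if lo >= 0 else math.inf
        gap_hi = keys[hi] - dq if hi < len(keys) else math.inf
        # (ii) Equivalence test: by the triangle inequality,
        # d(query, x) >= |dq - d(x, pivot)|, so once the smallest remaining
        # gap reaches best_d no unvisited point can be strictly closer.
        if min(gap_lo, gap_hi) >= best_d:
            break
        # (iii) Improvement: visit the candidate with the smaller lower bound.
        if gap_lo <= gap_hi:
            pos, lo = lo, lo - 1
        else:
            pos, hi = hi, hi + 1
        d = dist(query, points[order[pos]])
        if d < best_d:
            best_i, best_d = order[pos], d
    return best_i, best_d
```

A single pivot is the simplest possible choice and gives weak bounds in high dimensions; it is used here only to show how an early-termination test can keep the result identical to an exhaustive search.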
CIRCULAR CLUSTERING BY MINIMUM MESSAGE LENGTH OF PROTEIN DIHEDRAL ANGLES
1995
Cited by 3 (3 self)
Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) intrinsic classification, we are able to take the protein dihedral angles as determined by X-ray crystallography, and cluster sets of dihedral angles into groups. Previous work by Hunter and States had applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by 3 Cartesian co-ordinates for each of the α-Carbon, β-Carbon and Nitrogen, totalling 9 co-ordinates. By using the von Mises circular distribution in the Snob program rather than the Normal distribution in the Hunter and States model, we are instead able to represent local site properties by the two dihedral angles, φ and ψ. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly-correlated Cartesian co-ordinates. Using the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying generating process from which the data comes. We report on the results of our classification, plotting the classes in (φ,ψ)-space and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region φ ≈ −1.09 rad and ψ ≈ −0.75 rad which are close on the spanning tree and have high inter-transition probabilities. These properties give rise to a tight, abundant, self-perpetuating, α-helical structure.
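The abstract does not say which symmetric information-theoretic distance the authors use. As one plausible illustration only, the Python sketch below assumes a symmetrised Kullback-Leibler divergence between two von Mises components, each described by a mean direction mu (radians) and a concentration kappa. The function names and the example concentrations are hypothetical.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind (integer order) by power series.

    Adequate for the moderate concentrations used here; not a general-purpose
    implementation.
    """
    return sum(
        (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )


def kl_von_mises(mu1, kappa1, mu2, kappa2):
    """KL divergence KL(p1 || p2) between two von Mises densities, in nats."""
    # a1 = I1(kappa1) / I0(kappa1) is E[cos(theta - mu1)] under p1.
    a1 = bessel_i(1, kappa1) / bessel_i(0, kappa1)
    return (
        math.log(bessel_i(0, kappa2) / bessel_i(0, kappa1))
        + a1 * (kappa1 - kappa2 * math.cos(mu1 - mu2))
    )


def symmetric_distance(c1, c2):
    """Assumed symmetric measure: KL(p1 || p2) + KL(p2 || p1)."""
    return kl_von_mises(*c1, *c2) + kl_von_mises(*c2, *c1)


if __name__ == "__main__":
    # Mean angle -1.09 rad is taken from the abstract; kappa values are made up.
    helix_like = (-1.09, 8.0)
    other = (-2.20, 3.0)
    print(symmetric_distance(helix_like, other))
```

Because the divergence depends on the means only through cos(mu1 - mu2), two components whose means differ by a full turn are treated as identical, which is the property that motivates a circular distribution for dihedral angles.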
A Preliminary MML Linear Classifier using Principal Components for Multiple Classes
Cited by 2 (2 self)
In this paper we improve on the supervised classification method developed in Kornienko et al. (2002) by introducing Principal Components Analysis into the inference process. We also extend the classifier from binomial (two-class) problems only to multinomial (multi-class) problems. The MML criterion is applied here to the classification of objects via a linear hyperplane, where the objects may come from any multi-class distribution. Including Principal Components Analysis in the original inference scheme reduces the bias present in the classifier's search technique. These improvements lead to a method which, when compared against three commercial Support Vector Machine (SVM) classifiers on binary data, was found to be as good as the most successful SVM tested. Furthermore, the new scheme is able to classify objects of a multi-class distribution with just one hyperplane, whereas SVMs require several hyperplanes.

