Results 1 - 7 of 7
Minimum Message Length and Kolmogorov Complexity
 Computer Journal
, 1999
"... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 10381039], [2, sections 5.2, 5.5] and [3, p. 465] ..."
Abstract

Cited by 104 (25 self)
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038–1039], [2, sections 5.2, 5.5] and [3, p. 465]
Bayes not Bust! Why Simplicity is no Problem for Bayesians
, 2007
"... The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike’s Information Criterion (AIC), a nonBayesian formalisation of the notion of simplicity. ..."
Abstract

Cited by 13 (10 self)
The advent of formal definitions of the simplicity of a theory has important implications for model selection. But what is the best way to define simplicity? Forster and Sober ([1994]) advocate the use of Akaike’s Information Criterion (AIC), a non-Bayesian formalisation of the notion of simplicity. This forms an important part of their wider attack on Bayesianism in the philosophy of science. We defend a Bayesian alternative: the simplicity of a theory is to be characterised in terms of Wallace’s Minimum Message Length (MML). We show that AIC is inadequate for many statistical problems where MML performs well. Whereas MML is always defined, AIC can be undefined. Whereas MML is not known ever to be statistically inconsistent, AIC can be. Even when defined and consistent, AIC performs worse than MML on small sample sizes. MML is statistically invariant under 1-to-1 reparametrisation, thus avoiding a common criticism of Bayesian approaches. We also show that MML provides answers to many of Forster’s objections to Bayesianism. Hence an important part of the attack on
K.B.: Learning hybrid Bayesian networks by MML
 Proc. 19th Australian Joint Conf. on AI, LNAI (2006)
"... Abstract. We use a Markov Chain Monte Carlo (MCMC) MML algorithm to learn hybrid Bayesian networks from observational data. Hybrid networks represent local structure, using conditional probability tables (CPT), logit models, decision trees or hybrid models, i.e., combinations of the three. We compar ..."
Abstract

Cited by 1 (1 self)
Abstract. We use a Markov Chain Monte Carlo (MCMC) MML algorithm to learn hybrid Bayesian networks from observational data. Hybrid networks represent local structure using conditional probability tables (CPT), logit models, decision trees or hybrid models, i.e., combinations of the three. We compare this method with alternative local structure learning algorithms using the MDL and BDe metrics. Results are presented for both real and artificial data sets. Hybrid models compare favourably to other local structure learners, allowing simple representations given limited data combined with richer representations given massive data.
Minimum Message Length Shrinkage Estimation
"... This note considers estimation of the mean of a multivariate Gaussian distribution with known variance within the Minimum Message Length (MML) framework. Interestingly, the resulting MML estimator exactly coincides with the positivepart JamesStein estimator under the choice of an uninformative pri ..."
Abstract
This note considers estimation of the mean of a multivariate Gaussian distribution with known variance within the Minimum Message Length (MML) framework. Interestingly, the resulting MML estimator exactly coincides with the positive-part James–Stein estimator under the choice of an uninformative prior. A new approach for estimating parameters and hyperparameters in general hierarchical Bayes models is also presented.
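The positive-part James–Stein estimator mentioned in this abstract has a simple closed form: for y ~ N_k(θ, σ²I) with k ≥ 3, shrink y toward the origin by the factor max(0, 1 − (k − 2)σ²/‖y‖²). A minimal sketch (function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def js_positive_part(y, sigma2=1.0):
    """Positive-part James-Stein estimate of the mean of y ~ N_k(theta, sigma2*I)."""
    y = np.asarray(y, dtype=float)
    k = y.size
    if k < 3:
        return y.copy()  # shrinkage offers no improvement for k < 3
    # Shrinkage factor, truncated at zero ("positive part")
    shrink = max(0.0, 1.0 - (k - 2) * sigma2 / np.dot(y, y))
    return shrink * y
```

Observations far from the origin are barely shrunk, while observations whose squared norm is below (k − 2)σ² are mapped exactly to zero.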
Advance Access publication on June 18, 2008 doi:10.1093/comjnl/bxm117
"... One of the second generation of computer scientists, Chris Wallace completed his tertiary education in 1959 with a Ph.D. in nuclear physics, on cosmic ray showers, under Dr Paul George at Sydney University. Needless to say, computer science was not, at that stage, an established academic discipline. ..."
Abstract
One of the second generation of computer scientists, Chris Wallace completed his tertiary education in 1959 with a Ph.D. in nuclear physics, on cosmic ray showers, under Dr Paul George at Sydney University. Needless to say, computer science was not, at that stage, an established academic discipline. With Max Brennan and John Malos he had designed and built a large automatic data logging system for recording cosmic ray air shower events, and with Max Brennan he also developed a complex computer programme for Bayesian analysis of cosmic ray events on the recently installed SILLIAC computer. Appointed lecturer in Physics at Sydney in 1960, he was sent almost immediately to the University of Illinois to copy the design of ILLIAC II, a duplicate of which was to be built at Sydney. ILLIAC II was not in fact completed at that stage and, after an initial less than warm welcome by a department that seemed unsure exactly what this Australian was doing in their midst, his talents were recognized and he was invited to join their staff (under very generous conditions) to assist in the ILLIAC II design. He remained there for two years, helping in particular to design the input/output channels and aspects of the advanced control unit (first-stage pipeline). In the event, Sydney decided it would be too expensive to build a copy of ILLIAC II, although a successful copy (the Golem) was built in Israel using circuit designs developed by Wallace and Ken Smith. In spite of the considerable financial and academic inducements to remain in America, Wallace returned to Australia after three months spent in England familiarizing himself with the KDF9 computer being purchased by Sydney University to replace SILLIAC. Returning to the School of Physics he joined the Basser
A NEW MESSAGE LENGTH APPROXIMATION FOR PARAMETER ESTIMATION AND MODEL SELECTION
"... This paper examines Bayesian twopart coding schemes as tools for parameter estimation and model selection. The Wallace–Freeman message length approximation to strict minimum message length can be used to obtain twopart message lengths. However, this approximation relies on some strong assumptions ..."
Abstract
This paper examines Bayesian two-part coding schemes as tools for parameter estimation and model selection. The Wallace–Freeman message length approximation to strict minimum message length can be used to obtain two-part message lengths. However, this approximation relies on some strong assumptions regarding the likelihood function and prior distribution which do not hold for a large range of models. We present a new two-part message length formula that is more widely applicable than the popular Wallace–Freeman message length approximation, while remaining significantly easier to compute than the exact strict minimum message length procedure.

1. MML TWO-PART CODES

Consider the problem of choosing a plausible explanation for some observed data y^n = (y_1, ..., y_n)' ∈ Y^n ⊆ R^n. The possible explanations are the distributions, or (fully specified) models, contained in a countable set of parametric model structures γ ∈ Γ. Let p_γ(y^n | θ) denote the model, in model structure γ, indexed by θ ∈ Θ_γ ⊆ R^k. The minimum encoding approach [1, 2] to inference suggests that the model that most compresses the data is the most plausible explanation. One way to compress the data is by two-part coding, in which the model and the data are compressed together as a two-part message. This idea is central to the minimum message length (MML) principle. The MML principle is explicitly Bayesian in nature, so we further assume that a suitable prior distribution π_γ(θ), θ ∈ Θ_γ, exists for all γ ∈ Γ. The first part of the message, or assertion, states which model, p_γ(· | θ), from the structure γ, is to be used to compress the data. The second part, or detail, states the data y^n using the nominated model from γ. Let the lengths of these two terms be denoted by I(θ; γ) and I(y^n | θ; γ), respectively. Further, let I(γ) denote the length of a preamble code stating which structure from Γ is being used. Estimation of both a model structure and the model parameters may be performed simultaneously by solving
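The assertion-plus-detail structure described above can be illustrated with a toy sketch. This is not the paper's method: it assumes a Bernoulli model with the success probability quantized to a uniform grid, using a uniform code over grid points for the assertion I(θ; γ) and the negative log-likelihood for the detail I(y^n | θ; γ). The Wallace–Freeman approximation chooses the quantization far more carefully.

```python
import math

def two_part_length(y, grid=100):
    """Crude two-part message length (in bits) for binary data y under a
    Bernoulli model with its parameter quantized to a uniform grid.
    Returns the length of the best (shortest) assertion + detail message."""
    n, s = len(y), sum(y)
    best = math.inf
    for i in range(1, grid):
        theta = i / grid                    # candidate quantized parameter
        assertion = math.log2(grid - 1)     # uniform code over grid points
        detail = -(s * math.log2(theta) + (n - s) * math.log2(1 - theta))
        best = min(best, assertion + detail)
    return best
```

Highly regular data yields a much shorter detail part, so its total message length is smaller; the constant assertion cost is the price paid for stating the model.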