Results 1–10 of 18
Minimum Message Length and Kolmogorov Complexity
Computer Journal, 1999
Abstract

Cited by 120 (27 self)
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038–1039], [2, Sections 5.2, 5.5] and [3, p. 465].
MML clustering of multistate, Poisson, von Mises circular and Gaussian distributions
Statistics and Computing, 2000
Abstract

Cited by 35 (10 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference
A Simple Statistical Algorithm for Biological Sequence Compression
Data Compression Conference, 2007
Abstract

Cited by 21 (1 self)
This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
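The panel-of-experts scheme this abstract describes can be sketched in a few lines. The two toy experts below (an order-0 uniform model and a "repeat the last symbol" model) and the Bayesian reweighting rule are illustrative assumptions, not the paper's actual expert set; the per-symbol code length is the -log2 of the blended probability, which an arithmetic coder approaches to within rounding.

```python
import math

ALPHABET = "ACGT"

def uniform_expert(history):
    # Order-0 fallback: all four bases equally likely.
    return {s: 0.25 for s in ALPHABET}

def repeat_expert(history):
    # Toy expert: bets that the previous symbol recurs.
    if not history:
        return {s: 0.25 for s in ALPHABET}
    probs = {s: 0.1 for s in ALPHABET}
    probs[history[-1]] = 0.7
    return probs

def code_length_bits(seq, experts=(uniform_expert, repeat_expert)):
    """Bits needed to encode seq under the Bayesian mixture of experts."""
    weights = [1.0 / len(experts)] * len(experts)
    bits = 0.0
    for i, symbol in enumerate(seq):
        history = seq[:i]
        predictions = [e(history) for e in experts]
        blended = sum(w * p[symbol] for w, p in zip(weights, predictions))
        bits += -math.log2(blended)
        # Reweight each expert by how well it predicted the actual symbol.
        weights = [w * p[symbol] for w, p in zip(weights, predictions)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return bits
```

On a repetitive string the mixture quickly shifts weight to the repeat expert, so the code length falls well below the 2 bits per base a flat code would spend.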
Circular Clustering Of Protein Dihedral Angles By Minimum Message Length
In Proceedings of the 1st Pacific Symposium on Biocomputing (PSB-1), 1996
Abstract

Cited by 15 (11 self)
this paper is given in [DADH95] and is available from ftp://www.cs.monash.edu.au/www/publications/1995/TR237.ps.Z.) Section 2 introduces the MML principle and how it can be used for this circular clustering problem. The remaining sections give the results of the secondary structure groups [KaSa83] that resulted from applying Snob to cluster our dihedral angle data.
MML mixture modelling of multistate, Poisson, von Mises circular and Gaussian distributions
In Proc. 6th Int. Workshop on Artif. Intelligence and Statistics, 1997
Abstract

Cited by 11 (5 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also consistent and efficient. We provide a brief overview of MML inductive inference (Wallace and Boulton (1968), Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mixture modelling program, Snob (Wallace and Boulton (1968), Wallace (1986), Wallace and Dowe (1994)) uses the message lengths from various parameter estimates to enable it to combine parameter estimation with selection of the number of components. The message length is (to within a constant) the logarithm of the posterior probability of the theory. So, the MML theory can also be regarded as the theory with the highest posterior probability. Snob currently assumes that variables are uncorrelated, and permits multivariate data from Gaussian, discrete multistate, Poisson and von Mises circular dist...
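The claim that the message length is, to within a constant, the negative log posterior admits a tiny worked sketch. The coin hypotheses, priors and data below are invented purely for illustration; they are not from the paper:

```python
import math

def message_length_bits(prior, likelihood):
    # Two-part message: -log2 P(H) to state the hypothesis,
    # plus -log2 P(D | H) to state the data given it.
    return -math.log2(prior) - math.log2(likelihood)

def binomial_likelihood(p, heads, tails):
    return p ** heads * (1.0 - p) ** tails

heads, tails = 9, 1
# Two candidate hypotheses about a coin, given equal prior probability.
candidates = {"fair p=0.5": 0.5, "biased p=0.9": 0.9}
lengths = {name: message_length_bits(0.5, binomial_likelihood(p, heads, tails))
           for name, p in candidates.items()}
best = min(lengths, key=lengths.get)
# Minimising the message length selects the same hypothesis as maximising
# the posterior P(H | D): the two objectives differ only by the
# data-dependent constant -log2 P(D).
```

Here nine heads in ten tosses make the biased hypothesis's shorter data part outweigh its (equal) hypothesis part, so it wins on total message length.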
Sequence Complexity for Biological Sequence Analysis
2000
Abstract

Cited by 8 (0 self)
A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of repeats do not need to match each other exactly. Both forward and reverse-complementary repeats are allowed. The model has a small number of parameters which are fitted to the data. In general there are many explanations for a given sequence, and it is shown how to compute the total probability of the data given the model. Computer algorithms are described for these tasks. The model can be used to compute the information content of a sequence, either in total or base by base. This amounts to looking at sequences from a data-compression point of view and it is argued that this is a good way to tackle intelligent sequence analysis in general.
Intrinsic Classification by MML—the Snob Program
Proc. Seventh Australian Joint Conf. Artificial Intelligence, 1994
Abstract

Cited by 6 (0 self)
We provide a brief overview of Minimum Message Length (MML) inductive inference (Wallace and Boulton (1968), Wallace and Freeman (1987)). We then outline how MML is used for statistical parameter estimation, and how the MML intrinsic classification program, Snob (Wallace and Boulton (1968), Wallace (1986), Wallace (1990)) uses the message lengths from various parameter estimates to enable it to combine parameter estimation with model selection in intrinsic classification. We mention here the most recent extensions to Snob, permitting Poisson and von Mises circular distributions. We also survey some applications of Snob (albeit briefly), and further provide some documentation on how the user can guide Snob’s search through various models of the given data to try to obtain that model whose message length is a minimum.
Compression and Approximate Matching
The Computer Journal, 1999
Abstract

Cited by 6 (2 self)
A population of sequences is called non-random if there is a statistical model and an associated compression algorithm that allows members of the population to be compressed, on average. Any available statistical model of a population should be incorporated into algorithms for alignment of the sequences, and doing so changes the rank order of possible alignments in general. The model should also be used in deciding if a resulting approximate match between two sequences is significant or not. It is shown how to do this for two plausible interpretations involving pairs of sequences that might or might not be related. Efficient alignment algorithms are described for quite general statistical models of sequences. The new alignment algorithms are more sensitive to what might be termed 'features' of the sequences. A natural significance test is shown to be rarely fooled by apparent similarities between two sequences that are merely typical of all or most members of the population, even unrelated members. The Computer Journal, Volume 42, Issue 1, pp. 1–10, 1999. http://www.csse.monash.edu.au/~lloyd/tildeStrings/
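The idea of judging relatedness via compressibility can be approximated with a general-purpose compressor. The sketch below uses zlib and the normalized compression distance as a crude stand-in for the paper's model-based code lengths and significance test; it is not the paper's actual method, and the sequences are fabricated for illustration:

```python
import random
import zlib

def C(s):
    # Compressed size in bytes: a rough proxy for a model's code length.
    return len(zlib.compress(s.encode()))

def ncd(x, y):
    # Normalized compression distance: small when x helps compress y
    # (suggesting relatedness), near 1 for unrelated strings.
    return (C(x + y) - min(C(x), C(y))) / max(C(x), C(y))

random.seed(0)
x = "ACGT" * 60  # a highly structured toy "sequence"
# A mutated copy of x: roughly 5% of positions overwritten with "T".
mutated = "".join(c if random.random() > 0.05 else "T" for c in x)
# An unrelated string over the same alphabet and of the same length.
unrelated = "".join(random.choice("ACGT") for _ in range(len(x)))
```

A mutated copy of x stays measurably closer to x than the unrelated string does, which is the compression-based notion of a significant approximate match.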
MML, Hybrid Bayesian Network Graphical Models, Statistical Consistency, Invariance and Uniqueness
Abstract

Cited by 6 (5 self)
The problem of statistical — or inductive — inference pervades a large number of human activities and a large number of (human and non-human) actions requiring ‘intelligence’. Human and other ‘intelligent’ activity often entails making inductive inferences, remembering and recording observations from which one can make ...
Compression of Strings with Approximate Repeats
Intelligent Systems in Molecular Biology, ISMB ’98, 1998
Abstract

Cited by 5 (3 self)
We describe a model for strings of characters that is loosely based on the Lempel-Ziv model with the addition that a repeated substring can be an approximate match to the original substring; this is close to the situation of DNA, for example. Typically there are many explanations for a given string under the model, some optimal and many suboptimal. Rather than commit to one optimal explanation, we sum the probabilities over all explanations under the model because this gives the probability of the data under the model. The model has a small number of parameters and these can be estimated from the given string by an expectation-maximization (EM) algorithm. Each iteration of the EM algorithm takes O(n²) time and a few iterations are typically sufficient. O(n²) complexity is impractical for strings of more than a few tens of thousands of characters and a faster approximation algorithm is also given. The model is further extended to include approximate reverse complementary repeats when analyzing DNA strings. Tests include the recovery of parameter estimates from known sources and applications to real DNA strings. http://www.csse.monash.edu.au/~lloyd/tildeStrings/Compress/1998ISMB.html
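The move of summing over all explanations rather than committing to the single best one is the same move the forward algorithm makes relative to Viterbi. The toy two-state HMM below is invented to illustrate that contrast; it is not the paper's approximate-repeat model:

```python
import itertools

# Tiny two-state HMM, purely illustrative.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]

def forward_prob(obs):
    # Sum over ALL state paths ("explanations") -> P(obs | model).
    alpha = [init[s] * emit[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(2)) * emit[s][o]
                 for s in range(2)]
    return sum(alpha)

def viterbi_prob(obs):
    # Probability contributed by the single best explanation only.
    delta = [init[s] * emit[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        delta = [max(delta[r] * trans[r][s] for r in range(2)) * emit[s][o]
                 for s in range(2)]
    return max(delta)

def brute_force_prob(obs):
    # Explicit sum over every path, to check the forward recursion.
    total = 0.0
    for path in itertools.product(range(2), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total
```

The forward sum equals the brute-force total over paths, and the best single path accounts for strictly less probability than the sum — exactly why summing, as the paper does, gives the probability of the data under the model.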