MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions
- Statistics and Computing, 2000
Cited by 29 (8 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference
Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry
- Proteins, 2003
Cited by 24 (10 self)
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs which may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile HMMs. We did not rely on a simple helix-strand-coil definition of secondary structure,
An MML Classification of Protein Structure that knows about Angles and Sequence
Cited by 9 (4 self)
In this paper we apply a Hidden Markov Model to model the structure of a collection of known proteins. This Markov classification is able to take advantage of information implicit in the order of a sequence of observations and hence is better suited to modelling protein data than a classification model that assumes independence between observations. We use a Minimum Message Length (MML) information measure to evaluate our protein structure model which enables us to find the model best supported by the known evidence
MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions
- In Proc. 6th Int. Workshop on Artif. Intelligence and Statistics, 1997
Cited by 8 (5 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also consistent and efficient. We provide a brief overview of MML inductive inference (Wallace and Boulton (1968), Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mixture modelling program, Snob (Wallace and Boulton (1968), Wallace (1986), Wallace and Dowe (1994)) uses the message lengths from various parameter estimates to enable it to combine parameter estimation with selection of the number of components. The message length is (to within a constant) the logarithm of the posterior probability of the theory. So, the MML theory can also be regarded as the theory with the highest posterior probability. Snob currently assumes that variables are uncorrelated, and permits multi-variate data from Gaussian, discrete multi-state, Poisson and von Mises circular dist...
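The link between message length and posterior probability mentioned in this abstract can be sketched as follows. This is a hedged illustration using the standard two-part message idea; the symbols I, H (theory) and D (data) are notation assumed here, not taken from the abstract, and the sign convention is written out explicitly so that a shorter message corresponds to a higher posterior probability.

```latex
\begin{aligned}
I(H, D) &= -\log \Pr(H) \;-\; \log \Pr(D \mid H) \\
        &= -\log \Pr(H \mid D) \;-\; \log \Pr(D)
\end{aligned}
```

Because the term in Pr(D) does not depend on the theory H, minimising I(H, D) over theories is the same as maximising Pr(H | D), which matches the abstract's remark that the MML theory can be regarded as the theory with the highest posterior probability.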
Analysis of Three-Dimensional Protein Images
- Journal of Artificial Intelligence Research, 1997
Cited by 8 (0 self)
A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as α-helices and β-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction. ...
CIRCULAR CLUSTERING BY MINIMUM MESSAGE LENGTH OF PROTEIN DIHEDRAL ANGLES
, 1995
Cited by 3 (3 self)
Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) intrinsic classification, we are able to take the protein dihedral angles as determined by X-ray crystallography, and cluster sets of dihedral angles into groups. Previous work by Hunter and States had applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by 3 Cartesian co-ordinates for each of the α-Carbon, β-Carbon and Nitrogen, totalling 9 co-ordinates. By using the von Mises circular distribution in the Snob program rather than the Normal distribution in the Hunter and States model, we are instead able to represent local site properties by the two dihedral angles, φ and ψ. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly-correlated Cartesian co-ordinates. Using the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying generating process from which the data comes. We report on the results of our classification, plotting the classes in (φ,ψ)-space and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region φ ≈ −1.09 rad and ψ ≈ −0.75 rad which are close on the spanning tree and have high inter-transition probabilities. These properties give rise to a tight, abundant, self-perpetuating, α-helical structure.
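To illustrate why a circular distribution suits dihedral angles, here is a minimal Python sketch of the von Mises density and a circular mean. It is not the Snob implementation: the function names, the series-based Bessel evaluation and the concentration value kappa = 4.0 are assumptions chosen for illustration; only the centre φ ≈ −1.09 rad comes from the abstract above.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series (illustrative helper)."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k (x/2)^(2k) / (k!)^2
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density with mean direction mu and concentration kappa (radians)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def circular_mean(angles):
    """Mean direction of a list of angles, respecting wrap-around at +/- pi."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


# Hypothetical helical cluster centred on phi = -1.09 rad; kappa is an assumed value.
phi_mu, kappa = -1.09, 4.0
density_at_centre = von_mises_pdf(phi_mu, phi_mu, kappa)
density_opposite = von_mises_pdf(phi_mu + math.pi, phi_mu, kappa)
```

Unlike a Normal density on the real line, this density is periodic in theta, so angles just either side of ±π are treated as neighbours rather than as values far apart.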
MML, HYBRID BAYESIAN NETWORK GRAPHICAL MODELS, STATISTICAL CONSISTENCY, INVARIANCE AND UNIQUENESS
Cited by 3 (3 self)
The problem of statistical — or inductive — inference pervades a large number of human activities and a large number of (human and non-human) actions requiring ‘intelligence’. Human and other ‘intelligent’ activity often entails making inductive inferences, remembering and recording observations from which one can make
An MML Classification of Protein Structure that knows about Angles and Sequence
- Proceedings of the 3rd Pacific Symposium on Biocomputing, 1998
Cited by 1 (1 self)
The MML classification program, Snob, deals with mixture modelling (or clustering) of circular data. It has recently been extended to do Markov modelling of the serial correlation between clusters such as modelling the fact that a Helix cluster favours being followed by another Helix cluster. Such a model is better known as a Hidden Markov Model. The search for the most appropriate secondary structure classification of protein data is of significant importance and was addressed by Hunter and States (1992) using the Bayesian classifier, AutoClass, on Cartesian co-ordinate data of protein residues. Dowe et al. (1996) improved upon this earlier work by using Snob to cluster dihedral angle data, with the advantage that 3x3=9 Cartesian co-ordinates can be represented by the 2 orientation-invariant angles, φ and ψ. The Hidden Markov Model used here is shown to be a more appropriate way again of modelling protein data and results in the selection of a simpler class model with 17 structure classes. We report on this classification, including the class transition matrix, and relate it back to the amino-acid sequence and the simple Helix, Beta, Turn classification. We find 3 types of Helix, 2 types of Beta and many types of Turn. The most numerous Turn class defines a continuous flexible structure that is negatively correlated to all the other classes.
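The class transition matrix mentioned in this abstract can be pictured with a small Python sketch. This is only simple frequency counting over an already-labelled sequence, not the MML or Hidden Markov estimation the paper describes; the function name, the label alphabet H/B/T and the toy sequence are hypothetical.

```python
from collections import Counter


def transition_matrix(labels, classes):
    """Row-normalised counts of class-to-class transitions along a sequence."""
    # Count each adjacent pair (current class, next class).
    counts = Counter(zip(labels, labels[1:]))
    matrix = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in classes
        }
    return matrix


# Toy residue-by-residue labelling: H = Helix, B = Beta, T = Turn (made-up data).
toy_labels = list("HHHTBBH")
toy_matrix = transition_matrix(toy_labels, "HBT")
```

A large diagonal entry such as the Helix-to-Helix value corresponds to the abstract's observation that a Helix cluster favours being followed by another Helix cluster.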

