Hidden Markov models in computational biology: applications to protein modeling
- JOURNAL OF MOLECULAR BIOLOGY
, 1994
Cited by 436 (29 self)
Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false...
Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology
, 1996
Cited by 105 (20 self)
This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, such that remotely related family members can be more reliably recognized by the model. Dirichlet mixtures have been shown to outperform substitution matrices and other methods for computing these expected amino acid distributions in database search, resulting in fewer false positives and false negatives for the families tested. This paper corrects a previously p...
HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins
- Journal of Molecular Biology
, 2000
Using Dirichlet Mixture Priors to Derive Hidden Markov Models for Protein Families
- PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS FOR MOLECULAR BIOLOGY
, 1993
Cited by 56 (6 self)
A Bayesian method for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family or the columns of a multiple alignment of that family is introduced. This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments. It is shown that this Bayesian method can improve the quality of HMMs produced from small training sets. Specific experiments on the EF-hand motif are reported, for which these priors are shown to produce HMMs with higher likelihood on unseen data, and fewer false positives and false negatives in a database search task.
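As a rough illustration of the estimation step this abstract describes, the sketch below computes the posterior mean amino acid distribution for one alignment column under a Dirichlet mixture prior. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `log_marginal` and `posterior_mean` are hypothetical, the three-letter alphabet and all parameter values are made up for illustration, and the estimator is the standard Dirichlet mixture posterior mean (each component's pseudocount estimate, weighted by that component's posterior probability given the observed counts).

```python
from math import exp, lgamma, log


def log_marginal(counts, alpha):
    """Log-probability of the counts under one Dirichlet component.

    The multinomial coefficient is omitted because it is identical for
    every component and cancels when the weights are normalized.
    """
    n, a = sum(counts), sum(alpha)
    total = lgamma(a) - lgamma(n + a)
    for c, al in zip(counts, alpha):
        total += lgamma(c + al) - lgamma(al)
    return total


def posterior_mean(counts, weights, alphas):
    """Posterior mean distribution for one column under a Dirichlet mixture.

    counts  : observed residue counts for the column
    weights : prior mixture coefficients (sum to 1)
    alphas  : one Dirichlet parameter vector per component
    """
    # Posterior responsibility of each component, computed in log space.
    logs = [log(w) + log_marginal(counts, al) for w, al in zip(weights, alphas)]
    peak = max(logs)
    resp = [exp(v - peak) for v in logs]
    norm = sum(resp)
    resp = [r / norm for r in resp]

    # Mix the per-component pseudocount estimates by responsibility.
    n = sum(counts)
    estimate = [0.0] * len(counts)
    for r, al in zip(resp, alphas):
        a = sum(al)
        for i, (c, al_i) in enumerate(zip(counts, al)):
            estimate[i] += r * (c + al_i) / (n + a)
    return estimate


if __name__ == "__main__":
    # Hypothetical two-component prior over a toy three-letter alphabet.
    toy_weights = [0.5, 0.5]
    toy_alphas = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
    print(posterior_mean([5, 1, 0], toy_weights, toy_alphas))
```

With no observed counts the estimate reduces to the weighted average of the component means, and as counts accumulate the component that best explains them dominates, which is how such a prior can help when the training set is small.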
Predicting protein structure using hidden Markov models
, 1997
Cited by 46 (18 self)
We discuss how methods based on hidden Markov models performed in the fold recognition section of the CASP2 experiment. Hidden Markov models were built for a set of about a thousand structures from the PDB database, and each CASP2 target sequence was scored against this library of hidden Markov models. In addition, a hidden Markov model was built for each of the target sequences, and all of the sequences in PDB were scored against that target model. Having high scores from both methods was found to be highly indicative of the target and a structure being homologous. Predictions were made based on several criteria: the scores with the structure models, the scores with the target models, consistency between the secondary structure in the known structure and predictions for the target (using the program PhD), human examination of predicted alignments between target and structure (using RASMOL), and solvation preferences in the alignment of the target and structure. The method worked well in comparison to other methods used at CASP2 for targets of moderate difficulty, where the closest structure in PDB could be aligned to the target with at least 15% residue identity. There was no evidence for the method's effectiveness for harder cases, where the residue identity was much lower than 15%.
Bayesian Segmentation of Protein Secondary Structure
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2000
Cited by 32 (6 self)
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for α-helices, β-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
Grouping Web Page References into Transactions for Mining World Wide Web Browsing Patterns
- Dept. of Computer Science, Univ. of Minnesota
, 1997
Cited by 27 (2 self)
Web-based organizations often generate and collect large volumes of data in their daily operations. Analyzing such data involves the discovery of meaningful relationships from a large collection of primarily unstructured data, often stored in Web server access logs. While traditional domains for data mining, such as point of sale databases, have naturally defined transactions, there is no convenient method of clustering web references into transactions. This paper identifies a model of user browsing behavior that separates web page references into those made for navigation purposes and those for information content purposes. A transaction identification method based on the browsing model is defined and successfully tested against other methods, such as the maximal forward reference algorithm proposed in [1]. Transactions identified by the proposed methods are used to discover association rules from real world data using the WEBMINER system [7].
Parameterization studies for the SAM and HMMER methods of hidden Markov model generation
- In: ISMB-96
, 1996
Cited by 10 (1 self)
Multiple sequence alignment of distantly related viral proteins remains a challenge to all currently available alignment methods. The hidden Markov model approach offers a new, flexible method for the generation of multiple sequence alignments. The results of studies attempting to infer appropriate parameter constraints for the generation of de novo HMMs for globin, kinase, aspartic acid protease, and ribonuclease H sequences by both the SAM and HMMER methods are described.
Stochastic segment interaction models for biological sequence analysis
, 2004
Cited by 1 (1 self)
We introduce a class of probability models for sequences of random variables with complex long-range dependency structure, called stochastic segment interaction models, motivated by problems arising in the analysis of biopolymer sequence data. We generalize and extend previous work in this area, and make explicit the relations to existing literature on hidden Markov models (HMMs) and “generalized” HMMs. We show that this class of models allows for incorporation of non-local interaction information in biological sequence analysis. We demonstrate this approach by developing models for prediction of 3D contacts in protein sequences using models for amino acid dependencies in β-sheets. We provide algorithms for Bayesian inference on these models via dynamic programming and Markov chain Monte Carlo simulation. Results are presented from an application to protein structure prediction from sequence.
Journal Of Computational Biology
- Journal of Computational Biology
, 2001
We consider the problem of inferring fold changes in gene expression from cDNA microarray data. Standard procedures focus on the ratio of measured fluorescent intensities at each spot on the microarray, but to do so is to ignore the fact that the variation of such ratios is not constant. Estimates of gene expression changes are derived within a simple hierarchical model that accounts for measurement error and fluctuations in absolute gene expression levels. Significant gene expression changes are identified by deriving the posterior odds of change within a similar model. The methods are tested via simulation and are applied to a panel of Escherichia coli microarrays.

