Results 1-10 of 65
Hidden Markov models in computational biology: applications to protein modeling
Journal of Molecular Biology, 1994
Cited by 521 (35 self)
Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false
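As a rough illustration of the discrimination test described in this abstract (scoring how closely a sequence fits a model), the following Python sketch computes the log-likelihood of a sequence under a small discrete HMM using the forward algorithm. The two-state toy model, its state names, and all probabilities are assumptions made for the example. The sketch does not attempt to reproduce the model architecture or the training procedure used in the paper.

```python
import math

# Hypothetical two-state toy model over the alphabet {A, B}; all numbers are made up.
START = {"rich": 0.5, "plain": 0.5}
TRANS = {"rich": {"rich": 0.9, "plain": 0.1},
         "plain": {"rich": 0.1, "plain": 0.9}}
EMIT = {"rich": {"A": 0.9, "B": 0.1},
        "plain": {"A": 0.5, "B": 0.5}}

def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) via the forward algorithm with per-step scaling."""
    if not seq:
        return 0.0  # the empty sequence has probability 1 in this simple setup
    states = list(start)
    log_lik = 0.0
    alpha = {}
    for t, symbol in enumerate(seq):
        if t == 0:
            # Initialisation: start probability times emission of the first symbol.
            alpha = {s: start[s] * emit[s].get(symbol, 0.0) for s in states}
        else:
            # Recursion: sum over predecessor states, then emit the current symbol.
            alpha = {s: emit[s].get(symbol, 0.0)
                        * sum(alpha[r] * trans[r][s] for r in states)
                     for s in states}
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")  # the model cannot generate this sequence
        log_lik += math.log(scale)
        # Rescale so the forward variables do not underflow on long sequences.
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_lik

if __name__ == "__main__":
    for candidate in ("AAAAAA", "ABABAB", "BBBBBB"):
        print(candidate, forward_log_likelihood(candidate, START, TRANS, EMIT))
```

In a database search, each sequence would be scored this way and the scores compared against a threshold or a null model. The choice of threshold is outside the scope of this sketch.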
Fitting a mixture model by expectation maximization to discover motifs in biopolymers
Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 1994
Cited by 520 (4 self)
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a two-component finite mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in the input dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset. Motifs are discovered by treating the set of sequences as though they were created by a stochastic process which can be modelled as a mixture of two densities, one of which generated the occurrences of the motif, and the other the rest of the positions in the sequences. Expectation maximization is used to estimate the parameters of the two densities and the mixing
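The two-density idea in this abstract can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm: it fits a single motif, treats overlapping windows as independent, uses a DNA alphabet, and omits the probabilistic erasing step and the classification threshold. The function names, the pseudocount, the starting value of the mixing parameter, and the random initialisation are all assumptions.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: DNA input; a protein alphabet would work the same way

def windows(seqs, w):
    """All overlapping width-w substrings of the input sequences."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]

def fit_two_component_mixture(seqs, w, n_iter=50, pseudo=0.1, seed=0):
    """EM for a mixture of a width-w motif density and a background density."""
    rng = random.Random(seed)
    wins = windows(seqs, w)
    # Motif component: one letter distribution per motif column (random start).
    motif = []
    for _ in range(w):
        raw = [rng.random() + 0.5 for _ in ALPHABET]
        total = sum(raw)
        motif.append({a: r / total for a, r in zip(ALPHABET, raw)})
    # Background component: one letter distribution shared by all positions.
    background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    lam = 0.1  # mixing parameter: prior probability that a window is a motif occurrence
    z = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window came from the motif density.
        z = []
        for win in wins:
            p_motif = lam * math.prod(motif[j][c] for j, c in enumerate(win))
            p_back = (1.0 - lam) * math.prod(background[c] for c in win)
            z.append(p_motif / (p_motif + p_back))
        # M-step: re-estimate the mixing parameter and both densities from expected counts.
        lam = sum(z) / len(z)
        m_counts = [{a: pseudo for a in ALPHABET} for _ in range(w)]
        b_counts = {a: pseudo for a in ALPHABET}
        for win, zi in zip(wins, z):
            for j, c in enumerate(win):
                m_counts[j][c] += zi
                b_counts[c] += 1.0 - zi
        motif = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in m_counts]
        b_total = sum(b_counts.values())
        background = {a: b_counts[a] / b_total for a in ALPHABET}
    return motif, background, lam, list(zip(wins, z))

if __name__ == "__main__":
    toy = ["ACGTTGACAAC", "GGTTGACATCG", "CATTGACAGGA"]  # made-up sequences
    motif, background, lam, posteriors = fit_two_component_mixture(toy, w=6)
    for window, prob in sorted(posteriors, key=lambda p: -p[1])[:3]:
        print(window, round(prob, 3))
```

The mixing parameter times the number of windows gives an estimate of how many motif occurrences the data contain, which corresponds to the occurrence count mentioned in the abstract.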
Predicting the Semantic Orientation of Adjectives
1997
Cited by 298 (5 self)
We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of same or different orientations, achieving 82% accuracy in this task when each conjunction is considered independently.
Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology
1996
Cited by 129 (22 self)
This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, such that remotely related family members can be more reliably recognized by the model. Dirichlet mixtures have been shown to outperform substitution matrices and other methods for computing these expected amino acid distributions in database search, resulting in fewer false positives and false negatives for the families tested. This paper corrects a previously p...
Using Dirichlet Mixture Priors to Derive Hidden Markov Models for Protein Families
Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, 1993
Cited by 73 (6 self)
A Bayesian method for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family or the columns of a multiple alignment of that family is introduced. This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments. It is shown that this Bayesian method can improve the quality of HMMs produced from small training sets. Specific experiments on the EF-hand motif are reported, for which these priors are shown to produce HMMs with higher likelihood on unseen data, and fewer false positives and false negatives in a database search task.
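To make the role of a Dirichlet mixture prior concrete, the Python sketch below computes the posterior mean estimate of a letter distribution from observed counts. Each mixture component is weighted by how well it explains the counts, and each component contributes its own pseudocount-smoothed estimate. The function names and the two-letter toy alphabet are assumptions for illustration; real applications would use a 20-letter amino acid alphabet and mixture parameters estimated from existing alignments, which are not given here.

```python
import math

def log_prob_counts(counts, alpha):
    """log P(counts | alpha) for a Dirichlet-multinomial, omitting the multinomial
    coefficient (it is the same for every component and cancels on normalisation)."""
    n, a = sum(counts), sum(alpha)
    out = math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alpha):
        out += math.lgamma(c + al) - math.lgamma(al)
    return out

def posterior_mean(counts, weights, alphas):
    """Posterior mean letter probabilities under a mixture of Dirichlet priors.

    counts  -- observed count per letter in one column or state
    weights -- mixture coefficients, one per component (sum to 1)
    alphas  -- Dirichlet parameter vector for each component
    """
    # Posterior responsibility of each component given the counts (log-sum-exp trick).
    logs = [math.log(q) + log_prob_counts(counts, al) for q, al in zip(weights, alphas)]
    top = max(logs)
    resp = [math.exp(value - top) for value in logs]
    total = sum(resp)
    resp = [r / total for r in resp]
    # Combine the per-component estimates (count + alpha) / (n + |alpha|).
    n = sum(counts)
    estimate = [0.0] * len(counts)
    for r, al in zip(resp, alphas):
        a = sum(al)
        for i, (c, ai) in enumerate(zip(counts, al)):
            estimate[i] += r * (c + ai) / (n + a)
    return estimate

if __name__ == "__main__":
    # Hypothetical two-letter alphabet and two made-up components.
    print(posterior_mean([3, 0], [0.5, 0.5], [[2.0, 1.0], [1.0, 2.0]]))
```

With few observations the estimate stays close to the prior, and with many observations it approaches the observed frequencies, which is why such priors help most on small training sets.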
Predicting protein structure using hidden Markov models
1997
Cited by 58 (20 self)
We discuss how methods based on hidden Markov models performed in the fold recognition section of the CASP2 experiment. Hidden Markov models were built for a set of about a thousand structures from the PDB database, and each CASP2 target sequence was scored against this library of hidden Markov models. In addition, a hidden Markov model was built for each of the target sequences, and all of the sequences in PDB were scored against that target model. Having high scores from both methods was found to be highly indicative of the target and a structure being homologous. Predictions were made based on several criteria: the scores with the structure models, the scores with the target models, consistency between the secondary structure in the known structure and predictions for the target (using the program PhD), human examination of predicted alignments between target and structure (using RASMOL), and solvation preferences in the alignment of the target and structure. The method worked well in comparison to other methods used at CASP2 for targets of moderate difficulty, where the closest structure in PDB could be aligned to the target with at least 15% residue identity. There was no evidence for the method's effectiveness for harder cases, where the residue identity was much lower than 15%.
Predicting Protein Structure using only Sequence Information
Proteins, 1999
Cited by 55 (14 self)
This paper presents results of blind predictions submitted to the CASP3 protein structure prediction experiment. We made predictions using the SAM-T98 method, an iterative hidden Markov model-based method for constructing protein family profiles. The method is purely sequence based, using no structural information, and yet was able to predict structures as well as all but five of the structure-based methods in CASP3. A prediction server using the SAM-T98 method discussed here is available on the World-Wide Web.
Bayesian Estimation of Dynamical Systems: An Application to fMRI
NeuroImage, 2002
Cited by 50 (24 self)
This paper presents a method for estimating the conditional or posterior distribution of the parameters of deterministic dynamical systems. The procedure conforms to an EM implementation of a Gauss–Newton search for the maximum of the conditional or posterior density. The inclusion of priors in the estimation procedure ensures robust and rapid convergence and the resulting conditional densities enable Bayesian inference about the model parameters. The method is demonstrated using an input–state–output model of the hemodynamic coupling between experimentally designed causes or factors in fMRI studies and the ensuing BOLD response. This example represents a generalization of current fMRI analysis models that accommodates nonlinearities and in which the parameters have an explicit physical interpretation. Second, the approach extends classical inference, based on the likelihood of the data given a null hypothesis about the parameters, to more plausible inferences about the parameters of the model given the data. This inference provides for confidence intervals based on the
Learning Methods for Combining Linguistic Indicators to Classify Verbs
1997
Cited by 42 (3 self)
Fourteen linguistically-motivated numerical indicators are evaluated for their ability to categorize verbs as either states or events. The values for each indicator are computed automatically across a corpus of text. To improve classification performance, machine learning techniques are employed to combine multiple indicators. Three machine learning methods are compared for this task: decision tree induction, a genetic algorithm, and log-linear regression.
Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations
Proceedings of the 10th Annual International Static Analysis Symposium, 2003
Cited by 38 (2 self)
This paper explores z-ranking, a technique to rank error reports emitted by static program checking analysis tools. Such tools often use approximate analysis schemes, leading to false error reports. These reports can easily render the error checker useless by hiding real errors amidst the false, and by potentially causing the tool to be discarded as irrelevant. Empirically, all tools that effectively find errors have false positive rates that can easily reach 30–100%. Z-ranking employs a simple statistical model to rank those error messages most likely to be true errors over those that are least likely. This paper demonstrates that z-ranking applies to a range of program checking problems and that it performs up to an order of magnitude better than randomized ranking. Further, it has transformed previously unusable analysis tools into effective program error finders.
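The abstract does not spell out the statistical model, so the Python sketch below is only one plausible reading. It assumes each group of related checks is summarised by how many checks passed and how many produced an error report, and it ranks groups by a one-sample z-statistic for proportions. The baseline proportion `p0`, the group names, and the counts are hypothetical.

```python
import math

def z_score(successes, failures, p0=0.85):
    """z-statistic for the observed success proportion against an assumed baseline p0."""
    n = successes + failures
    if n == 0:
        raise ValueError("need at least one check")
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)

def rank_reports(groups, p0=0.85):
    """Sort (name, successes, failures) groups so the most credible reports come first.

    The intuition assumed here: a few failures among many successful checks look like
    real errors, while a group where nearly every check fails looks like an analysis
    approximation gone wrong.
    """
    return sorted(groups, key=lambda g: z_score(g[1], g[2], p0), reverse=True)

if __name__ == "__main__":
    # Made-up groups: (name, successful checks, error reports).
    example = [("lock_a", 99, 1), ("lock_b", 2, 8), ("lock_c", 9, 1)]
    for name, ok, bad in rank_reports(example):
        print(name, round(z_score(ok, bad), 2))
```

Sorting by the statistic rather than by the raw proportion lets sample size matter: a large group with a high success rate outranks a small group with a similar rate.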