Results 1-10 of 168
Hidden Markov models in computational biology: applications to protein modeling
J. Mol. Biol., 1994
Abstract

Cited by 525 (35 self)
Hidden Markov models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from nonmembers with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false...
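Both the alignment and the database-search steps depend on computing how probable a sequence is under a trained HMM. The sketch below is a hedged illustration of that core computation, the forward algorithm, on a hypothetical two-state toy model (`STATES`, `START`, `TRANS`, `EMIT` and the symbols `L`/`K` are invented here). It is not the paper's profile-HMM architecture with match, insert and delete states, and it omits parameter estimation and any score normalisation used for database search.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))

def safe_log(p):
    """log(p), mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")

def forward_log_prob(seq, states, start, trans, emit):
    """Forward algorithm: log P(seq | model), summed over all hidden state paths.

    start[s]    probability of starting in state s
    trans[s][t] probability of moving from state s to state t
    emit[s][x]  probability that state s emits symbol x
    """
    # Initialisation: alpha_1(s) = start(s) * emit(s, x_1), kept in log space
    alpha = {s: safe_log(start[s]) + safe_log(emit[s].get(seq[0], 0.0)) for s in states}
    # Recursion: alpha_i(t) = emit(t, x_i) * sum_s alpha_{i-1}(s) * trans(s, t)
    for x in seq[1:]:
        alpha = {
            t: safe_log(emit[t].get(x, 0.0))
            + logsumexp([alpha[s] + safe_log(trans[s][t]) for s in states])
            for t in states
        }
    # Termination: sum over the state occupied at the last position
    return logsumexp(list(alpha.values()))

# Hypothetical two-state toy model: "H" favours symbol L, "P" favours symbol K
STATES = ["H", "P"]
START = {"H": 0.6, "P": 0.4}
TRANS = {"H": {"H": 0.7, "P": 0.3}, "P": {"H": 0.4, "P": 0.6}}
EMIT = {"H": {"L": 0.8, "K": 0.2}, "P": {"L": 0.3, "K": 0.7}}

if __name__ == "__main__":
    print(forward_log_prob("LKL", STATES, START, TRANS, EMIT))
```

Working in log space keeps long protein sequences from underflowing; ranking database sequences would then compare such log-likelihoods after some length normalisation, which this sketch leaves out.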
Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions
J. Mol. Biol., 1997
Abstract

Cited by 252 (65 self)
We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment- and residue-pair-specific contributions to the scoring functions appear as the first two terms in a series expansion for the residue probability distributions in the protein database; the decoupling of the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated annealing procedure rapidly and frequently generates native-like structures for small helical proteins and better-than-random structures for small β-sheet-containing proteins. Most of the simulated structures have native-like solvent accessibility and secondary structure patterns, and thus ensembles of these structures provide a particularly challenging set of decoys for evaluating scoring functions. We investigate the effects of multiple sequence information and different types of conformational constraints on the overall performance of the method, and the ability of a variety of recently developed scoring functions to recognize the native-like conformations in the ensembles of simulated structures.
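The simulated annealing procedure can be illustrated with a generic Metropolis loop. This is a minimal sketch under stated assumptions: the "conformation" is a hypothetical list of fragment choices, one per sequence window; a move replaces the fragment at one random window; and `toy_energy` simply counts mismatches against an invented `TARGET`. None of this is the paper's Bayesian scoring function or its actual move set.

```python
import math
import random

def simulated_annealing(energy, initial, propose, t_start=2.0, t_end=0.01,
                        steps=5000, seed=0):
    """Generic Metropolis simulated annealing with a geometric cooling schedule.

    energy(state) -> float; propose(state, rng) -> new state (must not mutate input).
    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    state, e = initial, energy(initial)
    best, best_e = state, e
    cooling = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    t = t_start
    for _ in range(steps):
        candidate = propose(state, rng)
        ce = energy(candidate)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T)
        if ce <= e or rng.random() < math.exp(-(ce - e) / t):
            state, e = candidate, ce
            if e < best_e:
                best, best_e = state, e
        t *= cooling  # temperature falls geometrically from t_start to t_end
    return best, best_e

# Hypothetical "native" fragment choice (0-9) for each of eight windows
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]

def toy_energy(state):
    """Hypothetical energy: number of windows whose fragment differs from TARGET."""
    return sum(1 for a, b in zip(state, TARGET) if a != b)

def replace_one_fragment(state, rng):
    """Move set: swap in a random fragment at one randomly chosen window."""
    new = list(state)
    new[rng.randrange(len(new))] = rng.randrange(10)
    return new

if __name__ == "__main__":
    print(simulated_annealing(toy_energy, [0] * 8, replace_one_fragment))
```

At high temperature the loop wanders freely between assemblies; as it cools, uphill moves are almost never accepted and the search settles into a low-energy state.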
GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences
J. Mol. Biol., 1999
Abstract

Cited by 152 (10 self)
A new protein fold recognition method is described which is both fast and reliable. The method uses a traditional sequence alignment algorithm to generate alignments which are then evaluated by a method derived from threading techniques. As a final step, each threaded model is evaluated by a neural network in order to produce a single measure of confidence in the proposed prediction. The speed of the method, along with its sensitivity and very low false-positive rate, makes it ideal for automatically predicting the structure of all the proteins in a translated bacterial genome (proteome). The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46% of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure. In some cases, however, only one domain of the protein can be predicted, giving a total coverage of 30% when calculated as a fraction of the number of amino acid residues in the whole proteome. © 1999 Academic Press
Multiple alignment using hidden Markov models
Proc. Int. Conf. Intell. Syst. Mol. Biol., 1995
Abstract

Cited by 142 (0 self)
eddy@genetics.wustl.edu
A simulated annealing method is described for training hidden Markov models and producing multiple sequence alignments from initially unaligned protein or DNA sequences. Simulated annealing in turn uses a dynamic programming algorithm for correctly sampling suboptimal multiple alignments according to their probability and a Boltzmann temperature factor. The quality of simulated annealing alignments is evaluated on structural alignments of ten different protein families, and compared to the performance of other HMM training methods and the ClustalW program. Simulated annealing is better able to find near-global optima in the multiple alignment probability landscape than the other tested HMM training methods. Neither ClustalW nor simulated annealing consistently produces better alignments than the other. Examination of the specific cases in which ClustalW outperforms simulated annealing, and vice versa, provides insight into the strengths and weaknesses of current hidden Markov model approaches.
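The Boltzmann temperature factor can be pictured as sampling alternatives with probability proportional to P^(1/T): at T = 1 samples follow the model's own probabilities, and as T falls the sampling concentrates on the most probable alignment. The sketch below assumes a handful of enumerated candidate alignments with made-up log-probabilities (`CANDIDATE_LOG_PROBS` is hypothetical); the method in the abstract instead samples inside a dynamic programming traceback over all alignments, which this toy does not implement.

```python
import math
import random

def tempered_distribution(log_probs, temperature):
    """Normalised weights proportional to P**(1/T), computed as exp(log P / T)."""
    scaled = [lp / temperature for lp in log_probs]
    top = max(scaled)  # subtract the maximum to avoid overflow/underflow
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(log_probs, temperature, rng):
    """Draw one candidate index according to the tempered distribution."""
    weights = tempered_distribution(log_probs, temperature)
    return rng.choices(range(len(log_probs)), weights=weights, k=1)[0]

# Hypothetical log-probabilities of three enumerated candidate alignments
CANDIDATE_LOG_PROBS = [math.log(0.5), math.log(0.3), math.log(0.2)]

if __name__ == "__main__":
    rng = random.Random(0)
    for t in (5.0, 1.0, 0.2, 0.05):  # a simple cooling schedule
        probs = tempered_distribution(CANDIDATE_LOG_PROBS, t)
        print(t, [round(p, 3) for p in probs], sample_index(CANDIDATE_LOG_PROBS, t, rng))
```

High temperatures flatten the distribution so that suboptimal alignments are still visited, which is what lets an annealing schedule escape local optima before it cools.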
Polynomial Splines and Their Tensor Products in Extended Linear Modeling
Ann. Statist., 1997
Abstract

Cited by 142 (14 self)
ANOVA-type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, cross-validation or an independent test set. Publicly available software, written in C and interfaced to S/S-PLUS, is used to apply this methodology to...
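The spline-plus-BIC idea can be illustrated in one dimension. This is a simplified sketch, not the paper's software: it fits linear splines in a truncated power basis (1, x, (x - k)+) by least squares and chooses among a fixed, hypothetical list of knot sets by BIC = n ln(RSS/n) + p ln n, instead of the stepwise Rao/Wald addition and deletion described above. It models a single main effect with no tensor-product interactions, and the V-shaped data are synthetic.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def basis(x, knots):
    """Linear truncated power basis: 1, x, (x - k)_+ for each knot k."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]

def fit_spline(xs, ys, knots):
    """Least-squares fit via the normal equations; returns (coefficients, RSS)."""
    rows = [basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2 for r, y in zip(rows, ys))
    return coef, rss

def bic(rss, n, p):
    """Gaussian-likelihood BIC, up to an additive constant."""
    return n * math.log(rss / n) + p * math.log(n)

def select_knots(xs, ys, candidates):
    """Return the candidate knot set with the smallest BIC."""
    def score(knots):
        _, rss = fit_spline(xs, ys, knots)
        return bic(rss, len(xs), len(knots) + 2)
    return min(candidates, key=score)

# Synthetic data: y = |x - 0.5| plus small Gaussian noise
_rng = random.Random(1)
XS = [i / 100 for i in range(101)]
YS = [abs(x - 0.5) + _rng.gauss(0, 0.01) for x in XS]

if __name__ == "__main__":
    best = select_knots(XS, YS, [[], [0.3], [0.5]])
    print(best, fit_spline(XS, YS, best)[0])
```

The BIC penalty p ln n grows with every added knot, so an extra basis function is kept only when it reduces the residual sum of squares enough to pay for itself.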
Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology
1996
Abstract

Cited by 129 (22 self)
This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, such that remotely related family members can be more reliably recognized by the model. Dirichlet mixtures have been shown to outperform substitution matrices and other methods for computing these expected amino acid distributions in database search, resulting in fewer false positives and false negatives for the families tested. This paper corrects a previously p...
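The way a Dirichlet mixture is combined with observed amino acid counts can be written compactly. Assuming the standard posterior-mean formulation, with mixture weights q_j and Dirichlet parameter vectors α_j, the estimate is p̂_i = Σ_j P(j | n) (n_i + α_{j,i}) / (|n| + |α_j|), where P(j | n) ∝ q_j P(n | α_j). The sketch below implements that formula with invented toy parameters over a four-letter alphabet (`TOY_WEIGHTS` and `TOY_ALPHAS` are hypothetical); real mixtures would be estimated from a protein database over the 20 amino acids.

```python
import math

def log_prob_counts(counts, alpha):
    """log P(counts | Dirichlet(alpha)), Dirichlet-multinomial with the multinomial coefficient."""
    n, a = sum(counts), sum(alpha)
    value = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alpha):
        value += math.lgamma(c + al) - math.lgamma(c + 1) - math.lgamma(al)
    return value

def dirichlet_mixture_estimate(counts, mixture_weights, alphas):
    """Posterior-mean residue probabilities at one column under a Dirichlet mixture prior."""
    # Posterior probability of each component: q_j * P(counts | alpha_j), normalised
    log_post = [math.log(q) + log_prob_counts(counts, al)
                for q, al in zip(mixture_weights, alphas)]
    top = max(log_post)
    post = [math.exp(v - top) for v in log_post]
    z = sum(post)
    post = [p / z for p in post]
    # Blend each component's pseudocount-adjusted frequencies by its posterior weight
    n = sum(counts)
    estimate = [0.0] * len(counts)
    for w, al in zip(post, alphas):
        a = sum(al)
        for i, (c, ai) in enumerate(zip(counts, al)):
            estimate[i] += w * (c + ai) / (n + a)
    return estimate

# Hypothetical two-component mixture over a 4-letter alphabet
TOY_WEIGHTS = [0.3, 0.7]
TOY_ALPHAS = [[3.0, 1.0, 0.5, 0.5], [0.5, 0.5, 1.0, 3.0]]

if __name__ == "__main__":
    print(dirichlet_mixture_estimate([6, 1, 0, 0], TOY_WEIGHTS, TOY_ALPHAS))
```

With no observed counts the estimate falls back to the mixture's prior mean; as counts accumulate, the component that best explains them dominates and the observed frequencies take over, which is how a few family members can still yield a usable column model.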
Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading
1996
Abstract

Cited by 120 (6 self)
Attractive inter-residue contact energies for proteins have been re-evaluated with the same assumptions and approximations used originally by us in 1985, but with a significantly larger set of protein crystal structures. An additional repulsive packing energy term, operative at higher densities to prevent overpacking, has also been estimated for all 20 amino acids as a function of the number of contacting residues, based on their observed distributions. The two terms of opposite sign are intended to be used together to provide an estimate of the overall energies of inter-residue interactions in simplified proteins without atomic details. To overcome the problem of how to utilize the many homologous proteins in the Protein Data Bank, a new scheme has been devised to assign different weights to each protein, based on similarities among amino acid sequences. A total of 1168 protein structures containing 1661 subunit sequences are actually used here. After the sequence weights have been applied, these correspond to an effective number of residue–residue contacts of 113,914, or about six...
An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction
J. Mol. Biol., 1998
Abstract

Cited by 109 (21 self)
Any algorithm that attempts to predict protein structure requires a discriminatory function that can distinguish between correct and incorrect conformations. These discriminatory functions can be...
Energy Functions that Discriminate X-ray and Near-native Folds from Well-constructed Decoys
1996
Abstract

Cited by 106 (8 self)
... this paper is concerned, have been derived in several ways. Levitt (1976) generated potentials of mean force by averaging energies over all relative orientations of pairs of side-chains. More recently these kinds of energy functions have been derived as potentials of mean force from the ever-growing database of known protein structures (see the references in Sippl, 1995). Huang et al. (1995) have devised a potential which does not explicitly use the database of known structures; they use only a simple classification of different residues as hydrophobic or hydrophilic, reminiscent of the theoretical energy models of Dill et al. (reviewed by Dill et al., 1995; Yue & Dill, 1995). Maiorov & Crippen (1992) generated a potential function by an optimization procedure which sought to maximize the difference in energy between correct and incorrect protein conformations.
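A potential of mean force derived from a structure database is commonly built by inverse Boltzmann statistics: E(pair, bin) = -kT ln(f_obs / f_ref), where f_obs is the observed frequency of a distance bin for a given residue-pair type and f_ref is a reference frequency. The sketch below assumes the simplest reference state (all pair types pooled), counts invented `(pair_type, distance_bin)` observations, works in units of kT, and adds a pseudocount. It is a generic illustration, not any of the specific functions cited in the passage.

```python
import math
from collections import Counter

def mean_force_potential(observations, pseudocount=1.0):
    """Knowledge-based pair potential by inverse Boltzmann, in units of kT.

    observations: iterable of (pair_type, distance_bin) tuples counted from
    (hypothetical) known structures. Returns E[(pair, bin)] = -ln(f_obs / f_ref),
    where f_obs = P(bin | pair) and the reference f_ref = P(bin) pools all pairs.
    """
    obs = list(observations)
    pair_counts = Counter(p for p, _ in obs)
    bin_counts = Counter(b for _, b in obs)
    joint = Counter(obs)
    bins = sorted(bin_counts)
    total = len(obs)
    energies = {}
    for pair in pair_counts:
        for b in bins:
            # Pseudocount keeps unobserved (pair, bin) combinations finite
            f_obs = (joint[(pair, b)] + pseudocount) / (pair_counts[pair] + pseudocount * len(bins))
            f_ref = bin_counts[b] / total
            energies[(pair, b)] = -math.log(f_obs / f_ref)
    return energies

# Invented counts: hydrophobic pairs ("HH") are seen close together more often than mixed pairs ("HP")
TOY_OBSERVATIONS = ([("HH", "close")] * 8 + [("HH", "far")] * 2
                    + [("HP", "close")] * 2 + [("HP", "far")] * 8)

if __name__ == "__main__":
    for key, e in sorted(mean_force_potential(TOY_OBSERVATIONS).items()):
        print(key, round(e, 3))
```

Negative energies mark pair-distance combinations seen more often than the reference state predicts; the choice of reference state is exactly where such database-derived functions differ from one another.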
Protein Fold Recognition by Prediction-based Threading
J. Mol. Biol., 1997
Abstract

Cited by 77 (9 self)
... (including sequence information). For the 22% first hits detected at highest scores, the expected accuracy rose to 75%. However, the task of detecting entire folds rather than homologous fragments was managed much better; 45 to 75% of the first hits correctly recognised the fold.