HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins
- Journal of Molecular Biology, 2000
Profile-based string kernels for remote homology detection and motif extraction
- Journal of Bioinformatics and Computational Biology, 2004
- Cited by 57 (7 self)
We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences (“k-mers”) in the data. By use of an efficient data structure, the kernels are fast to compute once the profiles have been obtained. For example, the time needed to run PSI-BLAST in order to build the profiles is significantly longer than both the kernel computation time and the SVM training time. We present remote homology detection experiments based on the SCOP database where we show that profile-based string kernels used with
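The idea of a position-dependent mutation neighborhood can be illustrated with a small brute-force sketch. It is not the efficient data structure the abstract mentions, and the names `neighborhood`, `feature_map`, `profile_kernel`, and the threshold `sigma` are assumptions introduced here for illustration. A profile is assumed to be a list of per-position dictionaries mapping residues to probabilities, and a k-mer is assumed to belong to a window's neighborhood when its total negative log-probability under that window stays below `sigma`.

```python
import itertools
import math
from collections import Counter


def neighborhood(profile_window, alphabet, sigma):
    """Return the k-mers whose summed negative log-probability is below sigma.

    profile_window: list of dicts (one per position) mapping residue -> probability.
    This enumerates every candidate k-mer, so it only suits toy alphabets.
    """
    accepted = []
    for kmer in itertools.product(alphabet, repeat=len(profile_window)):
        cost = 0.0
        for column, residue in zip(profile_window, kmer):
            probability = column.get(residue, 0.0)
            if probability <= 0.0:
                cost = math.inf  # residue impossible at this position
                break
            cost -= math.log(probability)
        if cost < sigma:
            accepted.append("".join(kmer))
    return accepted


def feature_map(profile, alphabet, k, sigma):
    """Count how often each k-mer falls in the neighborhood of a length-k window."""
    counts = Counter()
    for start in range(len(profile) - k + 1):
        counts.update(neighborhood(profile[start:start + k], alphabet, sigma))
    return counts


def profile_kernel(profile_a, profile_b, alphabet, k, sigma):
    """Inner product of the two k-mer count vectors."""
    features_a = feature_map(profile_a, alphabet, k, sigma)
    features_b = feature_map(profile_b, alphabet, k, sigma)
    return sum(count * features_b[kmer] for kmer, count in features_a.items())
```

With a real 20-letter alphabet the enumeration grows as 20^k, which is why a practical implementation would need a pruned traversal rather than this exhaustive loop.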
Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins
- Proteins, 1999
- Cited by 47 (18 self)
ABSTRACT We describe the development of a scoring function based on the decomposition P(structure | sequence) ∝ P(sequence | structure) × P(structure), which outperforms previous scoring functions in correctly identifying native-like protein structures in large ensembles of compact decoys. The first term captures sequence-dependent features of protein structures, such as the burial of hydrophobic residues in the core; the second term captures universal sequence-independent features, such as the assembly of β-strands into β-sheets. The efficacies of a wide variety of sequence-dependent and sequence-independent features of protein structures for recognizing native-like structures were systematically evaluated using ensembles of ~30,000 compact conformations with fixed secondary structure for each of 17 small protein domains. The best results were obtained using a core scoring function with P(sequence | structure) parameterized similarly to our previous work (Simons et al., J Mol Biol 1997;268:209–225) and P(structure) focused on secondary structure packing preferences; while several additional features had some discriminatory power on their own, they did not provide any additional discriminatory power when combined with the core scoring function. Our results, on both the training set and the independent decoy set of Park and Levitt (J Mol Biol 1996;258:367–392), suggest that this scoring function should contribute to the prediction of tertiary structure from knowledge of sequence and secondary structure. Proteins 1999;34:82–95. © 1999 Wiley-Liss, Inc. Key words: protein folding; structure prediction; knowledge-based scoring functions; fold recognition
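The decomposition named in this abstract is an application of Bayes' rule. As a reading aid, and assuming the sequence is held fixed while candidate structures are compared, the step can be written out as follows (the `align` environment assumes the amsmath package):

```latex
\begin{align}
P(\text{structure} \mid \text{sequence})
  &= \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}{P(\text{sequence})} \\
  &\propto P(\text{sequence} \mid \text{structure})\, P(\text{structure})
\end{align}
```

Because P(sequence) is the same for every decoy of a given protein, dropping it does not change how the decoys rank against each other.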
The emergence of pattern discovery techniques in computational biology
- Metabolic Engineering, 2000
- Cited by 28 (4 self)
In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, and after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and describe several applications of pattern discovery to problems from computational biology. © 2000 Academic Press
Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry
- Proteins, 2003
- Cited by 24 (10 self)
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs which may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile HMMs. We did not rely on a simple helix-strand-coil definition of secondary structure,
Predicting interresidue contacts using templates and pathways
- Proteins, 2003
- Cited by 23 (5 self)
ABSTRACT We present a novel method, HMMSTR-CM, for protein contact map predictions. Contact potentials were calculated by using HMMSTR, a hidden Markov model for local sequence structure correlations. Targets were aligned against protein templates using a Bayesian method, and contact maps were generated by using these alignments. Contact potentials then were used to evaluate these templates. An ab initio method based on the target contact potentials using a rule-based strategy to model the protein-folding pathway was developed. Fold recognition and ab initio methods were combined to produce accurate, protein-like contact maps. Pathways sometimes led to an unambiguous prediction of topology, even without using templates. The results on CASP5 targets are discussed. Also included is a brief update on the quality of fully automated ab initio predictions using the I-sites server. Proteins 2003;53:497–502.
Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks
- Proteins, 2000
- Cited by 23 (5 self)
ABSTRACT By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα (“protein blocks”). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into “sequence families” improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (α/β protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling. Proteins 2000;41:271–287. © 2000 Wiley-Liss, Inc. Key words: protein backbone structure; unsupervised classifier; structure-sequence relationships; structure prediction; protein block; Bayesian approach; prediction strategies
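A Bayesian ranking of candidate blocks for one sequence window, with the option of keeping the first few predictions per site, might look like the sketch below. It is an illustration under stated assumptions rather than the method of the paper: residues in the window are treated as independent given the block, the tables `likelihoods` and `priors` are hypothetical inputs with strictly positive entries (for example after adding pseudocounts), and `rank_blocks` is a name introduced here.

```python
import math


def rank_blocks(window, likelihoods, priors, top_n=1):
    """Rank structural blocks by posterior probability for one sequence window.

    window:      string of residues centred on the site of interest.
    likelihoods: block -> list (one dict per window position) of residue -> P(residue | block).
    priors:      block -> P(block).
    Assumes all probabilities are strictly positive, otherwise math.log fails.
    """
    log_scores = {}
    for block, prior in priors.items():
        log_score = math.log(prior)
        for position, residue in enumerate(window):
            log_score += math.log(likelihoods[block][position][residue])
        log_scores[block] = log_score

    # Normalise in log space (log-sum-exp) so the posteriors sum to one.
    peak = max(log_scores.values())
    total = sum(math.exp(score - peak) for score in log_scores.values())
    posteriors = {
        block: math.exp(score - peak) / total for block, score in log_scores.items()
    }

    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Calling `rank_blocks(window, likelihoods, priors, top_n=4)` would correspond to keeping four candidate blocks at a site; a site then counts as correctly predicted if the true block appears anywhere in that short list.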
A hidden Markov model derived structural alphabet for proteins
- J Mol Biol, 2004
- Cited by 19 (3 self)
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence. © 2005 Elsevier B.V. All rights reserved.
Efficient Remote Homology Detection Using Local Structure
- Bioinformatics, 2003
- Cited by 18 (2 self)
Motivation: The function of an unknown biological sequence can often be accurately inferred if we are able to map this unknown sequence to its corresponding homologous family. At present, discriminative methods such as SVM-Fisher and SVM-pairwise, which combine support vector machine and sequence similarity, are recognized as the most accurate methods, with SVM-pairwise being the most accurate. However, these methods typically encode sequence information into their feature vectors and ignore the structure information. They are also computationally inefficient. Based on these observations, we present an alternative method for SVM-based protein classification. Our proposed method, SVM-I-sites, utilizes structure similarity for remote homology detection. Result:
Mining residue contacts in proteins using local structure predictions
- In IEEE Int. Symposium on Bioinformatics and Biomedical Engineering, 2000
- Cited by 17 (7 self)
In this paper we develop data mining techniques to predict 3D contact potentials among protein residues (or amino acids) based on the hierarchical nucleation-propagation model of protein folding. We apply a hybrid approach, using a Hidden Markov Model to extract folding initiation sites, and then apply association mining to discover contact potentials. The new hybrid approach achieves accuracy results better than those reported previously.

