Multi-class Protein Fold Recognition Using Support Vector Machines and Neural Networks
- Bioinformatics
, 2001
Cited by 92 (5 self)
Motivation: Protein fold recognition is an important approach to structure discovery that does not rely on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known "False Positives" problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine and the Neural Network learning methods as base classifiers. SVM converges fast and leads to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training. Contact: chqding@lbl.gov, ildubchak@lbl.gov Supplementary Information: The protein parameter datasets used in this paper are available online (http://www.nersc.gov/ cding/protein). Keywords: protein fold recognition, protein structure, multi-class classification, support vector machines, neural networks.
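The all-against-all idea mentioned in this abstract can be illustrated with a minimal sketch, assuming one binary classifier per pair of folds and a plain vote count to pick the winner. The threshold classifiers, the fold names, and the tie-breaking rule are hypothetical stand-ins chosen for illustration; they are not the SVMs or neural networks used in the paper.

```python
from collections import Counter
from itertools import combinations


def all_against_all_vote(sample, pairwise_classifiers, classes):
    """Pick a class by majority vote over all pairwise classifiers.

    pairwise_classifiers maps a pair (a, b) to a function that takes a
    sample and returns either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = pairwise_classifiers[(a, b)](sample)
        votes[winner] += 1
    # Assumption: ties go to the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])


def make_threshold_classifier(low_label, high_label, threshold):
    """Hypothetical binary classifier on a single numeric feature."""
    def classify(x):
        return low_label if x < threshold else high_label
    return classify


classes = ["fold_A", "fold_B", "fold_C"]
pairwise = {
    ("fold_A", "fold_B"): make_threshold_classifier("fold_A", "fold_B", 1.0),
    ("fold_A", "fold_C"): make_threshold_classifier("fold_A", "fold_C", 2.0),
    ("fold_B", "fold_C"): make_threshold_classifier("fold_B", "fold_C", 3.0),
}

for x in (0.5, 1.5, 3.5):
    print(x, all_against_all_vote(x, pairwise, classes))
```

With K classes this scheme needs K(K-1)/2 binary classifiers, so 27 folds would require 351 of them under these assumptions.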
Review: Protein Secondary Structure Prediction Continues to Rise
- J. Struct. Biol
, 2001
Cited by 91 (13 self)
f prediction accuracy? We shall see. 2001 Academic Press. INTRODUCTION. History. Linus Pauling correctly guessed the formation of helices and strands (14, 15) (and falsely hypothesized other structures). Three years before Pauling's guess was verified by the publications of the first X-ray structures (16, 17), one group had already ventured to predict secondary structure from sequence (18). The first-generation prediction methods following in the 1960s and 1970s were all based on single amino acid propensities (19). The second-generation methods dominating the scene until the early 1990s used propensities for segments of 3-51 adjacent residues (19). Basically any imaginable theoretical algorithm had been applied to the problem of predicting secondary structure from sequence. However, it seemed that prediction accuracy stalled at levels slightly above 60% (percentage of residues predicted correctly in one of the three states: helix, strand, and other). The reason for this limit was the
Exploiting the Past and the Future in Protein Secondary Structure Prediction
, 1999
Cited by 91 (19 self)
Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it cannot capture variable long-range information. Results: We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction (at least comparable to the best existing systems), the main emphasis here is on the development of new algorithmic ideas. Availability: The executable program for predicting protein secondary structure is available from the authors free of charge. Contact: pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it.
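The notion of bidirectional dynamics can be made concrete with a toy sketch, assuming scalar inputs, scalar hidden states, and fixed untrained weights. It is not the authors' architecture; the function name and weight values are hypothetical. It only shows that a backward chain lets the state at one position depend on inputs further downstream, while the forward chain depends only on upstream inputs.

```python
import math


def bidirectional_states(inputs, w_forward=0.5, w_backward=0.5, w_input=1.0):
    """Return (forward, backward) hidden states for a scalar sequence.

    forward[t] summarises inputs[0..t]; backward[t] summarises inputs[t..end].
    All weights are illustrative constants, not learned parameters.
    """
    n = len(inputs)
    forward = [0.0] * n
    backward = [0.0] * n

    state = 0.0
    for t in range(n):  # left-to-right pass (upstream context)
        state = math.tanh(w_forward * state + w_input * inputs[t])
        forward[t] = state

    state = 0.0
    for t in range(n - 1, -1, -1):  # right-to-left pass (downstream context)
        state = math.tanh(w_backward * state + w_input * inputs[t])
        backward[t] = state

    return forward, backward


f1, b1 = bidirectional_states([1.0, 0.0, 0.0])
f2, b2 = bidirectional_states([1.0, 0.0, 1.0])
# Changing only the last input leaves the forward state at position 0
# unchanged but alters the backward state at position 0.
print(f1[0] == f2[0], b1[0] != b2[0])
```

A predictor built on such states would combine the forward state, the backward state, and the local input at each position, which is how information outside a fixed window can reach the prediction site.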
Prediction of local structure in proteins using a library of sequence-structure motifs
- J. MOL. BIOL
, 1998
Disk-covering, a fast-converging method for phylogenetic tree reconstruction
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 1999
Cited by 65 (6 self)
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes–Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
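For readers unfamiliar with the Jukes-Cantor model named in this abstract, the following is a minimal sketch of the standard Jukes-Cantor distance correction, d = -(3/4) ln(1 - (4/3) p), where p is the fraction of differing sites between two aligned sequences. It illustrates only the kind of pairwise distance a distance method takes as input; it does not implement the Disk-Covering Method, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at saturation; report infinity.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


print(jukes_cantor_distance("ACGT", "ACGA"))
```

Short sequences make p a noisy estimate and can push it to saturation, which is one reason sequence length matters for the accuracy of distance-based reconstruction.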
Improving Prediction of Protein Secondary Structure using Structured Neural Networks and Multiple Sequence Alignments
- J. Comput. Biol
, 1996
Cited by 53 (4 self)
The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures α-helix, β-strand and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and of the characteristic periodicity in α-helices. Since these single-structure networks all have fewer than 600 adjustable weights, over-fitting is avoided. To obtain a three-state prediction of α-helix, β-strand or coil, ensembles of single-structure networks are combined with another neural network. This method gives an overall prediction accuracy of 66.3% when using seven-fold cross-validation on a database of 126 non-homologous globular proteins. Applying the method to multiple sequence alignments of homologous proteins increases the prediction accuracy significantly to 71.3% with corresponding Matthews' correlation c...
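The two evaluation measures mentioned in this abstract can be sketched as follows, assuming per-residue state strings over the letters H (helix), E (strand), and C (coil). The overall accuracy is the percentage of residues assigned the correct state, and the Matthews correlation is computed per state from the standard two-by-two confusion counts. The example strings are made up for illustration and are not data from the paper.

```python
import math


def three_state_accuracy(observed, predicted):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * correct / len(observed)


def matthews_correlation(observed, predicted, state):
    """Matthews correlation coefficient for one state versus the rest."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1
        elif o != state and p != state:
            tn += 1
        elif o != state and p == state:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


observed = "HHHHEEEECC"   # hypothetical observed states
predicted = "HHHEEEEECC"  # hypothetical predicted states
print(three_state_accuracy(observed, predicted))
print(matthews_correlation(observed, predicted, "H"))
```

Reporting a per-state correlation alongside overall accuracy guards against a predictor that scores well simply by favouring the most common state.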
Hybrid Fold Recognition: Combining Sequence Derived Properties with Evolutionary Information.
- Pac. Symp. Biocomput
, 2000
Cited by 45 (5 self)
Introduction: Protein fold recognition aims to assign each new amino acid sequence to the known three-dimensional fold which it most closely resembles. The assignment is carried out by searching a library of known structures for a compatible fold. Fold-recognition methods have demonstrated their capabilities in computer-aided assessment experiments such as CASP [1] as well as in fully automated assessment experiments such as CAFASP-1 [2]. In the former, fold-recognition programs coupled with human intervention were able to correctly predict the folds of proteins of (then) unknown structure. In the latter, the performance of the methods was not as good, but still it was superior to sequence-comparison methods such as PSI-BLAST [3]. CAFASP-1 demonstrated that no single approach was markedly superior to the others evaluated when considered across the entire range of targets. In some cases, exploiting evolutionary information from ne
representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Fold Des 3
, 1998
Cited by 44 (24 self)
Manuscript is 43 Pages in Length (including this one)
Topology Prediction for Helical Transmembrane Proteins at 86% Accuracy
- Protein Sci
, 1996
Cited by 43 (11 self)
Previously, we introduced a neural network system predicting locations of transmembrane helices based on evolutionary profiles (PHDhtm; Rost et al., 1995). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimises helices compatible with the neural network output. The extension is the prediction of topology (orientation of first loop region with respect to membrane) by applying to the refined prediction the observation that positively charged residues are more abundant in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted with membrane helices. The evaluation of prediction accuracy is based on a cross-validation and a double-blind test set (in total 131 proteins). The final method appears to be more accurate than other methods published. (1) For almost 89% (±3%) of the test proteins all transmembrane helices are predicted correctly. (2) For more than 86% (±3%) of the proteins topology is predicted correctly. (3) We define reliability indices which correlate with prediction accuracy: for one half of the proteins segment accuracy rises to 98%; and for two-thirds accuracy of topology prediction is 95%. (4) The rate of proteins for which transmembrane helices are predicted falsely is below 2% (±1%). Finally, the method is applied to 1616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more transmembrane helices. This appears to be lower than what we predicted previously for the yeast VIII chromosome (about 25%).

