Multi-class Protein Fold Recognition Using Support Vector Machines and Neural Networks
Bioinformatics, 2001
Cited by 92 (5 self)
Motivation: Protein fold recognition is an important approach to structure discovery that does not rely on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known "False Positives" problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network learning methods as base classifiers. The SVM converges quickly and leads to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including the dependence of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, the recognition systems achieve 56% fold prediction accuracy on a protein test dataset in which most of the proteins have below 25% sequence identity with the proteins used in training. Contact: chqding@lbl.gov, ildubchak@lbl.gov Supplementary Information: The protein parameter datasets used in this paper are available online (http://www.nersc.gov/ cding/protein). Keywords: protein fold recognition, protein structure, multi-class classification, support vector machines, neural networks.
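The all-against-all scheme trains one binary classifier per pair of folds (27 × 26 / 2 = 351 classifiers for 27 folds) and lets each cast a vote; majority voting can likewise combine the labels predicted from several parameter datasets. A minimal sketch, assuming the pairwise classifiers already exist as plain callables; the toy threshold classifiers and fold names "a", "b", "c" are hypothetical, not the paper's trained SVMs or neural networks.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical interface: a pairwise classifier maps a feature vector to one of its two labels.
PairwiseClassifier = Callable[[Sequence[float]], Hashable]


def all_against_all_predict(
    x: Sequence[float],
    classes: Sequence[Hashable],
    pairwise: Dict[Tuple[Hashable, Hashable], PairwiseClassifier],
) -> Hashable:
    """Each of the K*(K-1)/2 pairwise classifiers casts one vote; the most-voted class wins."""
    votes: Counter = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes[c] for c in classes)
    # Tie-breaking by position in `classes` is an arbitrary choice for this sketch.
    return next(c for c in classes if votes[c] == best)


def majority_vote(predictions: List[Hashable]) -> Hashable:
    """Combine labels predicted from different parameter datasets; ties go to the first seen."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(p for p in predictions if counts[p] == best)


# Toy one-feature classifiers for three hypothetical folds "a", "b", "c".
toy_pairwise = {
    ("a", "b"): lambda x: "a" if x[0] < 1.0 else "b",
    ("a", "c"): lambda x: "a" if x[0] < 1.5 else "c",
    ("b", "c"): lambda x: "b" if x[0] < 2.0 else "c",
}

if __name__ == "__main__":
    print(all_against_all_predict([1.2], ["a", "b", "c"], toy_pairwise))  # b
    print(majority_vote(["a", "b", "b"]))  # b
```

Unlike one-against-others, no single classifier has to separate one fold from all the rest at once, which is one intuition for why pairwise voting can reduce false positives.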
Exploiting the Past and the Future in Protein Secondary Structure Prediction
1999
Cited by 91 (19 self)
Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit capturing variable long-range information. Results: We introduce a family of novel architectures that can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction, at least comparable to the best existing systems, the main emphasis here is on the development of new algorithmic ideas. Availability: The executable program for predicting protein secondary structure is available from the authors free of charge. Contact: pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it.
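The non-causal bidirectional dynamics can be illustrated with a scalar recurrence: one hidden chain runs left to right, another right to left, and the output at position t combines both. This is a toy sketch under stated assumptions: real models use vector states, learned nonlinear maps and a softmax over the three classes, whereas the tanh maps and weights below are hypothetical, untrained placeholders.

```python
import math
from typing import List, Sequence


def bidirectional_pass(
    u: Sequence[float],
    a: float = 0.5, b: float = 1.0,  # forward-chain weights (hypothetical, untrained)
    c: float = 0.5, d: float = 1.0,  # backward-chain weights (hypothetical, untrained)
    e: float = 1.0, f: float = 1.0, g: float = 0.5,  # output weights (hypothetical)
) -> List[float]:
    """Scalar sketch of F_t = tanh(a*F_{t-1} + b*u_t), B_t = tanh(c*B_{t+1} + d*u_t),
    O_t = tanh(e*F_t + f*B_t + g*u_t), with boundary states F_{-1} = B_T = 0."""
    T = len(u)
    forward = [0.0] * T
    state = 0.0
    for t in range(T):  # left to right: carries upstream context
        state = math.tanh(a * state + b * u[t])
        forward[t] = state
    backward = [0.0] * T
    state = 0.0
    for t in reversed(range(T)):  # right to left: carries downstream context
        state = math.tanh(c * state + d * u[t])
        backward[t] = state
    return [math.tanh(e * forward[t] + f * backward[t] + g * u[t]) for t in range(T)]


if __name__ == "__main__":
    x = [0.1, -0.2, 0.3, 0.0, 0.4]
    y = x[:-1] + [-0.4]  # change only the last input
    # The first output changes because the backward chain propagates the edit upstream.
    print(bidirectional_pass(x)[0] != bidirectional_pass(y)[0])
```

A purely causal recurrent network would leave the first output untouched by a change at the last position; the backward chain is what removes the fixed-window limit.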
Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles
2001
Cited by 86 (21 self)
Secondary structure predictions are increasingly becoming the workhorse for several methods aiming at predicting protein structure and function. Here we use ensembles of bidirectional recurrent neural network architectures, PSI-BLAST-derived profiles, and a large nonredundant training set to derive two new predictors: (a) the second version of the SSpro program for secondary structure classification into three categories and (b) the first version of the SSpro8 program for secondary structure classification into the eight classes produced by the DSSP program. We describe the results of three different test sets on which SSpro achieved a sustained performance of about 78% correct prediction. We report confusion matrices, compare PSI-BLAST to BLAST-derived profiles, and assess the corresponding performance improvements. SSpro and SSpro8 are implemented as web servers, available together with other structural feature predictors at: http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:228-235.
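Per-residue accuracy and confusion matrices, as reported above for the three- and eight-class predictors, are typically computed from aligned strings of true and predicted states. A minimal sketch; the 8-to-3 reduction below (H, G, I to H; E, B to E; everything else to C) is an assumption for illustration, since the abstract does not say which reduction SSpro uses.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumed 8-to-3 reduction; other reductions appear in the literature.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three(dssp: str) -> str:
    """Map a DSSP 8-state string to 3 states; unlisted symbols (T, S, '-', ...) become C."""
    return "".join(DSSP_TO_3.get(s, "C") for s in dssp)


def confusion_matrix(true: str, pred: str) -> Dict[Tuple[str, str], int]:
    """Count (true class, predicted class) pairs over aligned residues."""
    if len(true) != len(pred):
        raise ValueError("true and predicted strings must have the same length")
    return dict(Counter(zip(true, pred)))


def per_residue_accuracy(true: str, pred: str) -> float:
    """Fraction of residues predicted correctly (Q3 when there are three classes)."""
    if not true:
        raise ValueError("empty sequence")
    cm = confusion_matrix(true, pred)
    return sum(n for (t, p), n in cm.items() if t == p) / len(true)


if __name__ == "__main__":
    print(reduce_to_three("HGIEBTS-"))
    print(per_residue_accuracy("HHEEC", "HHECC"))
```

Off-diagonal entries of the confusion matrix (for example `("E", "C")`) show which states are confused with each other, which a single accuracy figure hides.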
Predicting Protein-Protein Interactions From Primary Structure
2001
Cited by 76 (2 self)
Motivation: An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. The expectation is that this will provide a fuller appreciation of cellular processes and networks at the protein level, ultimately leading to a better understanding of disease mechanisms and suggesting new means for intervention. This paper addresses the question: can protein-protein interactions be predicted directly from primary structure and associated data? Using a diverse database of known protein interactions, a Support Vector Machine (SVM) learning system was trained to recognize and predict interactions based solely on primary structure and associated physicochemical properties. Results: Inductive accuracy of the trained system, defined here as the percentage of correct protein interaction predictions for previously unseen test sets, averaged 80% for the ensemble of statistical experiments. Future proteomics studies may benefit from this research by proceeding directly from the automated identification of a cell's gene products to prediction of protein interaction pairs. Contact: dgough@bioeng.ucsd.edu
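The abstract does not describe how a protein pair is encoded. One hypothetical way to turn two primary sequences into a fixed-length vector for an SVM is to concatenate each protein's amino acid composition with an averaged physicochemical property; the sketch uses the Kyte-Doolittle hydropathy scale purely as an example property, and the function names are illustrative, not the paper's encoding.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here only as an example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_protein(seq: str) -> List[float]:
    """20 composition fractions plus mean hydropathy (hypothetical feature set)."""
    residues = [r for r in seq.upper() if r in KYTE_DOOLITTLE]  # skip non-standard symbols
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[r] for r in residues) / n
    return composition + [mean_hydropathy]


def encode_pair(seq_a: str, seq_b: str) -> List[float]:
    """Concatenate both proteins' features into one 42-dimensional vector."""
    return encode_protein(seq_a) + encode_protein(seq_b)


if __name__ == "__main__":
    print(len(encode_pair("MKTAYIAKQR", "GSHMLEDPVR")))  # 42
```

Because concatenation is order-dependent, a classifier trained on such vectors would need to see both (A, B) and (B, A), or use an order-invariant combination; the vectors themselves could be passed to any SVM implementation.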
An Iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots
2004
Cited by 39 (2 self)
Motivation: Pseudoknots have generally been excluded from the prediction of RNA secondary structures because they are difficult to model. Although several dynamic programming algorithms exist for the prediction of pseudoknots using thermodynamic approaches, they are neither reliable nor efficient. On the other hand, comparative methods are more reliable, but are often done in an ad hoc manner and require expert intervention. Maximum weighted matching, an algorithm for pseudoknot prediction with comparative analysis, suffers from low prediction accuracy in many cases.
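Maximum weighted matching treats every sequence position as a vertex and every candidate base pair as a weighted edge; because a matching imposes no nesting constraint, crossing pairs (pseudoknots) are allowed. A small exact sketch using memoized search over bitmasks, exponential in length and only suitable for toy inputs. The weights are hypothetical stand-ins for comparative (alignment-derived) scores; practical tools would use a polynomial-time method such as Edmonds' blossom algorithm (for example `networkx.max_weight_matching`).

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def max_weight_matching(
    n: int, weight: Callable[[int, int], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Exact maximum weighted matching on positions 0..n-1 (each position in at most one pair).

    No non-crossing constraint is imposed, so pseudoknotted pairs may appear.
    """
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(used: int) -> Tuple[float, Tuple[Tuple[int, int], ...]]:
        if used == full:
            return 0.0, ()
        # Lowest position whose pairing has not been decided yet.
        i = next(k for k in range(n) if not (used >> k) & 1)
        # Option 1: leave position i unpaired.
        score, pairs = best(used | (1 << i))
        # Option 2: pair i with a later free position j that has positive weight.
        for j in range(i + 1, n):
            if (used >> j) & 1:
                continue
            w = weight(i, j)
            if w <= 0:
                continue
            s, p = best(used | (1 << i) | (1 << j))
            if s + w > score:
                score, pairs = s + w, ((i, j),) + p
        return score, pairs

    total, chosen = best(0)
    return total, list(chosen)


if __name__ == "__main__":
    # Hypothetical scores: (0, 4) and (1, 5) cross each other, i.e. form a pseudoknot.
    scores = {(0, 4): 1.0, (1, 5): 1.0, (0, 5): 0.5}
    print(max_weight_matching(6, lambda i, j: scores.get((i, j), 0.0)))
```

With these toy scores the two crossing pairs (total 2.0) beat the single nested-compatible pair (0, 5), which a non-crossing dynamic program would be restricted to combining only with nested pairs.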
Prediction of Coordination Number and Relative Solvent Accessibility in Proteins
2001
Cited by 29 (10 self)
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging from 70.6% to 73.9%, depending on the radius adopted to discriminate contacts (6-12 Å). These performances represent gains of 16-20% over the baseline statistical predictor, which always assigns an amino acid to the largest class, and are 4-7% better than any previous method. A combination of predictors using different radii further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/. Proteins 2002;47:142-153.
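The two-state formulation and the "largest class" baseline can be made concrete with a short sketch. It assumes, as one reading of the abstract, that the coordination-number threshold is the average value for each amino acid type and that the baseline predicts each amino acid type's most frequent state; the function names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def per_type_thresholds(residues: str, values: Sequence[float]) -> Dict[str, float]:
    """Assumed threshold: the average value observed for each amino acid type."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r, v in zip(residues, values):
        groups[r].append(v)
    return {r: mean(vs) for r, vs in groups.items()}


def two_state_labels(
    residues: str, values: Sequence[float], thresholds: Dict[str, float]
) -> List[int]:
    """1 = above the residue's threshold, 0 = at or below it."""
    return [1 if v > thresholds[r] else 0 for r, v in zip(residues, values)]


def largest_class_baseline(residues: str, labels: Sequence[int]) -> float:
    """Accuracy of always predicting each amino acid type's most frequent state."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for r, y in zip(residues, labels):
        counts[r][y] += 1
    return sum(max(c) for c in counts.values()) / len(labels)


if __name__ == "__main__":
    residues, contacts = "AAAL", [2, 4, 6, 10]  # toy coordination numbers
    labels = two_state_labels(residues, contacts, per_type_thresholds(residues, contacts))
    print(labels, largest_class_baseline(residues, labels))
```

For relative solvent accessibility the same `two_state_labels` applies with a constant cutoff, e.g. a thresholds dictionary mapping every amino acid to 25.0 (percent). A learned predictor is only useful to the extent it beats this baseline.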
New methods for splice site recognition
2002
Cited by 19 (4 self)
Splice sites are locations in DNA which separate protein-coding regions (exons) from noncoding regions (introns). Accurate splice site detectors thus form important components of computational gene finders. We pose splice site recognition as a classification problem with the classifier learnt from a labeled data set consisting of only local information around the potential splice site. Note that finding the correct position of splice sites without using global information is a rather hard task. We analyze the genomes of the nematode Caenorhabditis elegans and of humans using specially designed support vector kernels. One of the kernels is adapted from our previous work on detecting translation initiation sites in vertebrates and another uses an extension to the well-known Fisher-kernel. We find excellent performance on both data sets.
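The abstract does not define its kernels. The kernel "adapted from our previous work on detecting translation initiation sites" is assumed here to be of the locality-improved type, which scores local windows of matching nucleotides; the sketch implements that general form only, with uniform window weights and truncation at sequence ends as hypothetical choices. The Fisher-kernel extension is not sketched.

```python
from typing import Optional, Sequence


def locality_improved_kernel(
    x: str,
    y: str,
    l: int = 2,
    d1: int = 2,
    d2: int = 1,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """k(x, y) = (sum_p win_p(x, y)) ** d2, where
    win_p(x, y) = (sum_{j=-l..l} w_j * [x[p+j] == y[p+j]]) ** d1.
    Windows are truncated at the sequence ends (a choice made for this sketch)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned windows of equal length")
    if weights is None:
        weights = [1.0] * (2 * l + 1)  # placeholder: uniform w_j
    if len(weights) != 2 * l + 1:
        raise ValueError("need one weight per window offset")
    n = len(x)
    total = 0.0
    for p in range(n):
        window = 0.0
        for j in range(-l, l + 1):
            q = p + j
            if 0 <= q < n and x[q] == y[q]:
                window += weights[j + l]
        total += window ** d1  # d1 > 1 rewards runs of nearby matches
    return total ** d2


if __name__ == "__main__":
    print(locality_improved_kernel("ACGTAGGT", "ACGTAAGT"))
```

Compared with counting matches independently, raising each window sum to `d1` makes several adjacent matches count for more than the same number of scattered ones, which suits local signals around a candidate splice site.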
Bidirectional dynamics for protein secondary structure prediction
Sequence Learning: Paradigms, Algorithms, and Applications, 2000
Cited by 18 (5 self)
For certain categories of sequences, information from both the past and the future can be used for analysis and predictions at time t. This is the case for biological sequences where the nature and function of a region in a sequence may strongly depend on events located both upstream and downstream. We develop a new family of adaptive graphical model architectures for learning non-causal sequence translations. These architectures employ two chains of hidden variables that propagate information from the past and from the future, respectively. This general idea can be instantiated either as a stochastic model (generalizing input-output hidden Markov models), or as a neural network (generalizing recurrent neural networks). We illustrate the methodology by applying bidirectional models to the problem of protein secondary structure prediction.
Network exploration via the adaptive LASSO and SCAD penalties
Ann. Appl. Stat., 2009
Cited by 11 (1 self)
Graphical models are frequently used to explore networks, such as genetic networks, among a set of variables. This is usually carried out by exploring the sparsity of the precision matrix of the variables under consideration. Penalized likelihood methods are often used in such explorations. Yet, positive-definiteness constraints of precision matrices make the optimization problem challenging. We introduce nonconcave penalties and the adaptive LASSO penalty to attenuate the bias problem in the network estimation. Through the local linear approximation to the nonconcave penalty functions, the problem of precision matrix estimation is recast as a sequence of penalized likelihood problems with a weighted L1 penalty and solved using the efficient algorithm of Friedman et al. [Biostatistics 9 (2008) 432–441]. Our estimation schemes are applied to two real datasets. Simulation experiments and asymptotic theory are used to justify our proposed methods.
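The local linear approximation replaces a nonconcave penalty p_λ(|θ_ij|) by its tangent at an initial estimate θ̃, so each step becomes a weighted L1 problem with weights w_ij = p'_λ(|θ̃_ij|). A minimal sketch of the weight computation only: the SCAD derivative with the commonly used a = 3.7, and adaptive-LASSO weights λ/|θ̃_ij|^γ. The choice γ = 0.5, the small `eps`, and leaving the diagonal unpenalized are assumptions of this sketch; the weighted graphical-lasso solve itself is not implemented.

```python
from typing import List


def scad_derivative(t: float, lam: float, a: float = 3.7) -> float:
    """p'_lam(|t|) = lam if |t| <= lam, else max(a*lam - |t|, 0) / (a - 1)."""
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def adaptive_lasso_weight(t: float, lam: float, gamma: float = 0.5, eps: float = 1e-8) -> float:
    """lam / |t|**gamma; eps (an assumption) avoids division by zero when t == 0."""
    return lam / (abs(t) + eps) ** gamma


def lla_weights(theta0: List[List[float]], lam: float, penalty: str = "scad") -> List[List[float]]:
    """Weight matrix for one weighted-L1 step, computed from an initial precision estimate."""
    if penalty not in ("scad", "adaptive_lasso"):
        raise ValueError("penalty must be 'scad' or 'adaptive_lasso'")
    p = len(theta0)
    weights = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                continue  # diagonal left unpenalized (assumption)
            t = theta0[i][j]
            weights[i][j] = (
                scad_derivative(t, lam) if penalty == "scad" else adaptive_lasso_weight(t, lam)
            )
    return weights


if __name__ == "__main__":
    print(lla_weights([[1.0, 0.0], [0.0, 1.0]], lam=0.1))
```

Large initial entries receive small or zero weights, so strong edges are shrunk less than under a plain L1 penalty; this is how the nonconcave and adaptive penalties attenuate the bias mentioned above.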
Protein function classification via support vector machine approach
Mathematical Biosciences, 2003
"... ..."

