Results 1 - 10
of
15
A Discriminative Framework for Detecting Remote Protein Homologies
, 1999
"... A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a generative statistical model for a ..."
Abstract
-
Cited by 163 (4 self)
- Add to MetaCart
A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a generative statistical model for a protein family, in this case a hidden Markov model. This general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
Using the Fisher kernel method to detect remote protein homologies
- In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
, 1999
"... A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hid ..."
Abstract
-
Cited by 125 (3 self)
- Add to MetaCart
A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
Multi-class Protein Fold Recognition Using Support Vector Machines and Neural Networks
- Bioinformatics
, 2001
"... Motivation: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classication methods and examined many issues important for a practical recognition system. Results: Most current discriminative ..."
Abstract
-
Cited by 92 (5 self)
- Add to MetaCart
Motivation: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classication methods and examined many issues important for a practical recognition system. Results: Most current discriminative methods for protein fold prediction use the one-againstothers method, which has the well-known \False Positives" problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine and the Neural Network learning methods as base classiers. SVM converges fast and leads to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training. Contact: chqding@lbl.gov, ildubchak@lbl.gov Supplementary Information: The protein parameter datasets used in this paper is available online (http://www.nersc.gov/ cding/protein). Keywords: protein fold recognition, protein structure, multi-class classication, support vection machines, neural networks. To whom correspondence should be addressed. 1
representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Fold Des 3
, 1998
"... Manuscript is 43 Pages in Length (including this one) ..."
Abstract
-
Cited by 44 (24 self)
- Add to MetaCart
Manuscript is 43 Pages in Length (including this one)
Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural Census
- Proteins
, 1998
"... Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori,andE. coli are compared in terms of patterns of fold usage---whether a given fold occurs in a particular organism. Of the ,340 ..."
Abstract
-
Cited by 38 (27 self)
- Add to MetaCart
Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori,andE. coli are compared in terms of patterns of fold usage---whether a given fold occurs in a particular organism. Of the ,340 soluble protein folds currently in the structure databank (PDB), 240 occur in at least one of the eight genomes, and 30 are shared amongst all eight. The shared folds are depleted in allhelical structure and enriched in mixed helixsheet structure compared to the folds in the PDB. The top-10 most common of the shared 30 are enriched in superfolds, uniting many nonhomologous sequence families, and are especially similar in overall architecture---eight having helices packed onto a central sheet. They are also very different from the common folds in the PBD, highlighting databank biases. Folds can be ranked in terms of expression as well as genome duplication. In yeast the top-10 most highly ex...
Nonlinear Models Using Dirichlet Process Mixtures
"... We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relations ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data.
A multilevel approach to scop fold recognition
- In: Proceedings of the IEEE Symposium on Bioinformatics and Bioengineering. (2005
, 2005
"... The classification of proteins based on their structure can play an important role in the deduction or discovery of protein function. However, the relatively low number of solved protein structures and the unknown relationship between structure and sequence requires an alternative method of represen ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
The classification of proteins based on their structure can play an important role in the deduction or discovery of protein function. However, the relatively low number of solved protein structures and the unknown relationship between structure and sequence requires an alternative method of representation for classification to be effective. Furthermore, the large number of potential folds causes problems for many classification strategies, increasing the likelihood that the classifier will reach a local optima while trying to distinguish between all of the possible structural categories. Here we present a hierarchical strategy for structural classification that first partitions proteins based on their SCOP class before attempting to assign a protein fold. Using a well-known dataset derived from the 27 most-populated SCOP folds and several sequence-based descriptor properties as input features, we test a number of classification methods, including Naïve Bayes and Boosted C4.5. Our strategy achieves an average fold recognition of 74%, which is significantly higher than the 56-60 % previously reported in the literature, indicating the effectiveness of a multi-level approach. 1
A Feature-Based Approach to Discrimination and Prediction of Protein Folding Groups
"... This paper belongs to the second direction. ..."
AN ADAPTIVE STRATEGY FOR THE CLASSIFICATION OF G-PROTEIN COUPLED RECEPTORS
"... Abstract: One of the major problems in computational biology is the inability of existing classification models to incorporate expanding and new domain knowledge. This problem of static classification models is addressed in this paper by the introduction of incremental learning for problems in bioin ..."
Abstract
- Add to MetaCart
Abstract: One of the major problems in computational biology is the inability of existing classification models to incorporate expanding and new domain knowledge. This problem of static classification models is addressed in this paper by the introduction of incremental learning for problems in bioinformatics. Many machine learning tools have been applied to this problem using static machine learning structures such as neural networks or support vector machines that are unable to accommodate new information into their existing models. We utilize the fuzzy ARTMAP as an alternate machine learning system that has the ability of incrementally learning new data as it becomes available. The fuzzy ARTMAP is found to be comparable to many of the widespread machine learning systems. The use of an evolutionary strategy in the selection and combination of individual classifiers into an ensemble system, coupled with the incremental learning ability of the fuzzy ARTMAP is proven to be suitable as a pattern classifier. The algorithm presented is tested using data from the G-Coupled Protein Receptors Database and shows good accuracy of 83%. The system presented is also generally applicable, and can be used in problems in genomics and proteomics.
functional families
, 2006
"... Efficacy of different protein descriptors in predicting protein ..."

