Results 1 - 10 of 16
Multi-class Protein Fold Recognition Using Support Vector Machines and Neural Networks
- Bioinformatics
, 2001
Cited by 92 (5 self)
Motivation: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known "False Positives" problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine and the Neural Network learning methods as base classifiers. SVM converges fast and leads to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training. Contact: chqding@lbl.gov, ildubchak@lbl.gov Supplementary Information: The protein parameter datasets used in this paper are available online (http://www.nersc.gov/ cding/protein). Keywords: protein fold recognition, protein structure, multi-class classification, support vector machines, neural networks.
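The all-against-all scheme and the majority voting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 1-D centroids, the class names, and the helper names (`all_against_all_vote`, `majority_vote`, `nearest_centroid_decide`) are assumptions made for illustration, and a nearest-centroid rule stands in for the SVM or neural network base classifiers.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1-D "fold" centroids; a stand-in for trained pairwise classifiers.
CENTROIDS = {"fold_a": 0.0, "fold_b": 5.0, "fold_c": 10.0}


def nearest_centroid_decide(a, b, x):
    """Toy pairwise classifier: pick whichever of classes a, b is closer to x."""
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


def all_against_all_vote(classes, pairwise_decide, x):
    """Run one classifier per pair of classes and return the class with most wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    # Ties go to the class listed first in `classes` (an arbitrary choice here).
    return max(classes, key=lambda c: votes[c])


def majority_vote(predictions):
    """Combine predictions made from several feature sets by simple majority."""
    return Counter(predictions).most_common(1)[0][0]


classes = list(CENTROIDS)
single = all_against_all_vote(classes, nearest_centroid_decide, 4.2)
combined = majority_vote(["fold_b", "fold_a", "fold_b"])
```

With K classes this scheme trains K(K-1)/2 pairwise classifiers, so for 27 folds it would need 351 of them; the one-against-others scheme needs only K.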
Scratch: a protein structure and structural feature prediction server
- Nucleic Acids Res
, 2005
Automated discovery of structural signatures of protein fold and function
- Journal of Molecular Biology
, 2001
Cited by 17 (5 self)
Within the collection of determined protein structures, there is a wealth of principles governing the complex sequence-conformation-function relationships. Historically, many of these principles
Rapid determination of protein folds using residual dipolar couplings
, 2000
Cited by 14 (2 self)
Over the next few years, various genome projects will sequence many new genes and yield many new gene products. Many of these products will have no known function and little, if any, sequence homology to existing proteins. There is reason to believe that a rapid determination of a protein fold, even at low resolution, can aid in the identification of function and expedite the determination of structure at higher resolution. Recently devised NMR methods of measuring residual dipolar couplings provide one route to the determination of a fold. They do this by allowing the alignment of previously identified secondary structural elements with respect to each other. When combined with constraints involving loops connecting elements or other short-range experimental distance information, a fold is produced. We illustrate this approach to protein fold determination on 15N-labeled Escherichia coli acyl carrier protein using a limited set of 15N-1H and 1H-1H dipolar couplings. We also illustrate an approach using a more extended set of heteronuclear couplings on a related protein,
A domain combination based probabilistic framework for protein-protein interaction prediction
- Genome Informatics
, 2003
Cited by 13 (1 self)
In this paper, we propose a probabilistic framework to predict the interaction probability of proteins. The notions of domain combination and domain combination pair are newly introduced, and the prediction model in the framework takes the domain combination pair as the basic unit of protein interactions to overcome the limitations of conventional domain pair based prediction systems. The framework largely consists of prediction preparation and service stages. In the prediction preparation stage, two appearance probability matrices are constructed. Each matrix holds information on appearance frequencies of domain combination pairs in the interacting and non-interacting sets of protein pairs, respectively. Based on the appearance probability matrix, a probability equation is devised. The equation maps a protein pair to a real number in the range of 0 to 1. Two distributions of interacting and non-interacting sets of protein pairs are obtained using the equation. In the prediction service stage, the interaction probability of a protein pair is predicted using the distributions and the equation. The validity of the prediction model is evaluated for the interacting set of protein pairs in yeast and an artificially generated non-interacting set of protein pairs. When 80% of the set of interacting protein pairs in DIP (Database of Interacting Proteins) is used as a learning set of interacting protein pairs, very high sensitivity (86%) and moderate specificity (56%) are achieved within our framework.
Helmer-Citterich M: pdbFun: mass selection and fast comparison of annotated PDB residues
- Nucleic Acids Res
, 2005
IDENTIFYING STRUCTURAL MOTIFS IN PROTEINS
In biological macromolecules, structural patterns (motifs) are often repeated across different molecules. Detection of these common motifs in a new molecule can provide useful clues to the functional properties of such a molecule. We formulate the problem of identifying a given structural motif (pattern) in a target protein (example) and discuss the notion of complete matches vis-a-vis partial matches. We describe the precise error criterion that has to be minimized and also discuss different metrics for evaluating the quality of partial matches. Secondly, we present a new polynomial time algorithm for the problem of matching a given motif in a target protein. We also use the sequence and (if available) secondary structure information to annotate the different points in the motif and the target protein, thus reducing the search space size. Our algorithm guarantees the detection of a perfect match, if present. Even otherwise, the algorithm computes very good matches. Unlike other methods, the error minimized by our algorithm directly translates to root mean square deviation (RMSD), the most commonly accepted metric for structure matching in biological macromolecules. The algorithm does not involve any preprocessing and is suitable for the detection of both small and large motifs in the target protein. We also present experiments exploring the quality of matches found by the algorithm. We examine its performance in matching (both full and partial) active sites in proteins.
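The RMSD metric named in this abstract can be sketched as follows. This is only the textbook definition applied to two point sets that are already paired and superimposed; it is not the matching algorithm from the paper, and the function name `rmsd` and the sample coordinates are assumptions for illustration.

```python
import math


def rmsd(points_a, points_b):
    """Root mean square deviation between two equally sized, paired point sets.

    Assumes the i-th point of `points_a` corresponds to the i-th point of
    `points_b` and that the sets are already superimposed; no optimal
    rotation or translation is computed here.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point sets must be non-empty and of equal length")
    squared = sum(
        (ca - cb) ** 2
        for a, b in zip(points_a, points_b)
        for ca, cb in zip(a, b)
    )
    return math.sqrt(squared / len(points_a))


# Hypothetical motif and candidate match, shifted by one unit along z.
motif = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(motif, candidate)
```

A perfect match in this sense has an RMSD of zero; partial matches would be scored only over the points that are actually paired.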
Pacific Symposium on Biocomputing 8:228-239(2003) IDENTIFYING STRUCTURAL MOTIFS IN PROTEINS
unknown title
Motivation: Structural templates consisting of a few atoms in a specific geometric conformation provide a powerful tool for studying the relationship between protein structure and function. Current methods for template searching constrain template syntax and semantics by their design. Hence there is a need for a more flexible core algorithm upon which to build more sophisticated tools. Statistical analysis of structural similarity is still in its infancy when compared with its analogue in sequence alignment. In the context of template matching, there is an urgent need for normalization of scores so that results from templates with differing sensitivity may be compared directly. Results: We introduce Jess, a fast and flexible algorithm for searching protein structures for small groups of atoms under arbitrary constraints on geometry and chemistry. We apply the algorithm to a set of manually derived enzyme active site templates, and derive an empirical measure for estimating the relative significance of hits encountered using differing templates. Availability: Jess will be available in the near future under a restricted open source licence. Contact:
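To make the idea of template searching concrete, here is a brute-force sketch that looks for groups of atoms whose element labels and pairwise distances agree with a small template. It is not Jess and says nothing about how Jess works internally: the function name `find_template_matches`, the `(element, (x, y, z))` data layout, and the `tolerance` parameter are all assumptions, and the exhaustive search is only practical for tiny inputs.

```python
import math
from itertools import permutations


def find_template_matches(template, atoms, tolerance=0.5):
    """Return tuples of indices into `atoms` that match `template`.

    Both arguments are lists of (element, (x, y, z)) pairs. A candidate
    matches when elements agree position by position and every pairwise
    distance differs from the template's by at most `tolerance`.
    """
    n = len(template)
    matches = []
    for candidate in permutations(range(len(atoms)), n):
        # Chemistry constraint (hypothetical): element labels must agree.
        if any(atoms[i][0] != template[k][0] for k, i in enumerate(candidate)):
            continue
        # Geometry constraint (hypothetical): pairwise distances must agree.
        ok = True
        for p in range(n):
            for q in range(p + 1, n):
                d_template = math.dist(template[p][1], template[q][1])
                d_found = math.dist(atoms[candidate[p]][1], atoms[candidate[q]][1])
                if abs(d_template - d_found) > tolerance:
                    ok = False
        if ok:
            matches.append(candidate)
    return matches


# Hypothetical two-atom template and a four-atom structure.
template = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("N", (1.0, 1.0, 1.0)),
    ("O", (4.0, 1.0, 1.0)),
    ("O", (9.0, 9.0, 9.0)),
]
hits = find_template_matches(template, atoms)
```

Because only distances are compared, a match is independent of where the structure sits in space; this sketch does not distinguish mirror images.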

