Learning structured prediction models: a large margin approach
2004
Cited by 127 (7 self)
We consider large margin estimation in a broad range of prediction models where inference involves solving combinatorial optimization problems, for example, weighted graph cuts or matchings. Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. Our method relies on the expressive power of convex optimization problems to compactly capture inference or solution optimality in structured prediction models. Directly embedding this structure within the learning formulation produces concise convex problems for efficient estimation of very complex and diverse models. We describe experimental results on a matching task, disulfide connectivity prediction, showing significant improvements over state-of-the-art methods.
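To make "large margin estimation" with matching-valued outputs concrete, here is a minimal sketch of a generic structured hinge loss trained by subgradient descent. This is not the formulation in the abstract, which embeds inference as a convex problem inside the learning objective. Instead, this sketch uses brute-force loss-augmented inference over all perfect matchings, which is only feasible for a handful of nodes. The edge features, the task loss (number of predicted edges not in the gold matching), and the step sizes are all hypothetical choices.

```python
def perfect_matchings(nodes):
    """Yield every perfect matching of a sorted, even-length list of nodes."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for k, partner in enumerate(rest):
        for m in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + m


def dot(w, phi):
    return sum(a * b for a, b in zip(w, phi))


def hinge_and_subgradient(w, edge_feats, gold):
    """Structured hinge loss for one example and a subgradient with respect to w.

    edge_feats maps each pair (i, j), i < j, to a feature vector phi_ij, so the
    score of a matching is the sum of w . phi_ij over its edges.
    gold is the true matching as a set of (i, j) pairs; the (hypothetical) task
    loss counts predicted edges that are not in gold.
    """
    nodes = sorted({v for e in edge_feats for v in e})

    def aug_score(m):
        # Model score plus task loss: the quantity maximized in loss-augmented inference.
        return sum(dot(w, edge_feats[e]) + (e not in gold) for e in m)

    # Brute-force loss-augmented inference (only feasible for a few nodes).
    y_hat = max(perfect_matchings(nodes), key=aug_score)
    loss = aug_score(y_hat) - sum(dot(w, edge_feats[e]) for e in gold)
    # Subgradient of the hinge: f(x, y_hat) - f(x, gold).
    grad = [0.0] * len(w)
    for e in y_hat:
        grad = [g + p for g, p in zip(grad, edge_feats[e])]
    for e in gold:
        grad = [g - p for g, p in zip(grad, edge_feats[e])]
    return loss, grad


def train(examples, dim, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized structured hinge loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for edge_feats, gold in examples:
            _, g = hinge_and_subgradient(w, edge_feats, gold)
            w = [wi - lr * (lam * wi + gi) for wi, gi in zip(w, g)]
    return w


def predict(w, edge_feats):
    """Highest-scoring perfect matching under weights w (no loss augmentation)."""
    nodes = sorted({v for e in edge_feats for v in e})
    best = max(perfect_matchings(nodes),
               key=lambda m: sum(dot(w, edge_feats[e]) for e in m))
    return set(best)
```

The hinge loss is zero only when the gold matching outscores every other matching by at least its task loss. This margin requirement is what separates large margin estimation from simply fitting the argmax.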
Disulfide connectivity prediction using recursive neural networks and evolutionary information
Bioinformatics, 2004
Cited by 30 (3 self)
Motivation. We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNNs), on the other hand, can naturally handle complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles into the graphical representation of disulfide connectivity patterns. Results. The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. To compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 data set. We find that using multiple alignment profiles yields significant improvements in prediction accuracy, clearly demonstrating the important role played by evolutionary information. Availability. The Web interface of the predictor is available at
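Ranking alternative connectivity patterns is practical because, once the bonding state is known, the candidate set is finite and small for typical bridge counts. With $B$ bridges there are $2B$ bonded cysteines. The lowest-indexed one can pair with any of the other $2B-1$, which leaves $2B-2$ cysteines to pair up recursively. (This is a standard counting argument, not taken from the paper.)

```latex
N(1) = 1, \qquad N(B) = (2B-1)\,N(B-1)
\;\Longrightarrow\;
N(B) = (2B-1)!! = \prod_{k=1}^{B} (2k-1) = \frac{(2B)!}{2^{B}\,B!}

N(2) = 3, \quad N(3) = 15, \quad N(4) = 105, \quad N(5) = 945
```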
Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching
Proteins, 2006
Cited by 12 (2 self)
The formation of disulphide bridges between cysteines plays an important role in protein folding, structure, function, and evolution. Here, we develop new methods for predicting disulphide bridges in proteins. We first build a large curated data set of proteins containing disulphide bridges to extract relevant statistics. We then use kernel methods to predict whether a given protein chain contains intrachain disulphide bridges, and recursive neural networks to predict the bonding probability of each pair of cysteines in the chain. These probabilities in turn yield an accurate estimate of the total number of disulphide bridges and define a weighted graph matching problem that can be solved efficiently to infer the global disulphide bridge connectivity pattern. The approach applies both when the bonded state of each cysteine is known and in ab initio mode, where the state is unknown. Furthermore, it easily copes with chains containing an arbitrary number of disulphide bridges, overcoming one of the major limitations of previous approaches. It classifies individual cysteine residues as bonded or nonbonded with 87% specificity and 89% sensitivity. The estimated total number of bridges in each chain is correct 71% of the time and within one of the true value over 94% of the time. The predicted overall disulphide connectivity pattern is exact for about 51% of the chains. In addition to using profiles in the input to leverage evolutionary information, including true (but not predicted) secondary structure and solvent accessibility information yields small but noticeable improvements. Finally, once the system is trained, predictions can be computed rapidly on a proteomic or protein-engineering scale. The disulphide bridge prediction server (DIpro), software, and datasets are available through www.igb.uci.edu/servers/pass.html.
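The step from pairwise bonding probabilities to a global connectivity pattern can be sketched as a maximum-weight matching with a fixed number of edges. The abstract does not say which solver or edge weighting DIpro uses. This sketch makes the following assumptions:

- The raw pairwise probabilities serve as edge weights.
- The bridge count `num_bridges` is given, for example from the estimate mentioned above.
- An exact dynamic program over subsets is acceptable.

The dynamic program is exponential in the number of cysteines, so it suits only chains with few cysteines. A polynomial-time weighted matching algorithm would be the usual choice beyond that. Cysteines may stay unpaired, which covers the ab initio case. The name `best_connectivity` is hypothetical.

```python
from functools import lru_cache


def best_connectivity(prob, num_bridges):
    """Max-weight matching with exactly num_bridges edges.

    prob[i][j] (i < j) is a predicted bonding probability for cysteines i and j.
    Returns (total_weight, pairs); total_weight is -inf if no such matching exists.
    """
    n = len(prob)

    @lru_cache(maxsize=None)
    def solve(mask, k):
        # mask: bitset of cysteines already decided; k: bridges still to place.
        if k == 0:
            return 0.0, ()
        free = [i for i in range(n) if not (mask >> i) & 1]
        if len(free) < 2 * k:
            return float("-inf"), ()
        i = free[0]
        # Option 1: leave the lowest free cysteine unbonded.
        best = solve(mask | (1 << i), k)
        # Option 2: bond it to some later free cysteine j.
        for j in free[1:]:
            w, pairs = solve(mask | (1 << i) | (1 << j), k - 1)
            if w + prob[i][j] > best[0]:
                best = (w + prob[i][j], ((i, j),) + pairs)
        return best

    return solve(0, num_bridges)
```

For example, if cysteines 0 and 1 have the highest pairwise probability, and the remaining pair (2, 3) is the next best, `best_connectivity(prob, 2)` returns `((0, 1), (2, 3))` with their summed weight.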
A two-stage SVM architecture for predicting the disulfide bonding state of cysteines
In Proc. of the IEEE Workshop on Neural Networks for Signal Processing, 2002
Cited by 7 (2 self)
Cysteines may form covalent bonds, known as disulfide bridges, that have an important role in stabilizing the native conformation of proteins. Several methods have been proposed for predicting the bonding state of cysteines, using either local context or global protein descriptors. In this paper we introduce an SVM-based predictor that operates in two stages. The first stage is a multi-class classifier that operates at the protein level. The second stage is a binary classifier that refines the prediction by exploiting local context enriched with evolutionary information in the form of multiple alignment profiles. The prediction accuracy of the system is 83.6%, measured by 5-fold cross-validation on a set of 716 proteins from the September 2001 PDB Select dataset.
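The second stage's "local context enriched with evolutionary information" is commonly encoded as a window of profile rows centred on each cysteine. Below is a minimal sketch of such an encoding. It assumes a profile with one frequency vector per residue (for example 20 values from a multiple alignment) and zero padding past the chain ends. The default half-width of 7, and the function names, are hypothetical and not taken from the paper.

```python
def cysteine_positions(sequence):
    """0-based indices of every cysteine (C) in the sequence."""
    return [i for i, aa in enumerate(sequence) if aa == "C"]


def window_features(profile, center, half_width):
    """Concatenate profile rows from center - half_width to center + half_width.

    Positions past either chain end are zero-padded so that every cysteine
    gets a feature vector of the same length.
    """
    dim = len(profile[0])
    feats = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            feats.extend(profile[pos])
        else:
            feats.extend([0.0] * dim)
    return feats


def local_context_examples(sequence, profile, half_width=7):
    """Map each cysteine position to its window feature vector (input to a binary classifier)."""
    return {i: window_features(profile, i, half_width)
            for i in cysteine_positions(sequence)}
```

Each vector has length `(2 * half_width + 1) * dim` and could be fed to any binary classifier. How the first-stage protein-level output is combined with it is not specified in the abstract.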
A novel database of disulfide patterns and its application to the discovery of distantly related homologs
J. Mol. Biol., 2004
Cited by 4 (1 self)
Disulfide bonds are strongly conserved among proteins of related structure and function. Despite the explosive growth of protein sequence databases and the large number of sequence search tools, no tool exists to relate the disulfide patterns of homologous proteins. We present a comprehensive database of disulfide bonding patterns and a search method to find proteins with similar disulfide patterns. The disulfide database was constructed using disulfide annotations extracted from SwissProt, and was expanded significantly from 16,736 to 94,499 disulfide-containing domains by an inference method that combines SwissProt annotations with Pfam multiple alignments. To search the database, we define a disulfide description, called the disulfide signature, which encodes both spacings between cysteine residues and cysteine connectivity. A web tool was developed that allows users to search for related disulfide patterns and for subpatterns resulting from the removal of one or more disulfides from the pattern. We explore the possibility of using
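The abstract says the disulfide signature encodes cysteine spacings and connectivity, but it does not give the exact format. The sketch below uses a hypothetical encoding that captures both ingredients. It assumes the following:

- Only bonded cysteines are included.
- Spacings count the residues between consecutive bonded cysteines.
- Each bond is written as a pair of 1-based cysteine ordinals.

Subpatterns are generated by removing every proper subset of bridges and recomputing the signature.

```python
from itertools import combinations


def signature(bonds):
    """Hypothetical disulfide signature from bonded residue-index pairs.

    Returns (spacings, connectivity): spacings counts residues between
    consecutive bonded cysteines; connectivity lists each bond as a pair of
    1-based cysteine ordinals, sorted.
    """
    cys = sorted(p for bond in bonds for p in bond)
    ordinal = {pos: k + 1 for k, pos in enumerate(cys)}
    spacings = tuple(b - a - 1 for a, b in zip(cys, cys[1:]))
    connectivity = tuple(sorted(tuple(sorted((ordinal[i], ordinal[j])))
                                for i, j in bonds))
    return spacings, connectivity


def subpatterns(bonds):
    """Yield (removed_bonds, signature) for each proper, non-empty subset of bonds removed."""
    for r in range(1, len(bonds)):
        for removed in combinations(bonds, r):
            kept = [b for b in bonds if b not in removed]
            yield removed, signature(kept)
```

For bridges between residues (0, 7) and (3, 9), this encoding gives spacings `(2, 3, 1)` and connectivity `((1, 3), (2, 4))`. Dropping the (3, 9) bridge leaves `((6,), ((1, 2),))`.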
DISULFIND: a disulfide bonding state and cysteine connectivity prediction server
Nucleic Acids Res., 2006
Cited by 2 (1 self)
DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at
CysView: Protein classification based on cysteine pairing patterns
Nucleic Acids Res., 2004
Cited by 2 (1 self)
CysView is a web-based application that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data that have disulfide annotations, and allows records to be viewed grouped by connectivity pattern. CysView’s utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has also proved useful for rapidly detecting irrelevant and partial records, or those with incomplete annotations. CysView can be used to support the detection of distant homology between proteins. CysView is publicly available at
Machine Learning in Structural Genomics
Proteins are polymer chains composed of twenty simpler molecules, called amino acids, that carry out most of the molecular functions in living organisms. Although a protein can be first characterized by its amino acid sequence, or primary sequence, most proteins fold into three-dimensional
Performance comparison of generalized PSSM in signal peptide cleavage site
Third IEEE Symposium on BioInformatics and BioEngineering (BIBE '03). Bourbakis N
We generalize the familiar position-dependent position-specific score matrix (PSSM), aka weight matrix, by considering a log-odds score for (nonadjacent) tuple frequencies, each tuple score weighted by the product of its mutual information and its statistical significance, as measured by a point estimator for the p-value of the mutual information. The performance of this new approach, along with other variants of generalized PSSM and profile methods, is measured by receiver operating characteristic (ROC) curves for the specific problem of signal peptide cleavage site recognition. We additionally compare Vert's recent support vector machine string kernel [29], Brown's joint probability approximation algorithm [7, 4, 18] and the method WAM [31].
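The two ingredients this abstract builds on are a standard log-odds PSSM and the mutual information between alignment positions. The sketch below shows both. It is the familiar baseline, not the generalized tuple score: the p-value weighting and the tuple log-odds terms are not implemented. It assumes equal-length aligned sequences over the 20 standard amino acids, a uniform background distribution, and a pseudocount of 1; all of these are illustrative choices.

```python
import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0, background=None):
    """Log-odds weight matrix: one dict per position, residue -> log2(p_pos(a) / q(a))."""
    if background is None:
        background = {a: 1.0 / len(AMINO) for a in AMINO}  # uniform background (assumption)
    total = len(aligned) + pseudocount * len(AMINO)
    pssm = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background[a])
                     for a in AMINO})
    return pssm


def score(pssm, window):
    """Sum of per-position log-odds scores for a window of the PSSM's length."""
    return sum(col[a] for col, a in zip(pssm, window))


def mutual_information(aligned, i, j):
    """Mutual information (bits) between alignment columns i and j."""
    n = len(aligned)
    pi = Counter(s[i] for s in aligned)
    pj = Counter(s[j] for s in aligned)
    pij = Counter((s[i], s[j]) for s in aligned)
    return sum((c / n) * math.log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())
```

Two perfectly correlated two-letter columns give 1 bit of mutual information, and a constant column gives 0. Weighting pair or tuple scores by such a quantity is the idea the generalized PSSM extends.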

