Results 1-10 of 15
Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching
Proteins, 2006
Cited by 12 (2 self)
The formation of disulphide bridges between cysteines plays an important role in protein folding, structure, function, and evolution. Here, we develop new methods for predicting disulphide bridges in proteins. We first build a large curated data set of proteins containing disulphide bridges to extract relevant statistics. We then use kernel methods to predict whether a given protein chain contains intrachain disulphide bridges, and recursive neural networks to predict the bonding probabilities of each pair of cysteines in the chain. These probabilities in turn lead to an accurate estimate of the total number of disulphide bridges and to a weighted graph matching problem that can be solved efficiently to infer the global disulphide bridge connectivity pattern. This approach can be applied both when the bonded state of each cysteine is known and in ab initio mode, where the state is unknown. Furthermore, it can easily cope with chains containing an arbitrary number of disulphide bridges, overcoming one of the major limitations of previous approaches. It can classify individual cysteine residues as bonded or nonbonded with 87% specificity and 89% sensitivity. The estimate for the total number of bridges in each chain is correct 71% of the time, and within one of the true value over 94% of the time. The predicted overall disulphide connectivity pattern is exact in about 51% of the chains. In addition to using profiles in the input to leverage evolutionary information, including true (but not predicted) secondary structure and solvent accessibility information yields small but noticeable improvements. Finally, once the system is trained, predictions can be computed rapidly on a proteomic or protein-engineering scale. The disulphide bridge prediction server (DIpro), software, and datasets are available through www.igb.uci.edu/servers/pass.html.
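The abstract reduces connectivity inference to a weighted graph matching problem over predicted pairwise bonding probabilities, but it does not say which matching algorithm is used. As a minimal sketch, assuming the bonded cysteines are already known (an even number of them) and that a pattern's score is simply the sum of its pairwise probabilities, an exact maximum-weight perfect matching can be found with a bitmask dynamic program. The function name `best_connectivity` and the toy matrix are hypothetical. The DP is exponential in the number of cysteines, so for long chains a polynomial algorithm such as Edmonds' blossom method would be the usual choice.

```python
from functools import lru_cache
from typing import List, Tuple

Pair = Tuple[int, int]


def best_connectivity(prob: List[List[float]]) -> Tuple[float, List[Pair]]:
    """Exact maximum-weight perfect matching over bonded cysteines.

    prob[i][j] is a (hypothetical) predicted bonding probability for
    cysteines i and j; the matrix is assumed symmetric with an even size.
    Runs in O(2^n * n) time, so it is only practical for small n.
    """
    n = len(prob)
    if n % 2:
        raise ValueError("need an even number of bonded cysteines")
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(used: int) -> Tuple[float, Tuple[Pair, ...]]:
        if used == full:
            return 0.0, ()
        # The lowest-index unpaired cysteine must be paired with someone.
        i = next(k for k in range(n) if not (used >> k) & 1)
        best: Tuple[float, Tuple[Pair, ...]] = (float("-inf"), ())
        for j in range(i + 1, n):
            if (used >> j) & 1:
                continue
            rest_score, rest_pairs = solve(used | (1 << i) | (1 << j))
            score = prob[i][j] + rest_score
            if score > best[0]:
                best = (score, ((i, j),) + rest_pairs)
        return best

    total, pairs = solve(0)
    return total, list(pairs)


if __name__ == "__main__":
    # Toy, made-up probabilities for four bonded cysteines.
    P = [
        [0.0, 0.9, 0.1, 0.2],
        [0.9, 0.0, 0.3, 0.1],
        [0.1, 0.3, 0.0, 0.8],
        [0.2, 0.1, 0.8, 0.0],
    ]
    print(best_connectivity(P))
```

On the toy matrix the pairing [(0, 1), (2, 3)] wins with a total weight of about 1.7, ahead of the alternatives with weights 0.5 and 0.2.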
Prediction of subcellular localization using sequence-biased recurrent networks
Bioinformatics, 2005
doi:10.1093/bioinformatics/bti372
Neural Methods for Non-Standard Data
Proceedings of the 12th European Symposium on Artificial Neural Networks (ESANN 2004), d-side pub., 2004
Cited by 6 (3 self)
Standard pattern recognition provides effective and noise-tolerant tools for machine learning tasks; however, most approaches only deal with real vectors of finite, fixed dimensionality. In this tutorial paper, we give an overview of extensions of pattern recognition to non-standard data that are not contained in a finite-dimensional space, such as strings, sequences, trees, graphs, or functions. Two major directions can be distinguished in the neural networks literature. First, models can be based on a similarity measure adapted to non-standard data; kernel methods for structures are a very prominent approach, alongside alternative metric-based algorithms and functional networks. Second, non-standard data can be processed recursively within supervised and unsupervised recurrent and recursive networks and fully recurrent systems.
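The tutorial singles out kernel methods for structures as a prominent way to handle non-standard data such as strings. One widely used string kernel, chosen here purely for illustration and not taken from the paper, is the k-spectrum kernel: the inner product of the k-mer count vectors of two sequences. A minimal sketch, with hypothetical function names:

```python
from collections import Counter


def spectrum(s: str, k: int) -> Counter:
    """Count every contiguous substring (k-mer) of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    a, b = spectrum(s, k), spectrum(t, k)
    # Counter returns 0 for absent k-mers, so only shared k-mers contribute.
    return sum(count * b[kmer] for kmer, count in a.items())
```

For example, "AAAB" has 2-mers {AA: 2, AB: 1} and "AAB" has {AA: 1, AB: 1}, so their 2-spectrum kernel value is 2·1 + 1·1 = 3. Because the kernel is an explicit inner product, it is symmetric and positive semidefinite, which is what kernel machines such as SVMs require.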
Identifying Cysteines and Histidines in Transition-Metal-Binding Sites Using Support Vector Machines and Neural Networks
Cited by 4 (3 self)
Accurate prediction of metal-binding sites in proteins using sequence as the only source of information can significantly help in the prediction of protein structure and function, in genome annotation, and in the experimental determination of protein structure. Here, we introduce a method for identifying histidines and cysteines that participate in binding several transition metals and iron complexes. The method predicts histidines as being in one of two states (free or metal bound) and cysteines in one of three states (free, metal bound, or in disulfide bridges). The method uses only sequence information, relying on position-specific evolutionary profiles as well as more global descriptors such as protein length and amino acid composition. Our solution is based on a two-stage machine-learning approach. The first stage consists of a support vector machine trained to locally classify the binding state of single histidines and cysteines. The second stage consists of a bidirectional recurrent neural network trained to refine local predictions by taking into account dependencies among residues within the same protein. In the second stage, a simple finite state automaton is employed as a postprocessing step to enforce an even number of disulfide-bonded cysteines. We predict histidines and cysteines in transition-metal-binding sites at 73% precision and 61% recall. We observe significant differences in performance depending on the ligand (histidine or cysteine) and on the metal bound. We also predict cysteines participating in disulfide bridges at 86% precision and 87% recall. Results are compared to those that would be obtained using expert information as represented by PROSITE motifs and, for disulfide bonds, to state-of-the-art methods. Proteins 2006;
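The abstract mentions a simple finite state automaton that enforces an even number of disulfide-bonded cysteines but does not describe it. One plausible construction, sketched here under the assumption that each cysteine carries an independent bonded probability from the earlier stage, is a two-state parity automaton decoded with a Viterbi-style dynamic program: the states record whether an even or odd number of cysteines have been labelled bonded so far, and only the even state is accepted at the end. The function name and inputs are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def even_bonded_labels(p_bonded: List[float]) -> List[bool]:
    """Most probable bonded/free labelling with an even number of bonded cysteines.

    p_bonded[i] is an assumed, independent probability that cysteine i is
    disulfide bonded. State 0 = even count so far, state 1 = odd count.
    """
    eps = 1e-12  # guards against log(0)
    neg = float("-inf")
    score = [0.0, neg]  # before any cysteine, the count (0) is even
    back: List[List[Optional[Tuple[int, bool]]]] = []
    for p in p_bonded:
        log_b = math.log(max(p, eps))
        log_f = math.log(max(1.0 - p, eps))
        new = [neg, neg]
        ptr: List[Optional[Tuple[int, bool]]] = [None, None]
        for parity in (0, 1):
            if score[parity] == neg:
                continue
            # Label "free": parity is unchanged.
            if score[parity] + log_f > new[parity]:
                new[parity] = score[parity] + log_f
                ptr[parity] = (parity, False)
            # Label "bonded": parity flips.
            flipped = 1 - parity
            if score[parity] + log_b > new[flipped]:
                new[flipped] = score[parity] + log_b
                ptr[flipped] = (parity, True)
        score = new
        back.append(ptr)
    # Trace back from the accepting (even) final state.
    labels: List[bool] = []
    parity = 0
    for ptr in reversed(back):
        prev, label = ptr[parity]
        labels.append(label)
        parity = prev
    labels.reverse()
    return labels
```

With probabilities [0.9, 0.8, 0.6], thresholding at 0.5 would label all three cysteines bonded (an odd count); the parity-constrained decoder instead returns [True, True, False], the most probable labelling with an even count.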
Detecting and Sorting Targeting Peptides with Neural Networks and
Cited by 3 (0 self)
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins from their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1, which is itself built upon the SignalP and TargetP series of predictors. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using support vector machines as a 'smart gate' sorting the outputs of several different targeting peptide detection networks.
DISULFIND: a disulfide bonding state and cysteine connectivity prediction server
Nucleic Acids Res, 2006
Cited by 2 (1 self)
DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at
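DISULFIND reports the most likely connectivity patterns rather than a single answer. For a handful of bonded cysteines, a ranked list can be produced by brute force: 2n bonded cysteines admit (2n-1) × (2n-3) × … × 1 distinct pairings (15 for six cysteines). The sketch below enumerates them and ranks them by the sum of hypothetical pairwise scores; both this scoring rule and the function names are assumptions, not DISULFIND's documented method.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[Tuple[int, int], ...]


def all_patterns(idx: Sequence[int]) -> List[Pattern]:
    """Every perfect matching (connectivity pattern) of the given cysteines."""
    if len(idx) == 0:
        return [()]
    first, rest = idx[0], list(idx[1:])
    patterns: List[Pattern] = []
    for k, partner in enumerate(rest):
        # Pair the first cysteine with `partner`, then match the remainder.
        remaining = rest[:k] + rest[k + 1:]
        for sub in all_patterns(remaining):
            patterns.append(((first, partner),) + sub)
    return patterns


def top_patterns(score: List[List[float]], k: int = 3) -> List[Tuple[float, Pattern]]:
    """Rank all patterns by the sum of (hypothetical) pairwise scores."""
    ranked = [
        (sum(score[i][j] for i, j in pat), pat)
        for pat in all_patterns(range(len(score)))
    ]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:k]
```

Exhaustive enumeration grows factorially, so this is only a way to see the full ranking for small chains; a matching algorithm is needed to find the single best pattern at scale.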
CysView: Protein classification based on cysteine pairing patterns
Nucleic Acids Res, 2004
Cited by 2 (1 self)
CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data that have disulfide annotations, and it allows viewing records grouped by connectivity pattern. CysView’s utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can also be used to support inferences of distant homology between proteins. CysView is publicly available at
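CysView groups records by disulfide connectivity pattern. A common way to compare patterns across proteins of different lengths is to renumber the bonded cysteines by their order in the sequence, so that, for example, bonds at residues (3, 40), (10, 30), (15, 45) and at (5, 60), (12, 50), (20, 70) both become C1-C5, C2-C4, C3-C6. The sketch below assumes bonds are given as residue-position pairs and ignores free cysteines; the function names and toy records are hypothetical and do not reflect CysView's internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Bond = Tuple[int, int]


def connectivity_pattern(bonds: Iterable[Bond]) -> str:
    """Map bonds given as residue positions to an ordinal pattern key.

    Bonded cysteines are renumbered C1, C2, ... in sequence order, so chains
    of different lengths with the same topology share the same key.
    """
    ordered = [tuple(sorted(b)) for b in bonds]
    positions = sorted({pos for b in ordered for pos in b})
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    pairs = sorted((rank[a], rank[b]) for a, b in ordered)
    return ",".join(f"C{a}-C{b}" for a, b in pairs)


def group_by_pattern(records: Dict[str, List[Bond]]) -> Dict[str, List[str]]:
    """Group record names by their connectivity pattern key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, bonds in records.items():
        groups[connectivity_pattern(bonds)].append(name)
    return dict(groups)
```

Keying on the ordinal pattern rather than raw positions is what lets homologous toxins with different loop lengths fall into the same group.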
Improving prediction of zinc binding sites by modeling the linkage between residues close in sequence
Lecture Notes in Computer Science, 2006
Cited by 2 (1 self)
We describe and empirically evaluate machine learning methods for predicting zinc binding sites from protein sequences. We start by observing that a data set with single residues as examples is affected by autocorrelation, and we propose an ad hoc remedy in which sequentially close pairs of candidate residues are classified as being jointly involved in the coordination of a zinc ion. We develop a kernel for this particular type of data that can handle variable-length gaps between candidate coordinating residues. Our empirical evaluation on a data set of non-redundant protein chains shows that explicitly modeling the correlation between residues close in sequence yields a significant improvement in prediction performance.
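The paper's remedy classifies sequentially close pairs of candidate residues rather than single residues. The first step, enumerating such pairs, can be sketched as follows; the candidate ligand set (C, H, D, E), the `max_gap` default of 6, and the function name are illustrative assumptions, and the paper's gap-aware kernel itself is not reproduced here.

```python
from typing import List, Tuple


def candidate_pairs(seq: str, ligands: str = "CHDE", max_gap: int = 6) -> List[Tuple[int, int]]:
    """Enumerate sequentially close pairs of candidate zinc ligands.

    Returns 0-based index pairs (i, j), i < j, where both residues are in
    `ligands` and at most `max_gap` residues lie strictly between them.
    """
    pos = [i for i, aa in enumerate(seq) if aa in ligands]
    pairs: List[Tuple[int, int]] = []
    for a, i in enumerate(pos):
        for j in pos[a + 1:]:
            if j - i - 1 > max_gap:
                break  # positions are sorted, so every later j is farther away
            pairs.append((i, j))
    return pairs
```

Each returned pair would then become one example for the classifier, with the gap length j - i - 1 varying from pair to pair; that variable gap is what the paper's kernel is designed to handle.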
Private correspondence, 1998
Cited by 1 (0 self)
Disulfide bonds play important roles in the folding and stability of proteins and are evolutionarily conserved. A classic example is RNase A (also known as bovine pancreatic ribonuclease), which contains four conserved disulfide bonds among eight cysteines. However, human RNase 8, a paralog of RNase A uniquely expressed in the placenta, has lost one of the conserved cysteines but gained another when compared to RNase 8 of various monkeys and to RNase A. We show here that both the loss and the gain of the cysteines in human RNase 8 occurred in the common ancestor of African great apes (humans, chimps, and gorillas) 7-13 million years ago. Computational predictions suggest that these cysteine substitutions change the disulfide bonding. Site-directed mutagenesis indicates that, if ribonucleolytic activity is essential for RNase 8’s function, the gain of the cysteine must have preceded the loss. Human RNase 8 represents one of the first examples in which the presumed evolutionary change of a disulfide bond involves one loss and one gain of cysteine, instead of two losses or two gains. Our results provide the foundation for detailed analysis toward understanding the impact of disulfide-bond reshuffling on the structure, function, and evolution of proteins in general and of human RNase 8 in particular.
Detecting Residues in Targeting Peptides
2004
Cited by 1 (1 self)
This paper presents a system of recurrent neural networks that demonstrates an ability to detect residues belonging to specific targeting peptides with greater accuracy than current feed-forward models. The system can subsequently be used to determine the sub-cellular localisation of proteins and to understand the factors underlying translocation. The work can be seen as building upon the currently popular SignalP and TargetP series of predictors by exploiting the inherent bias for sequential pattern recognition exhibited by recurrent networks.

