Prediction of local structure in proteins using a library of sequence-structure motifs
- J. MOL. BIOL
, 1998
Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry
- Proteins
, 2003
Cited by 24 (10 self)
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs which may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile HMMs. We did not rely on a simple helix-strand-coil definition of secondary structure,
Small libraries of protein fragments model native protein structures accurately
- J. Mol. Biol
, 2002
Cited by 24 (5 self)
The three-dimensional structure of proteins has been a subject of intense study for several decades. A common way to simplify these complex structures is to consider restrictions on the local mainchain
Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks
- Proteins
, 2000
Cited by 23 (5 self)
ABSTRACT By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα (“protein blocks”). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into “sequence families” improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (α/β protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling. Proteins 2000;41:271–287. © 2000 Wiley-Liss, Inc. Key words: protein backbone structure; unsupervised classifier; structure-sequence relationships; structure prediction; protein block; Bayesian approach; prediction strategies
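The idea of ranking protein blocks for a sequence window and keeping the first few candidates can be illustrated with a minimal sketch. This is not the authors' implementation: the function name rank_blocks, the two-residue window, the three block labels, and all probabilities below are hypothetical, and positions in the window are assumed to be independent given the block.

```python
import math

def rank_blocks(window, priors, likelihoods, top_k=4):
    """Rank candidate blocks for one sequence window with Bayes' rule.

    window      : string of amino acids, one letter per window position
    priors      : {block: P(block)}
    likelihoods : {block: [{aa: P(aa | block, position)}, ...]}
    Window positions are treated as independent given the block
    (a simplifying assumption made for this sketch).
    """
    log_post = {}
    for block, prior in priors.items():
        score = math.log(prior)
        for pos, aa in enumerate(window):
            score += math.log(likelihoods[block][pos][aa])
        log_post[block] = score
    # Normalise in log space so the posteriors sum to 1.
    m = max(log_post.values())
    z = sum(math.exp(s - m) for s in log_post.values())
    post = {b: math.exp(s - m) / z for b, s in log_post.items()}
    ranked = sorted(post.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# Toy, made-up numbers: three blocks and a two-residue window.
PRIORS = {"a": 0.2, "d": 0.5, "m": 0.3}
LIKELIHOODS = {
    "a": [{"G": 0.9, "A": 0.1}, {"G": 0.5, "A": 0.5}],
    "d": [{"G": 0.2, "A": 0.8}, {"G": 0.5, "A": 0.5}],
    "m": [{"G": 0.1, "A": 0.9}, {"G": 0.1, "A": 0.9}],
}

if __name__ == "__main__":
    for block, p in rank_blocks("GA", PRIORS, LIKELIHOODS, top_k=2):
        print(block, round(p, 3))
```

Keeping the top_k highest-posterior blocks per site corresponds to the abstract's notion of retaining several predicted blocks at each position; a site counts as correctly predicted if the true block is among them.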
A hidden Markov model derived structural alphabet for proteins
- J Mol Biol
, 2004
Cited by 19 (3 self)
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence. © 2005 Elsevier B.V. All rights reserved.
Machine Discovery Of Protein Motifs
- MACHINE LEARNING
, 1995
Cited by 17 (2 self)
The investigation of relations between protein tertiary structure and amino acid sequence is a topic of tremendous importance in molecular biology. The automated discovery of recurrent patterns of structure and sequence is an essential part of this investigation. These patterns, known as protein motifs, are abstractions of fragments drawn from proteins of known sequence and tertiary structure. This paper has two objectives. The first is to introduce and define protein motifs, and provide a survey of previous research on protein motif discovery. The second is to present and apply a novel approach to protein motif representation and discovery, which is based on a spatial description logic and the symbolic machine learning paradigm of structured concept formation. A large database of protein fragments is processed using this approach, and several interesting and significant protein motifs are discovered.
Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet
- Nucl. Acids. Res
, 2006
Cited by 2 (0 self)
Encoding protein 3D structures into 1D strings using short structural prototypes or structural alphabets opens a new front for structure comparison and analysis. Using the well-documented 16 motifs of Protein Blocks (PBs) as structural alphabet, we have developed a methodology to compare protein structures that are encoded as sequences of PBs by aligning them using dynamic programming which uses a substitution matrix for PBs. This methodology is implemented in the applications available in the Protein Block Expert (PBE) server. PBE addresses common issues in the field of protein structure analysis such as comparison of protein structures and identification of protein structures in structural databanks that resemble a given structure. PBE-T provides a facility to transform any PDB file into sequences of PBs. PBE-ALIGNc performs comparison of two protein structures based on the alignment of their corresponding PB sequences. PBE-ALIGNm is a facility for mining the SCOP database for similar structures based on the alignment of PBs. Besides, PBE provides an interface to a database (PBE-SAdb) of preprocessed PB sequences from SCOP culled at 95% and of all-against-all pairwise PB alignments at family and superfamily levels. PBE server is freely available at
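Aligning two PB strings by dynamic programming with a substitution score can be sketched as follows. This is only an illustration under stated assumptions, not the PBE server's code: it uses global (Needleman-Wunsch style) alignment with a linear gap penalty, and the scoring function toy_subst, the gap value, and the example strings are all hypothetical stand-ins for a real PB substitution matrix.

```python
def toy_subst(x, y):
    """Hypothetical substitution score: +2 for identical letters, -1 otherwise.
    A real PB substitution matrix would assign a score to every letter pair."""
    return 2 if x == y else -1

def align_score(seq_a, seq_b, subst=toy_subst, gap=-2):
    """Global alignment score of two letter strings by dynamic programming.

    dp[i][j] holds the best score for aligning the first i letters of
    seq_a with the first j letters of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # prefix of seq_a aligned against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # prefix of seq_b aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + subst(seq_a[i - 1], seq_b[j - 1]),  # pair
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )
    return dp[n][m]

if __name__ == "__main__":
    # Made-up PB-like strings, for illustration only.
    print(align_score("mmmmd", "mmmd"))
```

A local (Smith-Waterman style) variant would clamp each cell at zero and take the maximum over the table; which variant a given tool uses is not stated in the abstract.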
Machine Learning and its Application to Bioinformatics: An Overview
, 2001
Cited by 2 (1 self)
Biological research has become a data-driven discipline: owing to high-throughput research, the biomolecular databases are expanding at an enormous rate. As a result, bioinformatics has emerged as an important discipline in the post-genome era, and exploring and explaining the knowledge hidden in the biomolecular databases has become the grand challenge for bioinformatics. An efficient and inexpensive approach is required to solve problems in molecular biology; machine learning, which is an automatic and intelligent learning technique, may help to fill this role. The aim of this survey paper is to introduce machine learning techniques, in the context of their application in bioinformatics, to experimental biologists and bioinformaticians.
Classification of protein 3D folds by hidden Markov learning on sequences of structural alphabets
- In RECOMB
, 2005
Cited by 2 (0 self)
Fragment-based analysis of protein three-dimensional (3D) structures has received increased attention in recent years. Here, we used a set of pentamer local structure alphabets (LSAs) recently derived in our laboratory to represent protein structures, i.e. we transformed the 3D structures into one-dimensional (1D) sequences of LSAs. We then applied Hidden Markov Model training to these LSA sequences to assess their ability to capture features characteristic of 43 populated protein folds. In the size range of LSAs examined (5 to 41 alphabets), the performance was optimal using 20 alphabets, giving an accuracy of fold classification of 82% in a 5-fold cross-validation on training-set structures sharing < 40% pairwise sequence identity at the amino acid level. For test-set structures, the accuracy was as high as for the training set, but fell to 65% for those sharing no more than 25% amino acid sequence identity with the training-set structures. These results suggest that sufficient 3D information can be retained during the drastic 3D->1D transformation for use as a framework for developing efficient and useful structural bioinformatics tools.

