Results 1 - 10 of 34
Rapid protein side-chain packing via tree decomposition
- Research in Computational Molecular Biology, Lecture Notes in Computer Science, 2005
Cited by 17 (1 self)
Abstract. This paper proposes a novel tree-decomposition-based side-chain assignment algorithm that obtains the globally optimal solution of the side-chain packing problem very efficiently. Theoretically, the computational complexity of this algorithm is $O((N+M)\,n_{rot}^{tw+1})$. Here $N$ is the number of residues in the protein, $M$ is the number of interacting residue pairs, $n_{rot}$ is the average number of rotamers per residue, and $tw = O(N^{2/3}\log N)$ is the tree width of the residue interaction graph. Based on this algorithm, we have developed a side-chain prediction program, SCATD (Side Chain Assignment via Tree Decomposition). Experimental results show that after Goldstein DEE is applied, $n_{rot}$ is around 3.5. The tree width $tw$ is only 3 or 4 for most of the test proteins in the SCWRL benchmark and less than 10 for all of them. SCATD runs up to 90 times faster than SCWRL 3.0 on some large proteins in the SCWRL benchmark and is on average five times faster across all the test proteins. If only the post-DEE stage is considered, our tree-decomposition-based energy minimization algorithm is more than 200 times faster than that of SCWRL 3.0 on some large proteins. SCATD is freely available for academic research upon request.
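The sketch below illustrates why a small tree width makes exact packing tractable. It handles only the special case where the residue interaction graph is itself a tree (tree width 1). It is not SCATD. The function names, energy tables and data layout are hypothetical. A general tree decomposition would run the same dynamic programming over bags of up to $tw+1$ residues. The sketch keeps the best subtree energy for every rotamer of every residue, so its cost is $O(N\,n_{rot}^2)$, which is the $tw = 1$ case of the bound above.

```python
from itertools import product


def pack_tree(rotamers, self_energy, pair_energy, edges, root=0):
    """Globally optimal rotamer assignment when the residue interaction
    graph is a tree (tree width 1), via dynamic programming.

    rotamers[i]         -- number of candidate rotamers at residue i
    self_energy[i][a]   -- energy of rotamer a at residue i
    pair_energy[(i, j)] -- matrix E[a][b] for interacting residues i < j
    edges               -- (i, j) pairs with i < j forming a connected tree
    Returns (minimum total energy, list of chosen rotamer indices).
    """
    n = len(rotamers)
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    def pair(u, v, a, b):
        # Pair energy for rotamer a at u and b at v, whichever index is smaller.
        return pair_energy[(u, v)][a][b] if u < v else pair_energy[(v, u)][b][a]

    # Depth-first pre-order: every parent appears before its children.
    parent = {root: None}
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    children = {u: [v for v in adj[u] if parent[v] == u] for u in order}

    best = {}    # best[u][a]: min energy of u's subtree when u uses rotamer a
    choice = {}  # choice[(c, a)]: best rotamer of child c when its parent uses a
    for u in reversed(order):  # leaves first, root last
        best[u] = []
        for a in range(rotamers[u]):
            total = self_energy[u][a]
            for c in children[u]:
                b = min(range(rotamers[c]), key=lambda b: pair(u, c, a, b) + best[c][b])
                choice[(c, a)] = b
                total += pair(u, c, a, b) + best[c][b]
            best[u].append(total)

    # Trace back from the root's best rotamer down to the leaves.
    assign = [None] * n
    assign[root] = min(range(rotamers[root]), key=lambda a: best[root][a])
    for u in order:
        for c in children[u]:
            assign[c] = choice[(c, assign[u])]
    return best[root][assign[root]], assign


def brute_force(rotamers, self_energy, pair_energy, edges):
    """Exhaustive reference solver (exponential; for checking only)."""
    best = None
    for combo in product(*(range(r) for r in rotamers)):
        e = sum(self_energy[i][a] for i, a in enumerate(combo))
        e += sum(pair_energy[(i, j)][combo[i]][combo[j]] for i, j in edges)
        if best is None or e < best[0]:
            best = (e, list(combo))
    return best


# Toy three-residue chain 0-1-2 with hypothetical energies.
rotamers = [2, 3, 2]
self_energy = [[0.0, 1.0], [0.5, 0.0, 2.0], [1.0, 0.0]]
pair_energy = {
    (0, 1): [[3.0, 0.0, 1.0], [0.0, 2.0, 0.0]],
    (1, 2): [[0.0, 4.0], [5.0, 0.0], [0.0, 0.0]],
}
edges = [(0, 1), (1, 2)]
print(pack_tree(rotamers, self_energy, pair_energy, edges))  # (0.0, [0, 1, 1])
```

The exhaustive `brute_force` helper is included only to check the dynamic programming on small inputs. Its cost grows as $n_{rot}^N$.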
Fold recognition by predicted alignment accuracy
- IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005
Cited by 10 (3 self)
Abstract—One of the key components of protein structure prediction by threading is choosing the best overall template for a given target sequence after all the optimal sequence-template alignments have been generated. The chosen template should have the best alignment with the target sequence, since the three-dimensional structure of the target is built on the sequence-template alignment. The traditional method for template selection, the Z-score, uses a statistical test to rank all the sequence-template alignments and then chooses the first-ranked template. However, computing Z-scores is time-consuming and not suitable for genome-scale structure prediction. Z-scores are also hard to interpret when the threading scoring function is a weighted sum of several energy terms with different physical meanings. This paper presents a Support Vector Machine (SVM) regression approach that directly predicts the alignment accuracy of a sequence-template alignment. These predictions are then used to rank all the templates for a given target sequence. Experimental results on a large-scale benchmark demonstrate that SVM regression performs much better than the composition-corrected Z-score method and also runs much faster. Index Terms—Protein structure prediction, protein threading, protein fold recognition, SVM regression.
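The abstract does not spell out the regression model, so the following is only a hedged sketch of the idea. It trains a linear ε-insensitive support vector regressor (the loss that SVM regression uses) by plain stochastic subgradient descent in pure Python. It then ranks candidate templates by predicted alignment accuracy. The two alignment features, the training data and all function names are hypothetical. A real system would more likely use an SVM library with a kernel and the paper's own features.

```python
def predict(w, b, x):
    """Linear model output: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, eps=0.05, lam=1e-3, lr=0.01, epochs=2000):
    """Fit w, b by stochastic subgradient descent. Each step uses one example
    and the objective  lam/2 * ||w||^2 + max(0, |w . x + b - y| - eps)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            r = predict(w, b, xi) - yi
            # Subgradient of the epsilon-insensitive loss: zero inside the tube.
            g = 1.0 if r > eps else (-1.0 if r < -eps else 0.0)
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def rank_templates(w, b, template_features):
    """Template indices sorted by predicted alignment accuracy, best first."""
    scores = [predict(w, b, x) for x in template_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical training data: two alignment features per sequence-template
# pair, with a made-up "true" accuracy of 0.6*f1 + 0.3*f2.
X = [[a / 2, c / 2] for a in range(3) for c in range(3)]
y = [0.6 * f1 + 0.3 * f2 for f1, f2 in X]
w, b = train_linear_svr(X, y)
print(rank_templates(w, b, [[0.9, 0.8], [0.2, 0.1], [0.5, 0.5]]))  # [0, 2, 1]
```

Ranking needs only one regression output per alignment. That is the source of the speed advantage over Z-scores, which require scoring many shuffled alignments.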
Pcons5: combining consensus, structural evaluation and fold recognition scores
- Bioinformatics, 2005
doi:10.1093/bioinformatics/bti702
Fold recognition by combining profile–profile alignment and support vector machine
- Bioinformatics, 2005
Cited by 6 (1 self)
Motivation: Currently, the most accurate fold recognition method is to perform profile-profile alignments and estimate the statistical significance of those alignments by calculating a z-score or E-value. This scheme is reliable for recognizing relatively close homologs related at the family level. However, it has difficulty finding remote homologs related at the superfamily or fold level. Results: Here, we present an alternative way to estimate the significance of the alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n+1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to the posterior probability that the query sequence is related to the template. The new method performs significantly better than both PSI-BLAST and profile-profile alignment with the z-score scheme. At 90% specificity, PSI-BLAST and the z-score scheme detect 16% and 20% of superfamily-related proteins, respectively. The new method detects 46% of these proteins, more than a twofold increase in sensitivity. More significantly, at the fold level the new method detects 14% of remotely related proteins at 90% specificity. This is a remarkable result, given that the other methods detect almost none at the same specificity.
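Two steps in this abstract are easy to make concrete: turning a raw SVM output into a probability, and measuring sensitivity at a fixed specificity. The sketch below assumes Platt-style sigmoid scaling, a common choice that the abstract does not name. Its coefficients `A` and `B` are hypothetical and would normally be fitted on held-out data. The sensitivity helper uses the standard definitions specificity = TN/(TN+FP) and sensitivity = TP/(TP+FN).

```python
import math


def platt_probability(f, A=-2.0, B=0.0):
    """Sigmoid P(related | SVM output f) = 1 / (1 + exp(A*f + B)).
    A and B are hypothetical placeholders; with A < 0 a larger f gives a
    larger probability."""
    return 1.0 / (1.0 + math.exp(A * f + B))


def sensitivity_at_specificity(pos_scores, neg_scores, specificity=0.9):
    """Fraction of true positives scoring above a threshold chosen so that
    at least `specificity` of the negatives score at or below it."""
    neg = sorted(neg_scores)
    k = math.ceil(specificity * len(neg))  # negatives that must be rejected
    threshold = neg[k - 1] if k > 0 else float("-inf")
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


print(platt_probability(0.0))  # 0.5
neg = [i / 10 for i in range(1, 11)]  # ten unrelated-pair scores
pos = [0.95, 0.85, 0.5]               # three related-pair scores
print(sensitivity_at_specificity(pos, neg))  # 1 of 3 positives detected
```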
ACE: Consensus Fold Recognition by Predicted Model Quality
- 2004
Cited by 5 (0 self)
Protein structure prediction has been a fundamental challenge in biology. In this post-genomic era, the need for automated protein structure prediction has never been more evident. Researchers are now focusing on computational techniques that predict three-dimensional structures with high throughput. Consensus-based methods are the state of the art in automatic protein structure prediction. A consensus-based server combines the outputs of several individual servers and tends to generate better predictions than any individual server. Such methods have proved successful in recent CASP (Critical Assessment of Structure Prediction) experiments. In this thesis, a Support Vector Machine (SVM) regression-based consensus method is proposed for protein fold recognition, a key component of high-throughput protein structure prediction and protein function annotation. First, the features of each structural model are extracted by comparing it with the models produced by all the individual servers. Then, the SVM predicts the quality of each model. Experimental results on several LiveBench data sets confirm that the proposed consensus method, SVM regression, consistently performs better than any individual server. Based on this method, we developed a meta server, ACE (Alignment by Consensus Estimation).
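The thesis derives each model's features by comparing it with the models from all the individual servers. One simple, hypothetical feature of that kind is a model's average structural agreement with every other model. The similarity measure here is an assumption: the fraction of Cα atoms within 3.5 Å, computed without superposition. The SVM regression that would consume these features is not shown.

```python
import math


def fraction_close(model_a, model_b, cutoff=3.5):
    """Fraction of residues whose C-alpha atoms lie within `cutoff` angstroms.

    Assumes both models list the same residues in the same order and are
    already in a common coordinate frame (real tools superimpose first)."""
    close = sum(math.dist(p, q) <= cutoff for p, q in zip(model_a, model_b))
    return close / len(model_a)


def consensus_features(models):
    """For each model, its mean similarity to all the other servers' models."""
    n = len(models)
    return [
        sum(fraction_close(models[i], models[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x, y + 10.0, z) for x, y, z in chain]
print(consensus_features([chain, list(chain), shifted]))  # [0.5, 0.5, 0.0]
```

A model that many servers agree on receives a high score. An outlier receives a low one, which is the intuition behind consensus methods.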
IPASS: error tolerant NMR backbone resonance assignment by linear programming
- 2009
Cited by 5 (4 self)
Abstract. Automating the entire NMR protein structure determination process requires a highly error-tolerant backbone resonance assignment method. Although a variety of assignment approaches have been developed, none works well on noisy, automatically picked peaks. IPASS is proposed as a novel assignment method based on integer linear programming (ILP). To reduce the size of the problem, IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the ¹⁵N-edited NOESY peaks, and uses it to fix reliable fragments. Experimental results demonstrate that IPASS significantly outperforms previous assignment methods on synthetic data sets. It achieves an average of 99% precision and 96% recall on the synthesized spin systems, and an average of 96% precision and 90% recall on the synthesized peak lists. On automatically picked peaks from experimentally derived data sets, it achieves an average precision and recall of 78% and 67%, respectively. In contrast, the next best method, MARS, achieves an average precision and recall of 50% and 40%, respectively. Availability: IPASS is available upon request, and the web server for IPASS is under construction.
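The following is a generic assignment ILP of the kind the abstract refers to. It is not necessarily IPASS's exact model, and the notation is hypothetical. Let $S$ be the set of spin systems and $R = \{1, \dots, L\}$ the residues. $x_{ij} = 1$ if spin system $i$ is assigned to residue $j$, and $w_{ij}$ is an assumed score, for example a log-probability from spin system typing. $F$ is the set of reliable fragment links, where $(i, i') \in F$ means spin system $i'$ must sit immediately after spin system $i$.

```latex
$$
\begin{aligned}
\max_{x}\quad & \sum_{i \in S}\sum_{j \in R} w_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j \in R} x_{ij} \le 1 && \forall i \in S \\
& \sum_{i \in S} x_{ij} \le 1 && \forall j \in R \\
& x_{ij} = x_{i',\,j+1} && \forall (i, i') \in F,\ j = 1, \dots, L-1 \\
& x_{iL} = x_{i'1} = 0 && \forall (i, i') \in F \\
& x_{ij} \in \{0, 1\} && \forall i \in S,\ j \in R
\end{aligned}
$$
```

The first two constraint families make the assignment one-to-one. The fragment constraints are one way to "fix reliable fragments" so that linked spin systems move together.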
Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment
- In Proceedings of the Workshop on Algorithms in Bioinformatics, 2005
Cited by 5 (2 self)
Abstract. Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predicting and searching for the structure of new sequences. To identify the structure of remote homologs, structure-sequence alignment must consider not only sequence similarity but also spatially conserved conformations caused by residue interactions. Consequently, the problem is computationally intractable. It is difficult to overcome this inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, and the graph for a biopolymer structure generally has a naturally small tree width. The algorithm constructs an optimal alignment by finding, in the sequence graph, the maximum-valued subgraph isomorphic to the structure graph. Its time complexity is $O(k^t N^2)$ for a structure of $N$ residues with a tree decomposition of width $t$. The parameter $k$, which is naturally small, is determined by a statistical cutoff for the correspondence between the structure and the sequence. The paper demonstrates a successful application of the algorithm in a fast program for RNA structural homology search.
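The worked example below uses hypothetical values ($N = 100$ residues, tree width $t = 3$, $k = 5$ candidates per residue) to show why the bound stays practical when $k$ and $t$ are small. For comparison, it also gives the count for naively enumerating $k$ candidates for every residue independently.

```latex
$$
k^{t} N^{2} = 5^{3} \times 100^{2} = 125 \times 10^{4} = 1.25 \times 10^{6}
\qquad\text{vs.}\qquad
k^{N} = 5^{100} \approx 7.9 \times 10^{69}
$$
```

The exponential factor depends only on the tree width $t$, not on the structure size $N$. The cost therefore grows quadratically in $N$ once $k$ and $t$ are fixed.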
Assessment of RAPTOR's Linear Programming Approach in CAFASP3
- Proteins, 2003
Cited by 4 (0 self)
We have developed a new algorithm based on the mathematical theory of linear programming (LP) and implemented it in our program RAPTOR. Our approach provides an elegant formulation of the protein threading problem and overcomes, in practice, the intractability of protein threading. It also allows us to use existing powerful LP software to obtain optimal protein threading solutions. CASP5 and CAFASP3 gave us the first chance to test RAPTOR in an unbiased way. The CAFASP3 organizers ranked RAPTOR as the top individual (automatic) server for fold recognition. In this short paper, we describe RAPTOR's LP formulation, assess RAPTOR's performance in CAFASP3/CASP5, and explain why it has superseded other existing automatic individual methods. We also point out its strengths, limitations, extensions, and prospects for improvement.
Generalized Pattern Search Algorithm for Peptide Structure Prediction
- 2008
Cited by 3 (2 self)
ABSTRACT Finding the near-native structure of a protein is one of the most important open problems in structural biology and biological physics. The problem becomes dramatically more difficult when a given protein has no regular secondary structure or does not show a fold similar to already known structures. This situation occurs frequently when predicting the tertiary structure of small molecules called peptides. In this work, we propose a new ab initio algorithm, the generalized pattern search algorithm, based on the well-known class of Search-and-Poll algorithms. Inspired by approaches proposed by other researchers, we performed an extensive set of simulations on a well-known set of 44 peptides to investigate the robustness and reliability of the proposed algorithm. We compared the predicted peptide conformations with those of PEPstr, a state-of-the-art algorithm for peptide structure prediction. In particular, we tested the algorithm on the instances proposed by the originators of PEPstr in order to validate it; the experimental results confirm that the generalized pattern search algorithm outperforms ...

When analyzing the complex structure of a biological system, proteins are the most attractive molecular devices. They are likely involved in all processes of a living organism and are responsible for behavioral changes in cells. Due to the ...
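The poll step of a generalized pattern search can be sketched on a toy objective. The sketch is a plain compass search and omits the optional search step. It polls along the ± coordinate directions, expands the mesh on success and contracts it on failure. It is a generic illustration, not the authors' peptide-energy implementation, and the quadratic test function is hypothetical. In a peptide setting, the variables would presumably be conformational parameters such as backbone dihedral angles, and `f` would be an energy function.

```python
def pattern_search(f, x0, step=1.0, min_step=1e-6, shrink=0.5, expand=2.0, max_iter=10000):
    """Minimize f by compass-style pattern search starting from x0."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step <= min_step:
            break
        improved = False
        # Poll step: try x +/- step along each coordinate direction and
        # accept the first trial point that lowers f (opportunistic polling).
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * step
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        # Expand the mesh after a success, contract it after a failed poll.
        step = step * expand if improved else step * shrink
    return x, fx


print(pattern_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0]))
# ([1.0, -2.0], 0.0)
```

Pattern search needs only function values, not gradients. This makes it attractive when the energy landscape is rugged or derivatives are unavailable.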
Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions
- Bioinformatics, 2006
doi:10.1093/bioinformatics/bti828

