Results 1 - 10
of
43
3DCoffee: Combining protein sequences and structures within multiple sequence alignments
- J Mol Biol
, 2004
"... It has long been assumed that using structural information can increase the accuracy of multiple protein sequence alignments (MSA). 1 Recent results 2,3 suggest that accurate MSAs obtained this ..."
Abstract
-
Cited by 29 (5 self)
- Add to MetaCart
It has long been assumed that using structural information can increase the accuracy of multiple protein sequence alignments (MSA). 1 Recent results 2,3 suggest that accurate MSAs obtained this
Multiple structural alignment by secondary structures: Algorithm and applications
- PROTEIN SCI.
, 2003
"... ..."
Towards Index-based Similarity Search for Protein Structure Databases
- In IEEE Computer Society Bioinformatics Conf
, 2003
"... We propose two methods for finding similarities in protein structure databases. Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements) of proteins. These feature vectors are then indexed using a multidimensional index structure. Our first technique considers the pr ..."
Abstract
-
Cited by 17 (4 self)
- Add to MetaCart
We propose two methods for finding similarities in protein structure databases. Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements) of proteins. These feature vectors are then indexed using a multidimensional index structure. Our first technique considers the problem of finding proteins similar to a given query protein in a protein dataset. This technique quickly finds promising proteins using the index structure. These proteins are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find an all-to-all similarity. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times while keeping the sensitivity similar.
Sensitivity and selectivity in protein structure comparison
- Protein Sci
, 2004
"... Seven protein structure comparison methods and two sequence comparison programs were evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The structure alignment programs Dali, Structal, Combinatorial Extensi ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
Seven protein structure comparison methods and two sequence comparison programs were evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The structure alignment programs Dali, Structal, Combinatorial Extension (CE), VAST, and Matras were tested along with SGM and PRIDE, which calculate a structural distance between two domains without aligning them. We also tested two sequence alignment programs, SSEARCH and PSI-BLAST. Depending upon the level of selectivity and error model, structure alignment programs can detect roughly twice as many homologous domains in CATH as sequence alignment programs. Dali finds the most homologs, 321–533 of 1120 possible true positives (28.7%–45.7%), at an error rate of 0.1 errors per query (EPQ), whereas PSI-BLAST finds 365 true positives (32.6%), regardless of the error model. At an EPQ of 1.0, Dali finds 42%–70 % of possible homologs, whereas Matras finds 49%–57%; PSI-BLAST finds 36.9%. However, Dali achieves>84 % coverage before the first error for half of the families tested. Dali and PSI-BLAST find 9.2 % and 5.2%, respectively, of the 7056 possible topology pairs at an EPQ of 0.1 and 19.5, and 5.9 % at an EPQ of 1.0. Most statistical significance estimates reported by the structural alignment programs overestimate the significance of an alignment by orders of magnitude when compared with the actual distribution of errors. These results help quantify the statistical distinction between analogous and homologous structures, and provide a benchmark for structure comparison statistics.
CTSS: A Robust and Efficient Method for Protein Structure Alignment Based on Local Geometrical and Biological Features
- In CSB
, 2003
"... We present a new method for conducting protein structure similarity searches, which improves on the accuracy, robustness, and efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins th ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
We present a new method for conducting protein structure similarity searches, which improves on the accuracy, robustness, and efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. To improve matching accuracy, we smooth the noisy raw atomic coordinate data with spline fitting. To improve matching efficiency, we adopt a hierarchical coarse-to-fine strategy. We use an efficient hashingbased technique to screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Contrary to other hashing based techniques, our technique employs domain specific information (not just geometric information) in constructing the hash key, and hence, is more tuned to the domain of biology. Furthermore, the invariancy, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to discover new, meaningful motifs that were not reported by other structure alignment methods. 1.
Self Generating Metaheuristics in Bioinformatics: The Proteins Structure Comparison Case
- Genetic Programming and Evolvable Machines
, 2004
"... In this paper we describe the application of a so called "Self-Generating" Memetic Algorithm to the Maximum Contact Map Overlap problem (MAX-CMO). The maximum overlap of contact maps is emerging as a leading modeling technique to obtain structural alignment among pairs of protein structures. Identif ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
In this paper we describe the application of a so called "Self-Generating" Memetic Algorithm to the Maximum Contact Map Overlap problem (MAX-CMO). The maximum overlap of contact maps is emerging as a leading modeling technique to obtain structural alignment among pairs of protein structures. Identifying structural alignments (and hence similarity among proteins) is essential to the correct assessment of the relation between proteins structure and function. A robust methodology for structural comparison could have impact on the process of rational drug design.
FATCAT: a web server for flexible structure comparison and structure similarity searching
- Nucleic Acids Res
, 2004
"... Protein structure comparison, an important problem in structural biology, has two main applications: (i) comparing two protein structures in order to identify the similarities and differences between them, and (ii) searching for structures similar to a query structure. Many web-based resources for b ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
Protein structure comparison, an important problem in structural biology, has two main applications: (i) comparing two protein structures in order to identify the similarities and differences between them, and (ii) searching for structures similar to a query structure. Many web-based resources for both applications are available, but all are based on rigid structural alignment algorithms. FATCAT server implements the recently developed flexible protein structure comparison algorithm FATCAT, which automatically identifies hinges and internal rearrangements in two protein structures. The server provides access to two algorithms: FATCAT-pairwise for pairwise flexible structure comparison and FATCAT-search for database searching for structurally similar proteins. Given two protein structures [in the Protein Data Bank (PDB) format], FATCAT-pairwise reports their structural alignment and the corresponding statistical significance of the similarity measured as a P-value. Users can view the superposition of the structures online in web browsers that support the Chime plug-in, or download the superimposed structures in PDB format. In FATCAT-search, users provide one query structure and the server returns a list of protein structures that are similar to the query, ordered by the P-values. In addition, FATCAT server can report the conformational changes of the query structure as compared to other proteins in the structure database. FATCAT server is available at
PSI: Indexing Protein Structures for Fast Similarity Search
- Bioinformatics
, 2003
"... Motivation: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to all of the proteins in the database to find similarities. Therefore, the cost of similarity queries increases linearly as the volume of the p ..."
Abstract
-
Cited by 11 (4 self)
- Add to MetaCart
Motivation: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to all of the proteins in the database to find similarities. Therefore, the cost of similarity queries increases linearly as the volume of the protein databases increase. As the sizes of experimentally determined and theoretically estimated protein structure databases grow, there is a need for scalable searching techniques.
PSIST: Indexing protein structures using suffix trees
- In IEEE Computational Systems Bioinformatics Conference
, 2005
"... Approaches for indexing proteins, and for fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we developed a new method for extracting the loca ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
Approaches for indexing proteins, and for fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we developed a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between C atoms and the angles between the normals of planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results shows classification accuracy up to 97.8 % and 99.4 % at the superfamily and class level according to the SCOP classification, and shows that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods. 1.
Geometric Suffix Tree: A New Index Structure for Protein 3-D Structures
- Protein 3-D Structures, Combinatorial Pattern Matching 2006 (CPM 2006), LNCS 4009
, 2006
"... Abstract. Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate query data structures for such 3-D structures are highly desired for research on proteins. This paper proposes a new data structure for indexing protein 3-D structu ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
Abstract. Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate query data structures for such 3-D structures are highly desired for research on proteins. This paper proposes a new data structure for indexing protein 3-D structures. For strings, there are many efficient indexing structures such as suffix trees, but it has been considered very difficult to design such sophisticated data structures against 3-D structures like proteins. Our index structure is based on the suffix trees and is called the geometric suffix tree. By using the geometric suffix tree for a set of protein structures, we can search for all of their substructures whose RMSDs (root mean square deviations) or URMSDs (unit-vector root mean square deviations) to a given query 3-D structure are not larger than a given bound. Though there are O(N 2) substructures, our data structure requires only O(N) space where N is the sum of lengths of the set of proteins. We propose an O(N 2) construction algorithm for it, while a naive algorithm would require O(N 3) time to construct it. Moreover we propose an efficient search algorithm. We also show computational experiments to demonstrate the practicality of our data structure. The experiments show that the construction time of the geometric suffix tree is practically almost linear to the size of the database, when applied to a protein structure database. 1

