BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins, 2005
- Cited by 48 (1 self)
ABSTRACT Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
3DCoffee: Combining protein sequences and structures within multiple sequence alignments
- J Mol Biol, 2004
- Cited by 29 (5 self)
It has long been assumed that using structural information can increase the accuracy of multiple protein sequence alignments (MSA) [1]. Recent results [2,3] suggest that accurate MSAs obtained this
M-Coffee: combining multiple sequence alignment methods with T-Coffee
- Nucleic Acids Res, 2006
- Cited by 10 (4 self)
We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and BAliBASE. We also show that on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment as any individual method. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from
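The idea of pooling several MSAs of the same sequences can be illustrated with a small sketch. The code below only tallies how often each pair of residues shares a column across the input alignments; it is not M-Coffee's implementation, and the function names `aligned_pairs` and `pair_support` are hypothetical. Alignments are assumed to be lists of equal-length gapped strings, with `-` as the gap character and the sequences in the same order in every alignment.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for j in range(len(alignment[0])):
        present = []
        for i, row in enumerate(alignment):
            if row[j] != "-":
                present.append((i, counters[i]))
                counters[i] += 1
        # Every two residues in the same column count as one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def pair_support(alignments):
    """Fraction of input alignments in which each residue pair is aligned."""
    counts = Counter()
    for alignment in alignments:
        counts.update(aligned_pairs(alignment))
    return {pair: n / len(alignments) for pair, n in counts.items()}


# Two toy alignments of the same two sequences (illustrative data only).
msa_a = ["AC-GT", "ACTGT"]
msa_b = ["ACG-T", "ACTGT"]
support = pair_support([msa_a, msa_b])
```

Pairs on which all inputs agree get a support of 1.0, while pairs found in only one of the two toy alignments get 0.5. A consensus method could use such weights to prefer well-supported pairs, though how M-Coffee does so is not described in this abstract.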
Kalign – an accurate and fast multiple sequence alignment algorithm
- BMC Bioinformatics, 2005
unknown title
Motivation: Multiple sequence alignments (MSAs) are at the heart of bioinformatics analysis. Recently, a number of multiple protein sequence alignment benchmarks (i.e., BAliBASE, OXBench, PRE-FAB and SMART) have been released to evaluate new and existing MSA applications. These databases have been well received by researchers and help to quantitatively evaluate MSA programs on protein sequences. Unfortunately, analogous DNA benchmarks are not available, making evaluation of MSA programs difficult for DNA sequences. Results: This work presents the first known multiple DNA sequence alignment benchmarks that are 1) composed of protein-coding portions of DNA and 2) based on biological features such as the tertiary structure of encoded proteins. These reference DNA databases contain a total of 3,545 alignments, comprising 68,581 sequences. Two versions of the database are available: mdsa_100s and mdsa_all. The mdsa_100s version contains the alignments of the data sets for which TBLASTN found 100% sequence identity for each sequence. The mdsa_all version includes all hits with an E-value score above the threshold of 0.001. A primary use of these databases is to benchmark the performance of MSA applications on DNA data sets. The first such case study is included in the supplementary material. Availability: The databases, further details and the supplementary material are publicly available at
unknown title
Motivation: Aligning multiple proteins based on sequence information alone is challenging if sequence identity is low or there is a significant degree of structural divergence. We present a novel algorithm (SATCHMO) that is designed to address this challenge. SATCHMO simultaneously constructs a tree and a set of multiple sequence alignments, one for each internal node of the tree. The alignment at a given node contains all sequences within its sub-tree, and predicts which positions in those sequences are alignable and which are not. Aligned regions therefore typically get shorter on a path from a leaf to the root as sequences diverge in structure. Current methods either regard all positions as alignable (e.g. ClustalW), or align only those positions believed to be homologous across all sequences (e.g. profile HMM methods); by contrast SATCHMO makes different predictions of alignable regions in different subgroups. SATCHMO generates profile hidden Markov models at each node; these are used to determine branching order, to align sequences and to predict structurally alignable regions. Results: In experiments on the BAliBASE benchmark alignment database, SATCHMO is shown to perform comparably to ClustalW and the UCSC SAM HMM software. Results using SATCHMO to identify protein domains are demonstrated on potassium channels, with implications for the mechanism by which tumor necrosis factor alpha affects potassium current. Availability: The software is available for download from
Simultaneous Sequence Alignment And Tree
- Bioinformatics, 2003
Introduction: In the words of David Jones, "There are really only three things that govern the overall accuracy of comparative modeling: alignment quality, alignment quality, and...alignment quality" [1]. Comparative modeling is not the only application for which alignment quality is critical: multiple sequence alignments are used for profile construction, detection of critical residues, prediction of functional subfamilies, and a host of other tasks. Because of its central importance, the construction of multiple sequence alignments is a focus of the computational biology community. When sequences are similar to each other, virtually any alignment method will produce good results. However, evolutionary divergence in multi-gene families can result in family members with pairwise similarity so low as to be indistinguishable from chance. Even when sequence similarity is detectable, local changes in structure between members can be significant and represent a great challenge. Methods for
APDB: a novel measure for benchmarking sequence alignment methods without reference alignments
- Bioinformatics, 2003
Introduction: APDB is a novel measure for evaluating the quality of a protein sequence alignment, given two or more PDB structures. We show how it is possible to avoid the use of reference alignments when PDB structures are available for at least two homologous sequences in a test alignment. Using this method it should become possible to systematically benchmark or train multiple sequence alignment methods using all known structures, in a completely automatic manner. Benchmarking is usually accomplished by comparing test alignments to a set of reference alignments of the same sequences assembled by specialists with the help of structural information. Two such sets of reference alignments, HOMSTRAD [1] and BAliBASE [2], were investigated in this study. One of the simplest ways of using reference alignments in benchmarking is to count the percentage of columns in the test alignment that are correctly aligned according to the reference alignment (column score) [3]. Although simple and conv
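The column score mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scoring program used in the paper: alignments are lists of equal-length gapped strings with `-` as the gap character, both alignments hold the same sequences in the same order, and a test column counts as correct only if the reference contains a column with exactly the same residues and gaps. The function name `column_score` is hypothetical.

```python
def column_score(test, reference):
    """Fraction of test columns that also occur in the reference alignment."""

    def columns(alignment):
        # Describe each column by the ungapped position of every sequence's
        # residue in that column (None where the sequence has a gap).
        counters = [0] * len(alignment)
        cols = []
        for j in range(len(alignment[0])):
            col = []
            for i, row in enumerate(alignment):
                if row[j] == "-":
                    col.append(None)
                else:
                    col.append(counters[i])
                    counters[i] += 1
            cols.append(tuple(col))
        return cols

    reference_cols = set(columns(reference))
    # Ignore columns that contain only gaps.
    test_cols = [c for c in columns(test) if any(x is not None for x in c)]
    return sum(c in reference_cols for c in test_cols) / len(test_cols)


# Toy data (illustrative only): the two alignments disagree in two columns.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
score = column_score(test, reference)
```

For the toy pair above, three of the five test columns match the reference. The score is all-or-nothing per column, so a single misplaced residue makes the whole column count as wrong.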
OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy
- BMC Bioinformatics, 2003

