BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins, 2005
Cited by 48 (1 self)
ABSTRACT Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
An Alignment Confidence Score Capturing Robustness to Guide Tree Uncertainty
Cited by 1 (1 self)
Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment data BASE benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
Key words: multiple sequence alignment, guide tree, phylogeny, bootstrap, alignment confidence.
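The column-consistency idea in the abstract above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes the perturbed MSAs (one per bootstrap guide tree) have already been produced by some aligner, that every MSA lists the same sequences in the same order, and that a column counts as reproduced only if it appears identically in a perturbed MSA. The function names `columns_as_residue_indices` and `column_confidence` are hypothetical.

```python
from typing import List, Tuple

Column = Tuple[int, ...]  # one entry per sequence: residue index, or -1 for a gap


def columns_as_residue_indices(msa: List[str]) -> List[Column]:
    """Describe each alignment column by which residue of each sequence it holds."""
    counters = [0] * len(msa)  # next residue index for each sequence
    columns: List[Column] = []
    for col in range(len(msa[0])):
        entry = []
        for row, seq in enumerate(msa):
            if seq[col] == "-":
                entry.append(-1)  # gap: no residue of this sequence in the column
            else:
                entry.append(counters[row])
                counters[row] += 1
        columns.append(tuple(entry))
    return columns


def column_confidence(base_msa: List[str], perturbed_msas: List[List[str]]) -> List[float]:
    """For each column of base_msa, return the fraction of perturbed MSAs
    that contain exactly the same column (a simplified column score)."""
    if not perturbed_msas:
        raise ValueError("at least one perturbed MSA is required")
    base_columns = columns_as_residue_indices(base_msa)
    perturbed_sets = [set(columns_as_residue_indices(m)) for m in perturbed_msas]
    return [
        sum(col in cols for cols in perturbed_sets) / len(perturbed_sets)
        for col in base_columns
    ]


# Toy data: the base MSA plus two alignments from hypothetical perturbed guide trees.
base = ["AC-G", "A-TG"]
perturbed = [["AC-G", "A-TG"], ["ACG", "ATG"]]
scores = column_confidence(base, perturbed)
```

In this toy case the first and last columns are reproduced in both perturbed alignments, while the two gapped columns are reproduced in only one, so they receive lower scores. Columns with low scores would be the candidates for filtering before downstream analysis.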
4 Swarm Intelligence Algorithms in Bioinformatics
Summary. Research in bioinformatics necessitates the use of advanced computing tools for processing huge amounts of ambiguous and uncertain biological data. Swarm Intelligence (SI) has recently emerged as a family of nature-inspired algorithms, especially known for their ability to produce low-cost, fast and reasonably accurate solutions to complex search problems. In this chapter, we explore the role of SI algorithms in certain bioinformatics tasks like microarray data clustering, multiple sequence alignment, protein structure prediction and molecular docking. The chapter begins with an overview of the basic concepts of bioinformatics along with their biological basis. It also gives an introduction to swarm intelligence with special emphasis on two specific SI algorithms, Particle Swarm Optimization (PSO) and Ant Colony Systems (ACS). It then provides a detailed survey of the state-of-the-art research centered around the applications of SI algorithms in bioinformatics. The chapter concludes with a discussion on how SI algorithms can be used for solving a few open-ended problems in bioinformatics. The past few decades have seen a massive growth in biological information gathered

