PROBCONS: Probabilistic consistency-based multiple sequence alignment
Genome Res, 2005
Cited by 84 (5 self)
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROBCONS achieves statistically significant improvement over other leading methods while maintaining practical speed. PROBCONS is publicly available as a web resource. Source code and executables are available under the GNU Public License at
An Indexed Bibliography of Genetic Algorithms in Power Engineering
1995
Cited by 67 (8 self)
s: Jan. 1992 -- Dec. 1994; CTI: Current Technology Index Jan./Feb. 1993 -- Jan./Feb. 1994; DAI: Dissertation Abstracts International: Vol. 53 No. 1 -- Vol. 55 No. 4 (1994); EEA: Electrical & Electronics Abstracts: Jan. 1991 -- Dec. 1994; P: Index to Scientific & Technical Proceedings: Jan. 1986 -- Feb. 1995 (except Nov. 1994); EI A: The Engineering Index Annual: 1987 -- 1992; EI M: The Engineering Index Monthly: Jan. 1993 -- Dec. 1994. The following GA researchers have already kindly supplied their complete autobibliographies and/or proofread references to their papers: Dan Adler, Patrick Argos, Jarmo T. Alander, James E. Baker, Wolfgang Banzhaf, Ralf Bruns, I. L. Bukatova, Thomas Back, Yuval Davidor, Dipankar Dasgupta, Marco Dorigo, Bogdan Filipic, Terence C. Fogarty, David B. Fogel, Toshio Fukuda, Hugo de Garis, Robert C. Glen, David E. Goldberg, Martina Gorges-Schleuter, Jeffrey Horn, Aristides T. Hatjimihail, Mark J. Jakiela, Richard S. Judson, Akihiko Konaga...
BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
Proteins, 2005
Cited by 48 (1 self)
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
Opportunities for Combinatorial Optimization In Computational Biology
2003
Cited by 12 (0 self)
This is a survey designed for mathematical programming people who do not know molecular biology and want to learn the kinds of combinatorial optimization problems that arise. After a brief introduction to the biology, we present optimization models pertaining to sequencing, evolutionary explanations, structure prediction and recognition. Additional biology is given in the context of the problems, including some motivation for disease diagnosis and drug discovery. Open problems are cited with an extensive bibliography, and we offer a guide to getting started in this exciting frontier.
Multiple sequence alignment using evolutionary programming
Proceedings of the First Congress of Evolutionary Computation (CEC-1999), 1999
Cited by 11 (1 self)
Multiple sequence alignment can be used as a tool for the identification of common structure in an ordered string of nucleotides (in DNA or RNA) or amino acids (in proteins). Current multiple sequence alignment algorithms work well for sequences with high similarity but do not scale well when either the length or number of the sequences is large or if the similarity is low. The focus of this paper is to develop an evolutionary programming (EP) algorithm for multiple sequence alignment. An EP method with representation-specific variation operators is proposed and tested on several data sets. Comparisons to other algorithms suggest that this algorithm is well-suited to the multiple sequence alignment problem.
M-Coffee: combining multiple sequence alignment methods with T-Coffee
Nucleic Acids Res, 2006
Cited by 10 (4 self)
We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and Balibase. We also show that on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment as any individual method. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from
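The idea of combining several MSAs through consistency can be illustrated with a small sketch. The Python below is not M-Coffee's actual scoring scheme; it is a simplified illustration that assumes "consistency" can be approximated by the fraction of input alignments that place two residues in the same column. The function names `aligned_pairs` and `pair_support`, the dictionary input format, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs that share a column in one MSA.

    `msa` maps a sequence name to its gapped row ('-' marks a gap).
    A residue is identified by (sequence name, index in the ungapped sequence).
    """
    names = sorted(msa)
    positions = {name: 0 for name in names}
    width = len(msa[names[0]])
    pairs = set()
    for col in range(width):
        residues = []
        for name in names:
            if msa[name][col] != "-":
                residues.append((name, positions[name]))
                positions[name] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def pair_support(msas):
    """Fraction of the input MSAs that align each residue pair."""
    counts = Counter()
    for msa in msas:
        counts.update(aligned_pairs(msa))
    return {pair: n / len(msas) for pair, n in counts.items()}


# Two hypothetical alignments of the same two toy sequences (ACG and ACTG).
msa_a = {"s1": "AC-G", "s2": "ACTG"}
msa_b = {"s1": "ACG-", "s2": "ACTG"}
support = pair_support([msa_a, msa_b])
```

Pairs that every input alignment agrees on receive a support of 1.0, while pairs on which the inputs disagree receive lower values. A consensus method could then favour the high-support pairs when building the final alignment.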
An eulerian path approach to global multiple alignment for DNA sequences
J. Comput. Biol, 2003
Cited by 9 (1 self)
With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing that transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to an Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm with almost linear computational speed with respect to the total size (number of letters) of sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and as low as 70% pairwise identity) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project is used to demonstrate the performance. Key words: multiple sequence alignment, de Bruijn graph, Eulerian path.
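The first step named in this abstract, turning a set of DNA sequences into a de Bruijn graph, can be sketched briefly. The Python below covers only the textbook graph construction (nodes are (k-1)-mers, with one edge per distinct k-mer). It does not reproduce the paper's alignment algorithm or its Eulerian path step, and the function name `de_bruijn_graph`, the value of `k`, and the toy sequences are assumptions chosen for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph from the k-mers of all input sequences.

    Each distinct k-mer contributes one directed edge from its (k-1)-mer
    prefix to its (k-1)-mer suffix. Returns {node: sorted successor list}.
    """
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


# Two hypothetical overlapping fragments that share every 3-mer.
graph = de_bruijn_graph(["ACGTAC", "CGTACG"], k=3)
```

Because the two toy fragments contain the same k-mers, they collapse onto the same four edges. This merging of shared substrings is what lets all the sequences be handled in a single graph.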
Multiple sequence alignment
Protein Structure Prediction — Methods and Protocols, 2000
Cited by 6 (1 self)
Multiple sequence alignment is a central problem in Bioinformatics and a challenging one for optimisation algorithms. An established integer programming approach is to apply branch-and-cut to a graph-theoretical model. The models are exponentially large but are represented intensionally, and violated constraints can be located in polynomial time. This report describes a new integer program formulation that generates polynomial-sized models small enough to be passed to generic solvers. It is a hybrid formulation relating the sparse alignment graph with a compact encoding of the alignment matrix via channelling constraints. Alignments obtained with a pseudo-Boolean local search algorithm are competitive with those of state-of-the-art algorithms. Execution times are much longer, but in future work we aim to develop a more efficient specialised algorithm.
A SAT-Based Approach to Multiple Sequence Alignment
Poster, Ninth International Conference on Principles and Practice of Constraint Programming, 2003
Cited by 5 (3 self)
Multiple sequence alignment is a central problem in Bioinformatics. A known integer programming approach is to apply branch-and-cut to exponentially large graph-theoretic models. This paper describes a new integer program formulation that generates models small enough to be passed to generic solvers. The formulation is a hybrid relating the sparse alignment graph with a compact encoding of the alignment matrix via channelling constraints. Alignments obtained with a SAT-based local search algorithm are competitive with those of state-of-the-art algorithms, though execution times are much longer.
A New Approach for Alignment of multiple proteins
Pacific Symposium on Biocomputing, 2006
Cited by 5 (1 self)
We introduce a new graph-based multiple sequence alignment method for protein sequences. We name our method HSA (Horizontal Sequence Alignment) because it slides a window horizontally across the protein sequences simultaneously. Current progressive alignment tools build up the final alignment by adding sequences one by one to an existing alignment, so their results depend on the order in which the sequences are added. In contrast, HSA considers all the proteins at once. It obtains the final alignment by concatenating cliques of a graph. In order to find a biologically relevant alignment, HSA takes secondary structure information as well as amino acid sequences into account. The experimental results show that HSA achieves higher accuracy than existing tools on BAliBASE benchmarks. The improvement is more significant for proteins with low similarity.

