Results 1-10 of 51
A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study
Journal of Computational Biology, 2001
Cited by 116 (14 self)
Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation into the other. Their algorithm (restricted to distance calculation) proceeds in two stages: in the first stage, the overlap graph induced by the permutation is decomposed into connected components; then, in the second stage, certain graph structures (hurdles and others) are identified. Berman and Hannenhalli avoided the explicit computation of the overlap graph and gave an O(n α(n)) algorithm, based on a Union-Find structure, to find its connected components, where α is the inverse Ackermann function. Since for all practical purposes α(n) is a constant no larger than four, this algorithm has been the fastest practical algorithm to date. In this paper, we present a new linear-time algorithm for computing the connected components, which is more efficient than that of Berman and Hannenhalli in both theory and practice. Our algorithm uses only a stack and is very easy to implement. We give the results of computational experiments over a large range of permutation pairs produced through simulated evolution; our experiments show a speedup by a factor of 2 to 5 in the computation of the connected components and by a factor of 1.3 to 2 in the overall distance computation.
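To make the quantity in this abstract concrete, the following is a minimal sketch of what "inversion distance between two signed permutations" means. It is not the linear-time algorithm of the paper, nor the Hannenhalli-Pevzner or Berman-Hannenhalli methods: it is an exponential-time breadth-first search, usable only for very small permutations. The function names `apply_inversion` and `inversion_distance_bruteforce` are hypothetical, chosen for illustration.

```python
from collections import deque


def apply_inversion(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each gene in it."""
    segment = tuple(-g for g in reversed(perm[i:j + 1]))
    return perm[:i] + segment + perm[j + 1:]


def inversion_distance_bruteforce(source, target):
    """Minimum number of inversions turning source into target.

    Illustrative assumption: exhaustive breadth-first search over all
    signed permutations, so the cost grows as 2^n * n! and the function
    is only practical for a handful of genes.
    """
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("both permutations must contain the same genes")
    n = len(source)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        # Try every contiguous segment as the next inversion.
        for i in range(n):
            for j in range(i, n):
                nxt = apply_inversion(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise RuntimeError("target not reachable")  # cannot happen for valid input


if __name__ == "__main__":
    print(inversion_distance_bruteforce((2, 1), (1, 2)))
    print(inversion_distance_bruteforce((1, -2, 3), (1, 2, 3)))
```

A brute-force search like this can serve as a reference oracle when testing a fast implementation on small inputs, which is the only role it is meant to play here.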
Steps Toward Accurate Reconstructions of Phylogenies from Gene-Order Data
J. Comput. Syst. Sci., 2002
Scaling up accurate phylogenetic reconstruction from gene-order data
2002
Cited by 31 (13 self)
Motivation: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu Contact:
Industrial Applications of High-Performance Computing for Phylogeny Reconstruction
2001
Cited by 31 (4 self)
Phylogenies (that is, tree-of-life relationships) derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Real-world interest is strong in determining these relationships. For example, pharmaceutical companies may use phylogeny reconstruction in drug discovery for finding plants with similar gene production. Health organizations study the evolution and spread of viruses such as HIV to gain understanding of future outbreaks. And governments are interested in aiding the production of foodstuffs like rice, wheat, and corn, by understanding the genetic code. Yet very few techniques are available for such phylogenetic reconstructions. Appropriate tools for analyzing such data may help resolve some difficult phylogenetic reconstruction problems; indeed, this new source of data has been embraced by many biologists in their phylogenetic work. With the rapid accumulation of whole genome sequences for a wide diversity of taxa, phylogenetic reconstruction based on changes in gene order and gene content is showing promise, particularly for resolving deep (i.e., old) branches. However, reconstruction from gene-order data is even more computationally intensive than reconstruction from sequence data, particularly in groups with large numbers of genes and highly rearranged genomes. We have developed a software suite, GRAPPA, that extends the breakpoint analysis (BPAnalysis) method of Sankoff and Blanchette while running much faster: in a recent analysis of a collection of chloroplast data for species of Campanulaceae on a 512-processor Linux supercluster with Myrinet, we achieved a one-million-fold speedup over BPAnalysis. GRAPPA currently can use either breakpoint or inversion distance (computed exactly) for its computati...
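As an illustration of the breakpoint distance mentioned in this abstract, here is a minimal sketch under simplifying assumptions that are not taken from GRAPPA itself: genomes are linear signed permutations of genes labelled 1..n, and each is framed with 0 on the left and n+1 on the right. An adjacency (a, b) is treated as the same as (-b, -a), since reading a chromosome in the opposite direction does not change it. The function names are hypothetical.

```python
def adjacencies(perm):
    """All adjacencies of a framed signed permutation, in both reading directions."""
    n = len(perm)
    framed = (0,) + tuple(perm) + (n + 1,)
    pairs = set()
    for a, b in zip(framed, framed[1:]):
        pairs.add((a, b))
        pairs.add((-b, -a))  # same adjacency read from the other strand
    return pairs


def breakpoint_distance(source, target):
    """Count adjacencies of source that do not occur in target.

    Assumption for this sketch: both inputs are signed permutations of
    the same genes 1..n on a single linear chromosome.
    """
    n = len(source)
    target_adj = adjacencies(target)
    framed = (0,) + tuple(source) + (n + 1,)
    return sum(1 for pair in zip(framed, framed[1:]) if pair not in target_adj)


if __name__ == "__main__":
    print(breakpoint_distance((1, -2, 3), (1, 2, 3)))
```

The count takes time linear in the number of genes, which is one reason breakpoint distance is a cheap alternative to inversion distance in this kind of analysis.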
Approximating the true evolutionary distance between two genomes
in Proc. 7th SIAM Workshop on Algorithm Engineering & Experiments (ALENEX’05), 121 (SIAM, 2005)
Cited by 31 (7 self)
As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we generalize our approach to compute distances between two arbitrary genomes, but focus on approximating the true evolutionary distance rather than the edit distance. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable the simple
Inversion medians outperform breakpoint medians in phylogeny reconstruction from gene-order data
2002
Phylogenetic relationships of agaric fungi based on large ribosomal subunit DNA sequences
1999
Cited by 28 (4 self)
Abstract. Phylogenetic relationships of mushrooms and their relatives within the order Agaricales were addressed by using nuclear large subunit ribosomal DNA sequences. Approximately 900 bases of the 5′ end of the nucleus-encoded large subunit RNA gene were sequenced for 154 selected taxa representing most families within the Agaricales. Several phylogenetic methods were used, including weighted and equally weighted parsimony (MP), maximum likelihood (ML), and distance methods (NJ). The starting tree for branch swapping in the ML analyses was the tree with the highest ML score among previously produced MP and NJ trees. A high degree of consensus was observed between phylogenetic estimates obtained through MP and ML. NJ trees differed according to the distance model that was used; however, all NJ trees still supported most of the same terminal groupings as the MP and ML trees did. NJ trees were always significantly suboptimal when evaluated against the best MP and ML trees, by both parsimony and likelihood tests. Our analyses suggest that weighted MP and ML provide the best estimates of Agaricales phylogeny. Similar support was observed between bootstrapping and jackknifing methods for evaluation of tree robustness. Phylogenetic analyses revealed many groups of agaricoid fungi that are supported by moderate to high bootstrap or jackknife values or are consistent with morphology-based classification schemes. Analyses also support separate placement of the boletes and russules, which are basal to the main core group of gilled mushrooms (the Agaricineae of Singer). Examples of monophyletic groups include the families Amanitaceae, Coprinaceae (excluding Coprinus comatus and subfamily Panaeolideae), Agaricaceae (excluding the Cystodermateae), and Strophariaceae pro parte (Stropharia, Pholiota, and Hypholoma); the mycorrhizal species of Tricholoma (including Leucopaxillus, also mycorrhizal);
Genomic Distances under Deletions and Insertions
Theoretical Computer Science, 2003
Cited by 28 (6 self)
As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications). In this paper we extend El-Mabrouk's work to handle duplications as well as insertions and present an alternate framework for computing (near) minimal edit sequences involving insertions, deletions, and inversions. We derive an error bound for our polynomial-time distance computation under various assumptions and present preliminary experimental results that suggest that performance in practice may be excellent, within a few percent of the actual distance.
High-Performance Algorithm Engineering for Computational Phylogenetics
J. Supercomputing, 2002
Cited by 24 (7 self)
A phylogeny is the evolutionary history of a group of organisms; systematists (and other biologists) attempt to reconstruct this history from various forms of data about contemporary organisms. Phylogeny reconstruction is a crucial step in the understanding of evolution as well as an important tool in biological, pharmaceutical, and medical research. Phylogeny reconstruction from molecular data is very difficult: almost all optimization models give rise to NP-hard (and thus computationally intractable) problems. Yet approximations must be of very high quality in order to avoid outright biological nonsense. Thus many biologists have been willing to run farms of processors for many months in order to analyze just one dataset. High-performance algorithm engineering offers a battery of tools that can reduce, sometimes spectacularly, the running time of existing phylogenetic algorithms, as well as help designers produce better algorithms. We present an overview of algorithm engineering techniques, illustrating them with an application to the "breakpoint analysis" method of Sankoff et al., which resulted in the GRAPPA software suite. GRAPPA demonstrated a speedup in running time by over eight orders of magnitude over the original implementation on a variety of real and simulated datasets. We show how these algorithmic engineering techniques are directly applicable to a large variety of challenging combinatorial problems in computational biology.