Results 1 - 10 of 21
Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees
 In Proc. IEEE Computer Society Bioinformatics Conference (CSB 2004), 2004
Network (reticulate) evolution: biology, models, and algorithms
 In The Ninth Pacific Symposium on Biocomputing (PSB), 2004
P.M.B.: A New Quartet Tree Heuristic for Hierarchical Clustering. arXiv:cs/0606048, 2006
Cited by 13 (3 self)
We consider the problem of constructing an optimal-weight tree from the $3\binom{n}{4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as non-optimal topologies). We present a heuristic for reconstructing the optimal-weight tree, and a canonical manner to derive the quartet-topology weights from a given distance matrix. The method repeatedly transforms a bifurcating tree, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. This contrasts with other heuristic search methods from biological phylogeny, like DNAML or quartet puzzling, which repeatedly and incrementally construct a solution from a random order of objects, and subsequently add agreement values. We do not assume that there exists a true bifurcating supertree that embeds each quartet in the optimal topology, or represents the distance matrix faithfully, not even under the assumption that the weights or distances are corrupted by a measuring process. Our aim is to hierarchically cluster the input data as faithfully as possible, both phylogenetic data and data of completely different types. In our experiments with natural data, like genomic data, texts or music, the global optimum appears to be reached. Our method is capable of handling over 100 objects, possibly up to 1000 objects, while no existing quartet heuristic can computationally approximate the exact optimal solution of a quartet tree of more than about 20–30 objects without running for years. The method is implemented and available as public software.
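To make the counting in this abstract concrete, the following Python sketch enumerates the three pairings of every 4-subset of n objects, which gives 3·C(n,4) quartet topologies, and attaches a cost to each one from a distance matrix. The abstract does not say how the weights are derived; the rule used here (cost of ab|cd = d(a,b) + d(c,d)) is an assumption made for illustration, and the function name `quartet_topology_costs` and the toy matrix are hypothetical.

```python
from itertools import combinations
from math import comb


def quartet_topology_costs(dist):
    """Map each quartet topology ((a, b), (c, d)) to a cost.

    ASSUMPTION: the cost of topology ab|cd is d(a, b) + d(c, d).
    The abstract only says the weights are derived from a distance
    matrix; this particular rule is chosen for illustration.
    """
    n = len(dist)
    costs = {}
    for a, b, c, d in combinations(range(n), 4):
        # The three ways to split four objects into two pairs.
        pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
        for (p, q), (r, s) in pairings:
            costs[((p, q), (r, s))] = dist[p][q] + dist[r][s]
    return costs


# Toy symmetric distance matrix on 5 objects (made-up values).
dist = [
    [0, 1, 4, 4, 5],
    [1, 0, 4, 4, 5],
    [4, 4, 0, 2, 5],
    [4, 4, 2, 0, 5],
    [5, 5, 5, 5, 0],
]

costs = quartet_topology_costs(dist)
print(len(costs), 3 * comb(len(dist), 4))  # both counts are 15 for n = 5

# Cheapest topology for the quartet {0, 1, 2, 3} under the assumed rule.
on_0123 = [t for t in costs if set(t[0]) | set(t[1]) == {0, 1, 2, 3}]
best = min(on_0123, key=costs.get)
print(best, costs[best])  # ((0, 1), (2, 3)) 3
```

The number of topologies grows as n^4, which is one reason why scoring a candidate tree against all quartets becomes expensive for large n.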
Computational grand challenges in assembling the tree of life: Problems & solutions
 The IEEE and ACM Supercomputing Conference 2005 (SC2005) Tutorial, 2005
Cited by 9 (0 self)
The computation of ever larger and more accurate phylogenetic (evolutionary) trees, with the ultimate goal of computing the tree of life, represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the size of trees which can be computed in reasonable time based on elaborate evolutionary models is limited by the severe computational cost inherent to these methods. There exist two orthogonal research directions to overcome this challenging computational burden: first, the development of novel, faster, and more accurate heuristic algorithms, and second, the application of high performance computing techniques. The goal of this chapter is to provide a comprehensive introduction to the field of computational evolutionary biology to an audience with a computing background, interested in participating in research and/or commercial applications of this field. Moreover, we will cover leading-edge technical and algorithmic developments in the field and discuss open problems and potential solutions.
PRec-I-DCM3: a parallel framework for fast and accurate large scale phylogeny reconstruction
 International Journal on Bioinformatics Research and Applications (IJBRA), 2005
Cited by 7 (3 self)
Accurate reconstruction of phylogenetic trees very often involves solving hard optimization problems, particularly the maximum parsimony (MP) and maximum likelihood (ML) problems. Various heuristics have been devised for solving these two problems; however, they obtain good results within reasonable time only on small datasets. This has been a major impediment for large-scale phylogeny reconstruction, particularly for the effort to assemble the Tree of Life, the evolutionary relationship of all organisms on earth. Roshan et al. recently introduced Rec-I-DCM3, an efficient and accurate meta-method for solving the MP problem on large datasets of up to 14,000 taxa. Nonetheless, a drastic improvement in Rec-I-DCM3’s performance is still needed in order to achieve similar (or better) accuracy on datasets at the scale of the Tree of Life. In this paper, we improve the performance of Rec-I-DCM3 via parallelization. Experimental results demonstrate that our parallel
Trees versus characters and the supertree/supermatrix “paradox.”
 Syst. Biol.
Cited by 7 (0 self)
In a pair of recent articles, Gatesy and colleagues
Quartet methods for phylogeny reconstruction from gene orders
 Dept. CS and Engin., Univ. South Carolina, 2005
Cited by 7 (1 self)
Abstract. Phylogenetic reconstruction from gene-rearrangement data has attracted increasing attention from biologists and computer scientists. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach; however, its exhaustive approach means that it can be applied only to small datasets of fewer than 15 taxa. While we have successfully scaled it up to 1,000 genomes by integrating it with a disk-covering method (DCM-GRAPPA), the recursive decomposition may need many levels of recursion to handle datasets with 1,000 or more genomes. We thus investigated quartet-based approaches, which directly decompose the datasets into subsets of four taxa each; such approaches have been well studied for sequence data, but not for gene-rearrangement data. We give an optimization algorithm for the NP-hard problem of computing optimal trees for each quartet, present a variation of the dyadic method (using heuristics to choose suitable short quartets), and use both in simulation studies. We find that our quartet-based method can handle more genomes than the base version of GRAPPA, thus enabling us to reduce the number of levels of recursion in DCM-GRAPPA, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.
Clustering, 2009
Cited by 2 (0 self)
The problem is to construct an optimal-weight tree from the $3\binom{n}{4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as non-optimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal-weight tree, given the quartet-topology weights. The method repeatedly transforms a bifurcating tree, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The method has been extensively used for general hierarchical clustering of non-tree-like (non-phylogeny) data in various domains and across domains with heterogeneous data, and is implemented and available as part of the CompLearn package. We compare performance and running time with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package, on genomic data for which the latter are optimized.
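The randomized hill climbing named in this abstract can be illustrated with a much-simplified Python sketch. It is not the CompLearn implementation. Several things here are assumptions made for illustration: the tree shape is fixed to a caterpillar and only leaf swaps are tried (the abstract does not list the tree transformations), the cost of topology ab|cd is taken to be d(a,b) + d(c,d), and lower total cost is treated as better. All function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def caterpillar(n):
    """Adjacency of an unrooted bifurcating caterpillar tree (n >= 4).

    Leaf slots are nodes 0..n-1, internal nodes are n..2n-3.
    """
    adj = {v: [] for v in range(2 * n - 2)}

    def link(u, v):
        adj[u].append(v)
        adj[v].append(u)

    internal = list(range(n, 2 * n - 2))
    for u, v in zip(internal, internal[1:]):
        link(u, v)
    link(0, internal[0])
    link(n - 1, internal[-1])
    for i in range(1, n - 1):
        link(i, internal[i - 1])
    return adj


def slot_distances(adj, n):
    """Edge-count path lengths between all pairs of leaf slots (BFS)."""
    table = []
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        table.append([seen[t] for t in range(n)])
    return table


def tree_cost(place, tdist, data):
    """Sum of assumed costs d(a,b) + d(c,d) over all embedded quartet topologies.

    place[obj] is the leaf slot holding that object.
    """
    def path_sum(pairing):
        (p, q), (r, s) = pairing
        return tdist[place[p]][place[q]] + tdist[place[r]][place[s]]

    total = 0
    for a, b, c, d in combinations(range(len(place)), 4):
        pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
        # Four-point condition: in a bifurcating tree the embedded
        # topology is the pairing with the smallest sum of path lengths.
        (p, q), (r, s) = min(pairings, key=path_sum)
        total += data[p][q] + data[r][s]
    return total


def hill_climb(data, steps=500, seed=0):
    """Randomized hill climbing over leaf placements (leaf swaps only)."""
    rng = random.Random(seed)
    n = len(data)
    tdist = slot_distances(caterpillar(n), n)
    place = list(range(n))
    rng.shuffle(place)
    best = tree_cost(place, tdist, data)
    history = [best]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        place[i], place[j] = place[j], place[i]
        cost = tree_cost(place, tdist, data)
        if cost <= best:
            best = cost
            history.append(cost)  # accepted move, cost never increases
        else:
            place[i], place[j] = place[j], place[i]  # undo rejected swap
    return place, best, history


# Toy distance matrix (made-up values): three tight pairs of objects.
data = [
    [0, 1, 6, 6, 7, 7],
    [1, 0, 6, 6, 7, 7],
    [6, 6, 0, 1, 7, 7],
    [6, 6, 1, 0, 7, 7],
    [7, 7, 7, 7, 0, 1],
    [7, 7, 7, 7, 1, 0],
]
place, best, history = hill_climb(data)
print(place, best)
```

Because a move is kept only when the cost does not increase, the sequence of accepted costs is monotonic, which mirrors the "monotonic approximation" described in the abstract. A fuller version would also change the tree shape rather than only the leaf labels.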
Quartet-based phylogeny reconstruction from gene orders
Cited by 1 (0 self)
Abstract. Phylogenetic reconstruction based on gene rearrangements is attracting increasing attention from biologists and computer scientists. Methods used in reconstruction include distance-based methods, parsimony methods using sequence encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach; however, its exhaustive nature means that it can be applied only to small datasets (of fewer than 15 taxa). While we have successfully scaled it up to 1,000 taxa by integrating it with a disk-covering method, yielding DCM-GRAPPA, the recursive decomposition in the DCM may require many levels of recursion to handle datasets with 1,000 or more taxa. In order to handle larger datasets and reduce the need for recursive decomposition, we investigate quartet-based approaches, which directly decompose the datasets into subsets of four taxa each. Such approaches have been well studied for sequence data, but not for gene-order data. We give an optimization algorithm for the NP-hard problem of computing optimal trees for each quartet, present a variation of the dyadic method (using heuristics to choose suitable short quartets), and use both in simulation studies. We find that our quartet-based method can handle more taxa than the base version of GRAPPA, thus enabling us to reduce the number of levels of recursion in DCM-GRAPPA, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.
Edited by
CRC Press is an imprint of the Taylor & Francis Group, an informa business. Outside to inside of image: water ermine moth, UK (Spilosoma urticae); barley, UK (Hordeum distichon); fossilised sea urchins, Tunisia (Mecaster spp.); seeds, unknown origin (Bignoniaceae); purple sea snails, worldwide (Janthina janthina); fossilised shark teeth, USA (Isurus sp.); and sea urchin, Greece (Arbacia lixula)