## Reconstructing optimal phylogenetic trees: a challenge in experimental algorithmics (2002)

Venue: Experimental Algorithmics, volume 2547 of Lecture Notes in Computer Science

Citations: 7 (4 self)

### BibTeX

@INPROCEEDINGS{Moret02reconstructingoptimal,

author = {Bernard M. E. Moret and Tandy Warnow},

title = {Reconstructing optimal phylogenetic trees: a challenge in experimental algorithmics},

booktitle = {Experimental Algorithmics, volume 2547 of Lecture Notes in Computer Science},

year = {2002},

pages = {163--180},

publisher = {Springer Verlag}

}

### Citations

2293 | The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution
- Saitou, Nei
- 1987
Citation Context ...ly moderately large datasets can take years of real analysis (hundreds of CPU years), without resolution [29]. By comparison, distance-based methods, including the popular Neighbor-Joining (NJ) method [32], are often quite accurate (with respect to topological accuracy, as determined using simulation studies) and are very fast (polynomial-time and fast in practice). While the experimental evidence is n... |

271 | The travelling salesman problem: A case study in local optimization
- Johnson, McGeoch
- 2003
Citation Context ...s to construct such a median and is NP-hard [27]. Sankoff and Blanchette developed a reduction from MPB to the Travelling Salesman Problem (TSP), perhaps the most studied of all optimization problems [15]. Their reduction produces an undirected instance of the TSP from the directed instance of MPB by the standard technique of representing each gene by a pair of cities connected by an edge that must be... |

178 | Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees
- Rambaut, Grassly
- 1997
Citation Context ...owing we write # e to denote the average edge length in a collection of trees---which is just 500 times the scaling factor. We generated random DNA sequences for the root and used the program Seq-Gen [28] to evolve these sequences down the tree under the Jukes-Cantor model of evolution, thus producing sets of sequences at the leaves, our synthetic datasets. Because the number of distinct unrooted, lea... |

112 | The influence of caches on the performance of sorting - LaMarca, Ladner - 1997 |

99 | A few logs suffice to build (almost) all trees (I), Submitted, Random Structures and Algorithms
- Erdos, Steel, et al.
Citation Context ...gs. 10 if it does not agree with T . If Q(T ) denotes the set of all quartet trees that agree with T , then T is uniquely characterized by Q(T ) and can be reconstructed from Q(T ) in polynomial time [12]. Quartet-based methods operate in two phases. In the first phase they construct a set Q of quartet trees on the different sets of four taxa; in the second phase, they combine these quartet trees into... |

77 | Multiple genome rearrangement and breakpoint phylogeny
- Sankoff, Blanchette
- 1998
Citation Context ... that this approach works well for certain datasets (i.e., it obtains trees that are close to the model tree), but that the implementation developed by Sankoff and Blanchette, the BPAnalysis software [33], is too slow to be used on anything other than small datasets with a few genes [9, 10]. 4.1 Breakpoint Analysis: Details When each genome has the same set of genes and each gene appears exactly once,... |

70 | Inferring evolutionary trees with strong combinatorial evidence
- Berry, Gascuel
- 1997
Citation Context ...d (also known as the Buneman method) and the Quartet-Cleaning methods, can be described in terms of an explicit bound on the number of quartet errors around the edges they reconstruct. The $Q^*$ method [4] seeks the maximally resolved tree $T^*$ obeying $Q(T^*) \subseteq Q$; therefore, there are no quartet errors around any edge in the tree $T^*$. Quartet-Cleaning (QC) methods [3, 5, 14] have explicit bounds on th... |

68 | The Influence of Caches on the Performance of Heaps - LaMarca, Ladner - 1996 |

64 | The median problems for breakpoints are NP-complete
- Pe’er, Shamir
- 1998
Citation Context ...r median to be a fourth genome that minimizes the sum of the breakpoint distances between it and the other three. The Median Problem for Breakpoints (MPB) is to construct such a median and is NP-hard [27]. Sankoff and Blanchette developed a reduction from MPB to the Travelling Salesman Problem (TSP), perhaps the most studied of all optimization problems [15]. Their reduction produces an undirected ins... |

60 | Breakpoint Phylogenies
- Blanchette, Bourque, et al.
- 1997
Citation Context ...ting task: choosing how to vary the parameters while keeping the total computation down is a difficult tradeoff. 4 An Algorithm Engineering Example: Solving the Breakpoint Phylogeny Blanchette et al. [6] developed an approach, which they called breakpoint phylogeny, for reconstructing phylogenies from gene order data. Their approach is limited to the special case in which the genomes all have the sam... |

55 | A new implementation and detailed study of breakpoint analysis
- Moret, Wyman, et al.
- 2001
Citation Context ...High-Performance Implementation Our implementation, GRAPPA, incorporates all of the refinements mentioned above, plus others specifically made to enable the code to run efficiently in parallel (see [23, 24, 26] for details). Because the basic algorithm enumerates and independently scores every tree, it presents obvious parallelism: we can have each processor handle a subset of the trees. In order to do so e... |

51 | An empirical comparison of phylogenetic methods on chloroplast gene order data in Campanulaceae
- Cosner, Jansen, et al.
- 2000
Citation Context ... close to the model tree), but that the implementation developed by Sankoff and Blanchette, the BPAnalysis software [33], is too slow to be used on anything other than small datasets with a few genes [9, 10]. 4.1 Breakpoint Analysis: Details When each genome has the same set of genes and each gene appears exactly once, a genome can be described by an ordering (circular or linear) of these genes, each gen... |

46 | A practical algorithm for recovering the best supported edges of an evolutionary tree (extended abstract)
- Berry, Bryant, et al.
Citation Context ... they reconstruct. The $Q^*$ method [4] seeks the maximally resolved tree $T^*$ obeying $Q(T^*) \subseteq Q$; therefore, there are no quartet errors around any edge in the tree $T^*$. Quartet-Cleaning (QC) methods [3, 5, 14] have explicit bounds on the number of quartet errors around each reconstructed edge $e$. These error bounds have the form $m\sqrt{q_e}$, where $q_e$ is the number of quartet trees around edge $e$ and $m$ is a sma... |

43 | Rare genomic changes as a tool for phylogenetics. Trends Ecol
- Rokas, Holland
- 2000
Citation Context ...of genome rearrangement data is part of an increased interest in the development of new sources of phylogenetic information, especially those which can be characterized as "rare genomic changes" (see [31] for a survey of these approaches). Sequence data and genomic rearrangement data are highly complementary, with different rates of evolution especially in organelles (chloroplasts and mitochondria), ... |

42 | Quartet cleaning: improved algorithms and simulations
- Berry, Jiang, et al.
- 1999
Citation Context ... they reconstruct. The $Q^*$ method [4] seeks the maximally resolved tree $T^*$ obeying $Q(T^*) \subseteq Q$; therefore, there are no quartet errors around any edge in the tree $T^*$. Quartet-Cleaning (QC) methods [3, 5, 14] have explicit bounds on the number of quartet errors around each reconstructed edge $e$. These error bounds have the form $m\sqrt{q_e}$, where $q_e$ is the number of quartet trees around edge $e$ and $m$ is a sma... |

40 | Steps toward accurate reconstructions of phylogenies from gene-order data
- Moret, Tang, et al.
Citation Context ...High-Performance Implementation Our implementation, GRAPPA, incorporates all of the refinements mentioned above, plus others specifically made to enable the code to run efficiently in parallel (see [23, 24, 26] for details). Because the basic algorithm enumerates and independently scores every tree, it presents obvious parallelism: we can have each processor handle a subset of the trees. In order to do so e... |

38 | Analyzing algorithms by simulation: variance reduction techniques and simulation speedups
- McGeoch
- 1992
Citation Context ...er expanded by the choice of evolutionary rates, it is not possible to take a fair sample of the entire input space. In order to obtain statistically robust results, we followed the advice of McGeoch [21] and Moret [25] and used a number of runs, each composed of a number of trials (a trial is a single comparison), computed the mean outcome for each run, and studied the mean and standard deviation ove... |

33 | Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining
- John, Moret, et al.
Citation Context ...or instance, DNA sequences cannot be of arbitrary length, much less gene orders; thus the rate of convergence is crucial and needs to be evaluated experimentally as well as bounded theoretically (see [35] for such an evaluation and [37] for a theoretical approach). Data requirements therefore loom large---and indeed may prove more detrimental than computational requirements, since we can always run th... |

29 | A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic data
- Cosner, Jansen, et al.
- 2000
Citation Context ...logenies are usually (but not always) represented by unrooted leaf-labelled trees. Figure 1 shows two proposed phylogenies, one for several species of the Campanulaceae (bluebell flower) family (from [10]) and the other for herpesviruses that are known to affect humans (from [8]). Note that the Campanulaceae tree is rooted through the use of a distantly related species (here tobacco), called an outgro... |

26 | On the practical solution of the reversal median problem
- Caprara
- 2001
Citation Context ...m for this purpose. In turn, the availability of a fast implementation for inversion distance computations and inversion-based phylogenies has spurred renewed interest in the inversion median problem [7, 34] and other related problems. 5 An Experimental Algorithmics Example: Quartet-Based Methods for DNA Data 5.1 Quartet-based methods. A quartet tree is an unrooted binary tree on four taxa. A quartet tre... |

25 | Finding an optimal inversion median: Experimental results
- Siepel, Moret
- 2001
Citation Context ...m for this purpose. In turn, the availability of a fast implementation for inversion distance computations and inversion-based phylogenies has spurred renewed interest in the inversion median problem [7, 34] and other related problems. 5 An Experimental Algorithmics Example: Quartet-Based Methods for DNA Data 5.1 Quartet-based methods. A quartet tree is an unrooted binary tree on four taxa. A quartet tre... |

25 | Improving memory performance of sorting algorithms - Xiao, Zhang, et al. - 1981 |

22 | High-performance algorithm engineering for computational phylogenetics
- Moret, Bader, et al.
- 2001
Citation Context ...High-Performance Implementation Our implementation, GRAPPA, incorporates all of the refinements mentioned above, plus others specifically made to enable the code to run efficiently in parallel (see [23, 24, 26] for details). Because the basic algorithm enumerates and independently scores every tree, it presents obvious parallelism: we can have each processor handle a subset of the trees. In order to do so e... |

19 | Efficient sorting using registers and caches - Arge, Chase, et al. - 2000 |

18 | Epstein-Barr virus infection
- Cohen
- 2000
Citation Context ...trees. Figure 1 shows two proposed phylogenies, one for several species of the Campanulaceae (bluebell flower) family (from [10]) and the other for herpesviruses that are known to affect humans (from [8]). Note that the Campanulaceae tree is rooted through the use of a distantly related species (here tobacco), called an outgroup in this context (the root is taken to be the internal node to which the ... |

17 | Evidence from beta-tubulin phylogeny that microsporidia evolved from within the fungi
- Keeling, Luker, et al.
- 2000
Citation Context ...high running time (proportional to $n^7 m^{4m+2}$), so that it is impractical for $m$ larger than 5. The final quartet-based method we examined is the best known and the most frequently used by biologists [22, 30, 17]: the Quartet-Puzzling (QP) method [36]. This heuristic computes quartet trees using maximum likelihood (ML) and then uses a greedy strategy to construct a tree on which many input quartets are in agr... |

15 | Hybrid tree reconstruction methods
- Huson, Nettles, et al.
- 1999
Citation Context ...initive, the best distance-based methods appear less accurate than the better heuristics for maximum parsimony and maximum likelihood, at least on large trees with high rates of evolution (see, e.g., [13]). 3 Algorithmic and Experimental Challenges 3.1 Designing for Speed Because both parsimony- and likelihood-based approaches involve NP-hard optimization problems and because poor approximations may l... |

15 | Analyzing large data sets: rbcL 500 revisited. Syst. Biol
- Rice, Donoghue, et al.
- 1997
Citation Context ...rsions, which may not have any performance guarantees) is that they take too long. Even some only moderately large datasets can take years of real analysis (hundreds of CPU years), without resolution [29]. By comparison, distance-based methods, including the popular Neighbor-Joining (NJ) method [32], are often quite accurate (with respect to topological accuracy, as determined using simulation studies)... |

13 | A polynomial-time approximation scheme for inferring evolutionary trees from quartet topologies and its application
- Jiang, Kearney, et al.
Citation Context ... they reconstruct. The $Q^*$ method [4] seeks the maximally resolved tree $T^*$ obeying $Q(T^*) \subseteq Q$; therefore, there are no quartet errors around any edge in the tree $T^*$. Quartet-Cleaning (QC) methods [3, 5, 14] have explicit bounds on the number of quartet errors around each reconstructed edge $e$. These error bounds have the form $m\sqrt{q_e}$, where $q_e$ is the number of quartet trees around edge $e$ and $m$ is a sma... |

11 | The cache performance of traversals and random accesses - Ladner, Fix, et al. - 1999 |

10 | GRAPPA runs in record time
- Bader, Moret
Citation Context ...rough algorithm engineering, our run on the Campanulaceae dataset demonstrated a one hundred million-fold speed-up over the original implementation [26] (a first speedup of one million was reported in [2]). 4.3 A Partial Assessment Clearly, generating every single tree is a self-defeating approach: even our huge $10^8$-fold speedup allowed us to move from 10 taxa to just 16 taxa---and 20 or more taxa ... |

10 | Quartet puzzling: A maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution
- Strimmer, Haeseler
- 1996
Citation Context ..., so that it is impractical for m larger than 5. The final quartet-based method we examined is the best known and the most frequently used by biologists [22, 30, 17]: the Quartet-Puzzling (QP) method [36]. This heuristic computes quartet trees using maximum likelihood (ML) and then uses a greedy strategy to construct a tree on which many input quartets are in agreement. QP uses an arbitrary ordering o... |

9 | Algorithms and experiments: The new (and old) methodology
- Moret, Shapiro
Citation Context ...the choice of evolutionary rates, it is not possible to take a fair sample of the entire input space. In order to obtain statistically robust results, we followed the advice of McGeoch [21] and Moret [25] and used a number of runs, each composed of a number of trials (a trial is a single comparison), computed the mean outcome for each run, and studied the mean and standard deviation over the runs of t... |

7 | Matrix multiplication: a case study of enhanced data cache utilization - Eiron, Rodeh - 1999 |

6 | A phylogeny of the damselfly genus Calopteryx (Odonata) using mitochondrial 16S rDNA markers. Molecular Phylogenetics and Evolution
- Misof, Anderson, et al.
- 2000
Citation Context ...high running time (proportional to $n^7 m^{4m+2}$), so that it is impractical for $m$ larger than 5. The final quartet-based method we examined is the best known and the most frequently used by biologists [22, 30, 17]: the Quartet-Puzzling (QP) method [36]. This heuristic computes quartet trees using maximum likelihood (ML) and then uses a greedy strategy to construct a tree on which many input quartets are in agr... |

5 | Introduction to Computational Biology: Sequences, Maps and Genomes
- Waterman
- 1995
Citation Context ... but the other has a space), and deletions (the reverse). Computing a good multiple sequence alignment is itself a hard optimization problem, but outside our scope; we direct the interested reader to [38] for an introduction to this problem. Genome rearrangement data indicate how the genes are ordered within the given genomes. Many organellar genomes are composed of a single chromosome and are relativ... |

4 | Molecular evolution and phylogeny of the buzzatii complex (Drosophila repleta group): a maximum-likelihood approach
- Rodriguez-Trelles, Alarcon, et al.
- 2000
Citation Context ...high running time (proportional to $n^7 m^{4m+2}$), so that it is impractical for $m$ larger than 5. The final quartet-based method we examined is the best known and the most frequently used by biologists [22, 30, 17]: the Quartet-Puzzling (QP) method [36]. This heuristic computes quartet trees using maximum likelihood (ML) and then uses a greedy strategy to construct a tree on which many input quartets are in agr... |

2 | Absolute phylogeny: true trees from short sequences
- John
- 2001
Citation Context ...t be of arbitrary length, much less gene orders; thus the rate of convergence is crucial and needs to be evaluated experimentally as well as bounded theoretically (see [35] for such an evaluation and [37] for a theoretical approach). Data requirements therefore loom large---and indeed may prove more detrimental than computational requirements, since we can always run the program longer. 6 Because the ... |

1 | Efficient sorting using registers and caches - Arge, Chase, et al. - 2000 |