## Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees (2004)

### Other Repositories/Bibliography

Venue: | In Proc. IEEE Computer Society Bioinformatics Conference (CSB 2004) |

Citations: | 35 - 9 self |

### BibTeX

@INPROCEEDINGS{Roshan04rec-i-dcm3:a,

author = {Usman Roshan and Bernard M. E. Moret and Tandy Warnow and Tiffani L. Williams},

title = {Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees},

booktitle = {In Proc. IEEE Computer Society Bioinformatics Conference (CSB 2004)},

year = {2004},

pages = {98--109},

publisher = {IEEE Press}

}

### Abstract

### Citations

3225 |
The neighbour-joining method: a new method for reconstructing phylogenetic trees
- Saitou, Nei
- 1987
Citation Context ...mputing power will enable the analysis of larger datasets, as the accuracy of the heuristics steadily decreases with increasing size of datasets. Polynomial-time algorithms do exist (Neighbor-Joining [19] and UPGMA [12] are the best known examples), but many experimental studies have shown that such trees are not as accurate as those produced by MP or ML analyses. One of the major challenges confronti... |

2693 | PAUP: Phylogenetic analysis using parsimony
- Swofford
- 1990
Citation Context ...he “parsimony ratchet” [11] was more effective than Tree-Bisection and Reconnection (TBR) hill-climbing [9], and that TNT’s [4] implementation of the parsimony ratchet was more efficient than PAUP*’s [16] implementation. Thus, TNT’s ratchet is probably among the best of the existing software tools for solving MP. In an independent development, new algorithmic strategies were also developed which were ... |

1185 |
Algorithmic graph theory and perfect graphs
- Golumbic
- 2004
Citation Context ...e greater than three). Since the short subtree graph is triangulated, we can find (in polynomial time, as proven in [5]) a maximal clique separator that minimizes […], where […] is the union of components […]. This would allow us to define a decomposition of the dataset into subsets […], for […], which would minimize the maximum subs... |

754 | Phylogenetic analysis using parsimony (*and other methods). Version 4 - PAUP |

269 |
Toward defining the course of evolution: minimum change for a specific tree topology
- Fitch
- 1971
Citation Context ...ng the tree with the smallest number of point mutations for the data. While MP is NP-hard [3], constructing the optimal labeling of the internal nodes of a fixed tree can be done in polynomial time [2]. Iterative Improvement Methods: Iterative improvement methods are some of the most popular methods in phylogeny reconstruction. Some fast technique is used to find an initial tree; that tree is then ... |

135 |
The Steiner problem in phylogeny is NP-complete
- Foulds, Graham
Citation Context ...uction Maximum parsimony (MP) and maximum likelihood (ML) are two of the major optimization problems in phylogeny (i.e., evolutionary tree) reconstruction. Both are quite hard to solve (MP is NP-hard [3], and ML harder in practice), and datasets above 30 taxa cannot be solved exactly with any reliability (branch and bound, or exhaustive search, techniques are both limited to smaller datasets). ML heu... |

104 |
The Parsimony Ratchet, a new method for rapid parsimony analysis
- Nixon
- 1999
Citation Context ...hods have been studied is the time needed to get to the optimal score (or, more accurately, the best known score) on various real datasets, preferably of at least one hundred sequences. These studies [4, 11, 15, 12, 13] have suggested that the “parsimony ratchet” [11] was more effective than Tree-Bisection and Reconnection (TBR) hill-climbing [9], and that TNT’s [4] implementation of the parsimony ratchet was more e... |

86 |
A characterization of rigid circuit graphs
- Buneman
- 1974
Citation Context ...ge present, weighted according to $w$; denote this clique by $K_e$. The short subtree graph is then the union, over all internal edges $e$ of the guide tree, of the $K_e$. We now use the following lemma from [2] and [6] to prove that the short subtree graph is triangulated. Lemma 1. Every triangulated graph is the intersection graph of subtrees of a tree and vice-versa. Theorem 1. The short subtree graph […] o... |

84 | Disk-covering, a fast-converging method for phylogenetic tree reconstruction
- Huson, Nettles, et al.
- 1999
Citation Context ... the new data. When that new search reaches a local optima, then the dataset is changed back to the original dataset, and hill-climbing is resumed. Disk-Covering Methods: Disk-Covering Methods (DCMs) [6, 7, 10, 13, 17] are divide-and-conquer methods that are designed to “boost” the performance of phylogenetic reconstruction methods. The first DCM [6], also called DCM1, was designed for use with distance-based method... |

81 |
The discovery and importance of multiple islands of most-parsimonious trees
- Maddison
- 1991
Citation Context ...rably of at least one hundred sequences. These studies [4, 11, 15, 12, 13] have suggested that the “parsimony ratchet” [11] was more effective than Tree-Bisection and Reconnection (TBR) hill-climbing [9], and that TNT’s [4] implementation of the parsimony ratchet was more efficient than PAUP*’s [16] implementation. Thus, TNT’s ratchet is probably among the best of the existing software tools for solv... |

61 |
Analyzing large data sets in reasonable times: solutions for composite optima
- Goloboff
- 1999
Citation Context ...hods have been studied is the time needed to get to the optimal score (or, more accurately, the best known score) on various real datasets, preferably of at least one hundred sequences. These studies [4, 11, 15, 12, 13] have suggested that the “parsimony ratchet” [11] was more effective than Tree-Bisection and Reconnection (TBR) hill-climbing [9], and that TNT’s [4] implementation of the parsimony ratchet was more e... |

48 |
Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB
- Soltis, Soltis, et al.
- 2000
Citation Context ...hods have been studied is the time needed to get to the optimal score (or, more accurately, the best known score) on various real datasets, preferably of at least one hundred sequences. These studies [4, 11, 15, 12, 13] have suggested that the “parsimony ratchet” [11] was more effective than Tree-Bisection and Reconnection (TBR) hill-climbing [9], and that TNT’s [4] implementation of the parsimony ratchet was more e... |

48 |
The RDP (Ribosomal Database Project) continues. Nucleic Acids Res
- Maidak, Cole, et al.
- 2000
Citation Context ...ar and Molecular Biology, The University of Texas at Austin. 3. A set of 2,594 rbcL DNA sequences (1,428 sites) [9]. 4. A set of 4,583 aligned 16s ribosomal Actinobacteria RNA sequences (1,263 sites) [11]. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004) 0-7695-2194-0/04 $20.00 © 2004 IEEE 5. A set of 6,590 aligned small subunit ribosomal Eukaryotes RNA sequences... |

38 | Towards a discipline of experimental algorithmics
- Moret
- 2001
Citation Context ...s) that it outperforms existing techniques, we now need to evaluate its performance in practice. The experimental evaluation of algorithms for phylogenetic reconstruction is a difficult endeavor (see [13, 14] for details). Because credible simulations of evolution remain lacking at the scale of 10,000 or more taxa, we chose to use biological datasets in our study. This choice ensures biological relevance ... |

35 |
The European database on small subunit ribosomal RNA. Nucleic Acids Res
- Wuyts, Peer, et al.
- 2002
Citation Context ...ances were computed to ensure that the results are statistically significant. Datasets: We gathered ten large datasets of the following sizes and types: (1) 1322 lsu rRNA of all organisms (1078 sites) [18], (2) 2000 Eukaryotes rRNA (1326 sites) from the Gutell Lab at The University of Texas (UT) at Austin, (3) 2594 rbcL DNA (1428 sites) [8], (4) 4583 16s rRNA of all Actinobacteria (1263 sites) [1], (... |

31 | Scaling up accurate phylogenetic reconstruction from gene-order data
- Tang, Moret
- 2003
Citation Context ...hod (DCM) to find highly accurate trees quickly. Rec-I-DCM3 uses iteration for escaping local optima, the divide-and-conquer approach of the DCMs to reduce problem size, and recursion (as pioneered in [22]) to enable further localization and reduction in problem size. A Rec-I-DCM3 search not only dramatically reduces the size of the explored tree space, but also finds a larger fraction of MP trees with... |

25 | Solving large scale phylogenetic problems using DCM2
- Huson, Vawter, et al.
- 1999
Citation Context ...er if run on a small number of datasets, each a fraction of the size of the full dataset. An early such divide-and-conquer method called DCM2 was developed for MP analysis, and presented at ISMB 1999 [7]. This divide-and-conquer method is used with an existing “base method” as follows. First a decomposition of the set of taxa into overlapping subsets is obtained, and trees are constructed on the subs... |

24 |
Designing fast converging phylogenetic methods
- John, Warnow
- 2001
Citation Context ... the new data. When that new search reaches a local optima, then the dataset is changed back to the original dataset, and hill-climbing is resumed. Disk-Covering Methods: Disk-Covering Methods (DCMs) [6, 7, 10, 13, 17] are divide-and-conquer methods that are designed to “boost” the performance of phylogenetic reconstruction methods. The first DCM [6], also called DCM1, was designed for use with distance-based method... |

20 | Performance of supertree methods on various data set decompositions
- Roshan, Moret, et al.
- 2004 |

18 |
A quantitative approach to a problem in classification
- Michener, Sokal
- 1957
Citation Context ...ill enable the analysis of larger datasets, as the accuracy of the heuristics steadily decreases with increasing size of datasets. Polynomial-time algorithms do exist (Neighbor-Joining [19] and UPGMA [12] are the best known examples), but many experimental studies have shown that such trees are not as accurate as those produced by MP or ML analyses. One of the major challenges confronting both MP and ... |

16 |
The rate of growth of phylogenetic information, and the need for a phylogenetic database
- Sanderson, Baldwin, et al.
- 1993
Citation Context ...tematists perform their phylogeny reconstructions (Sanderson surveyed 882 phylogenetic analyses published in 76 journals, and observed that 60% of the phylogenies were constructed using MP heuristics [14]). Because of the importance of MP analyses in phylogeny reconstruction, systematists and algorithms researchers in phylogeny have studied the existing methods (specifically, implementations of heuris... |

14 |
Absolute convergence: true trees from short sequences
- John
Citation Context ... the new data. When that new search reaches a local optima, then the dataset is changed back to the original dataset, and hill-climbing is resumed. Disk-Covering Methods: Disk-Covering Methods (DCMs) [6, 7, 10, 13, 17] are divide-and-conquer methods that are designed to “boost” the performance of phylogenetic reconstruction methods. The first DCM [6], also called DCM1, was designed for use with distance-based method... |

11 |
Changing the landscape: a new strategy for estimating large phylogenies. Syst. Biol
- Quicke, Taylor, et al.
- 2001 |

11 |
The relationship between maximum parsimony scores and phylogenetic tree topologies
- Williams, Berger-Wolf, et al.
- 2004
Citation Context ...alyses. Whereas 90–95% accuracy is often considered excellent in heuristics for hard optimization problems, heuristics used in phylogenetic reconstruction must be much more accurate: in another study [24], we found that solutions to MP that had an error rate larger than […] (i.e., whose lengths exceeded the optimal length by more than […]) produced topologically poor estimates of the true tree. Thus, heur... |

9 | Toward defining the course of evolution: Minimum change for a specific tree topology. Syst. Zool - FITCH - 1971 |

7 | Reconstructing optimal phylogenetic trees: A challenge in experimental algorithmics
- Moret, Warnow
- 2002
Citation Context ...s) that it outperforms existing techniques, we now need to evaluate its performance in practice. The experimental evaluation of algorithms for phylogenetic reconstruction is a difficult endeavor (see [13, 14] for details). Because credible simulations of evolution remain lacking at the scale of 10,000 or more taxa, we chose to use biological datasets in our study. This choice ensures biological relevance ... |

4 |
Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants, and flowering
- Källersjö, Farris, et al.
- 1998
Citation Context ... and types: (1) 1322 lsu rRNA of all organisms (1078 sites) [18], (2) 2000 Eukaryotes rRNA (1326 sites) from the Gutell Lab at The University of Texas (UT) at Austin, (3) 2594 rbcL DNA (1428 sites) [8], (4) 4583 16s rRNA of all Actinobacteria (1263 sites) [1], (5) 6590 ssu rRNA of all Eukaryotes (1661 sites) [18], (6) 7180 three-domain rRNA (1122 sites) from the Gutell Lab at UT Austin, (7) 7233 16... |

3 |
Ratchet implementation in PAUP*4.0b10
- Bininda-Emonds
- 2003
Citation Context ...starting tree $T$. In our experiments, we have used TNT (with default settings) as our base method, since it is the hardest to improve (in comparison, the PAUP* implementation of the parsimony ratchet [1] is easier to improve). Our algorithm produces smaller subproblems by recursively applying the centroid-edge decomposition until each subproblem is of size at most […]; in our experiments we used subpro... |

2 | Ratchet implementation in PAUP*4.0b10, 2003. Available from www.tierzucht.tum.de:8080/WWW/Homepages/Bininda-Emonds - Bininda-Emonds |

1 |
The RDP (ribosomal database project) continues
- Maidak, et al.
Citation Context ...) [18], (2) 2000 Eukaryotes rRNA (1326 sites) from the Gutell Lab at The University of Texas (UT) at Austin, (3) 2594 rbcL DNA (1428 sites) [8], (4) 4583 16s rRNA of all Actinobacteria (1263 sites) [1], (5) 6590 ssu rRNA of all Eukaryotes (1661 sites) [18], (6) 7180 three-domain rRNA (1122 sites) from the Gutell Lab at UT Austin, (7) 7233 16s rRNA of all Firmicutes bacteria (1352 sites) [1], (8) 85... |

1 | A quantitative approach to a problem in classification - Michener, Sokal - 1957 |

1 | on Intelligent Systems for Molecular Biology ISMB'03, volume 19 (Suppl. 1) of Bioinformatics - John - 2003 |

1 | Discrete Algorithms (SODA'01) - Symp - 2001 |