Results 1  10
of
150
Fast and Accurate Phylogeny Reconstruction Algorithms Based on the MinimumEvolution Principle
 JOURNAL OF COMPUTATIONAL BIOLOGY
, 2002
"... The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary leastsquares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) to ..."
Abstract

Cited by 55 (5 self)
 Add to MetaCart
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary leastsquares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n³) time, where n is the number of taxa, while the current implementations of the second are in O(p n³) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n²) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n² C p n), i.e., O(n²) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard “unweighted ” OLS, where
Species trees from gene trees: Reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions
 Systematic Biology
"... The estimation of species trees has become popular as a considerable amount of multilocus molecular data is available for inferring the evolutionary history of species. However, the current phylogenetic paradigm, that reconstructs gene trees to represent the species tree suggests that commonly used ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
The estimation of species trees has become popular as a considerable amount of multilocus molecular data is available for inferring the evolutionary history of species. However, the current phylogenetic paradigm, that reconstructs gene trees to represent the species tree suggests that commonly used methods such as the concatenation method, the consensus tree method, or the gene tree parsimony method may be either inconsistent or highly biased. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics, but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a stochastic model to estimate gene trees, species trees, ancestral population sizes and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus DNA sequences datasets. The estimates of the
Case Study: Visualizing Sets of Evolutionary Trees
, 2002
"... We describe a visualization tool which allows a biologist to explore a large set of hypothetical evolutionary trees. Interacting with such a dataset allows the biologist to identify distinct hypotheses about how different species or organisms evolved, which would not have been clear from traditional ..."
Abstract

Cited by 35 (4 self)
 Add to MetaCart
We describe a visualization tool which allows a biologist to explore a large set of hypothetical evolutionary trees. Interacting with such a dataset allows the biologist to identify distinct hypotheses about how different species or organisms evolved, which would not have been clear from traditional analyses. Our system integrates a pointset visualization of the distribution of hypothetical trees with detail views of an individual tree, or of a consensus tree summarizing a subset of trees. Efficient algorithms were required for the key tasks of computing distances between trees, finding consensus trees, and laying out the pointset visualization. 1
A framework for representing reticulate evolution
 Ann. Combin
, 2004
"... Abstract. Acyclic directed graphs (ADGs) are increasingly being viewed as more appropriate for representing certain evolutionary relationships, particularly in biology, than rooted trees. In this paper, we develop a framework for the analysis of these graphs which we call hybrid phylogenies. We are ..."
Abstract

Cited by 34 (6 self)
 Add to MetaCart
Abstract. Acyclic directed graphs (ADGs) are increasingly being viewed as more appropriate for representing certain evolutionary relationships, particularly in biology, than rooted trees. In this paper, we develop a framework for the analysis of these graphs which we call hybrid phylogenies. We are particularly interested in the problem whereby one is given a set of phylogenetic trees and wishes to determine a hybrid phylogeny that ‘embeds ’ each of these trees and which requires the smallest number of hybridisation events. We show that this quantity can be greatly reduced if additional species are involved, and investigate other combinatorial aspects of this and related questions. 1.
Approximating the true evolutionary distance between two genomes
 in Proc. 7th SIAM Workshop on Algorithm Engineering & Experiments (ALENEX’05), 121 (SIAM
, 2005
"... As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical mod ..."
Abstract

Cited by 31 (7 self)
 Add to MetaCart
As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. ElMabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we generalize our approach to compute distances between two arbitrary genomes, but focus on approximating the true evolutionary distance rather than the edit distance. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable the simple
Scaling up accurate phylogenetic reconstruction from geneorder data
, 2002
"... Motivation: Phylogenetic reconstruction from geneorder data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distancebased methods (such as neighborjoining), parsimony methods using sequencebased encod ..."
Abstract

Cited by 29 (13 self)
 Add to MetaCart
Motivation: Phylogenetic reconstruction from geneorder data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distancebased methods (such as neighborjoining), parsimony methods using sequencebased encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a twostep approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated diskcovering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCMGRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on geneorder data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu Contact:
Multiple Sequence Alignment Accuracy and Phylogenetic Inference
"... Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogeneti ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiplesequence alignment can be an important factor in downstream effects on topological reconstruction. [Bayesian; maximum likelihood; maximum parsimony; multiple sequence alignment; neighbor
Comparison of TreeChild Phylogenetic Networks
, 708
"... Abstract. Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogene ..."
Abstract

Cited by 24 (5 self)
 Add to MetaCart
Abstract. Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endorse a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a wellfounded distance measure have, to the best of our knowledge, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called treechild phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the wellknown RobinsonFoulds distance for phylogenetic trees, and to give an alignment method for pairs of networks in this class. Simple, polynomial algorithms for reconstructing a treechild phylogenetic network from its path multiplicity vectors, for computing the distance between two treechild phylogenetic networks, and for aligning a pair of treechild phylogenetic networks, are provided. They have been implemented as a Perl package and a Java applet, and they are available at the Supplementary Material web page. 1
Computing the quartet distance between evolutionary trees
 In Proceedings of the 11th Annual ACMSIAM Symposium on Discrete Algorithms (SODA
, 2000
"... The comparison of evolutionary trees is a fundamental problem in evolutionary biology. Di erent evolutionary hypotheses (or con icting phylogenies) arise ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
The comparison of evolutionary trees is a fundamental problem in evolutionary biology. Di erent evolutionary hypotheses (or con icting phylogenies) arise
Treetotree Correction for Document Trees
, 1995
"... Documents can be represented as ordered labelled trees. Finding the editing distance between documents is a particular case of the general problem for trees. We give a detailed survey of previous results, presenting them in a single notation to elucidate their commonalities. We then discuss two ways ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
Documents can be represented as ordered labelled trees. Finding the editing distance between documents is a particular case of the general problem for trees. We give a detailed survey of previous results, presenting them in a single notation to elucidate their commonalities. We then discuss two ways of extending these resultsfirst, by changing the set of primitive editing operations used by existing algorithms and, second, by postprocessing the output of the algorithms to recognize patterns of change significant to documents. Finally, we provide extensions of the first type. Our algorithm allows subtree operations but is otherwise similar to that of Zhang and Shasha. This is a corrected and expanded version of Technical Report 91315. y This report was completed during a sabbatical at INRIA (Institute National de Recherche en Informatique et en Automatique) in Rocquencourt, France. Contents 1 Introduction 3 2 Background 5 2.1 StringtoString Correction: Wagner and Fischer ...