A few logs suffice to build (almost) all trees (I)
 II. THEORETICAL COMPUTER SCIENCE
, 1999
Cited by 101 (24 self)
A phylogenetic tree (also called an "evolutionary tree") is a leaf-labelled tree which represents the evolutionary history for a set of species, and the construction of such trees is a fundamental problem in biology. Here we address the issue of how many sequence sites are required in order to recover the tree with high probability when the sites evolve under standard Markov-style i.i.d. mutation models. We provide analytic upper and lower bounds for the required sequence length, by developing a new (and polynomial-time) algorithm. In particular we show that when the mutation probabilities are bounded, the required sequence length can grow surprisingly slowly (a power of log n) in the number n of sequences, for almost all trees.
Disk-covering, a fast-converging method for phylogenetic tree reconstruction
 JOURNAL OF COMPUTATIONAL BIOLOGY
, 1999
Cited by 79 (6 self)
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern-day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic-length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes–Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic-length sequences suffice for complete accuracy with high probability, while polynomial-length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
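As a rough illustration of the input that distance methods under the Jukes–Cantor model typically work from, the sketch below computes the standard Jukes–Cantor corrected distance, d = -(3/4) ln(1 - 4p/3), from the fraction p of sites at which two aligned sequences differ. The function name and the toy sequences are assumptions made for illustration; this is not the Disk-Covering Method itself.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned DNA sequences.

    Hypothetical helper for illustration only: assumes equal-length,
    gap-free sequences over the alphabet {A, C, G, T}.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    # p is the observed fraction of sites at which the sequences differ.
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    # The correction is undefined once p reaches 3/4 (saturation).
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # One difference in four sites, so p = 0.25.
    print(jukes_cantor_distance("ACGT", "ACGA"))
```

Short sequences give noisy values of p, which is one informal way to see why the required sequence length matters for accurate reconstruction.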
Inferring Evolutionary Trees with Strong Combinatorial Evidence
 THEORETICAL COMPUTER SCIENCE
, 1997
Cited by 71 (11 self)
We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Let Q be a set of resolved quartets defined on the studied species; the method computes the unique maximum subset Q* of Q which is equivalent to a tree and outputs the corresponding tree as an estimate of the species' phylogeny. We use a characterization of the subset Q* due to [6] to provide an O(n^4) incremental algorithm for this variant of the NP-hard quartet consistency problem. Moreover, when choosing the resolution of the quartets by the Four-Point Method (FPM) and considering the Cavender-Farris model of evolution, we show that the convergence rate of the Q* method is at worst polynomial when the maximum evolutive distance between two species is bounded. We complete these theoretical results by an experimental study on real and simulated data sets. The results ...
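The quartet-resolution step mentioned above can be sketched as follows, assuming the usual formulation of the four-point method: among the three possible pairings of four taxa, pick the one whose two within-pair distances have the smallest sum. The function name and the `dist` callback are hypothetical, and this sketch covers only quartet resolution, not the incremental tree-building algorithm.

```python
from typing import Callable, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]


def four_point_split(
    a: Hashable,
    b: Hashable,
    c: Hashable,
    d: Hashable,
    dist: Callable[[Hashable, Hashable], float],
) -> Tuple[Pair, Pair]:
    """Resolve a quartet by the four-point condition.

    Hypothetical helper: returns the pairing with the minimum sum of
    within-pair distances. Ties go to the first pairing listed.
    """
    sums = {
        ((a, b), (c, d)): dist(a, b) + dist(c, d),
        ((a, c), (b, d)): dist(a, c) + dist(b, d),
        ((a, d), (b, c)): dist(a, d) + dist(b, c),
    }
    return min(sums, key=sums.get)


if __name__ == "__main__":
    # Toy additive distances for the tree ((a,b),(c,d)) with unit edges.
    D = {
        frozenset(("a", "b")): 2.0, frozenset(("c", "d")): 2.0,
        frozenset(("a", "c")): 3.0, frozenset(("a", "d")): 3.0,
        frozenset(("b", "c")): 3.0, frozenset(("b", "d")): 3.0,
    }
    print(four_point_split("a", "b", "c", "d", lambda x, y: D[frozenset((x, y))]))
```

With exact additive distances the minimum is unique whenever the internal edge has positive length; with estimated distances the choice can be wrong, which is where convergence-rate analysis comes in.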
Efficient Algorithms for Inverting Evolution
 Proceedings of the ACM Symposium on the Foundations of Computer Science
, 1999
Cited by 47 (3 self)
Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection. A stochastic model...
Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model
 SIAM Journal on Computing
, 1998
Cited by 32 (2 self)
The j-State General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j. In particular, the Two-State General Markov Model of evolution generalises the well-known Cavender-Farris-Neyman model of evolution by removing the symmetry restriction (which requires that the probability that a `0' turns into a `1' along an edge is the same as the probability that a `1' turns into a `0' along the edge). Farach and Kannan showed how to PAC-learn Markov Evolutionary Trees in the Cavender-Farris-Neyman model provided that the target tree satisfies the additional restriction that all pairs of leaves have a sufficiently high probability of being the same. We show how to remove both restrictions and thereby obtain the first polynomial-time PAC-learning algorithm (in the sense of Kearns et al.) for the general class of Two-State Markov Evolutionary Trees. Research Report RR347, Department of Computer Science, University of Wa...
Learning Nonsingular Phylogenies and Hidden Markov Models
 Proceedings of the thirty-seventh annual ACM Symposium on Theory of Computing, Baltimore (STOC05)
, 2005
Cited by 26 (6 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
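To make the nonsingularity condition concrete, here is a minimal sketch for the two-state case only. For the transition matrix [[1 - p01, p01], [p10, 1 - p10]] the determinant is 1 - p01 - p10. The function name, the `margin` parameter, and the use of the absolute value of the determinant are assumptions for illustration, not the paper's exact definition.

```python
def is_nonsingular_two_state(p01: float, p10: float, margin: float) -> bool:
    """Check a two-state transition matrix against a nonsingularity margin.

    Hypothetical helper: p01 is the probability that state 0 becomes 1,
    p10 the probability that state 1 becomes 0. The matrix is
    [[1 - p01, p01], [p10, 1 - p10]].
    """
    for p in (p01, p10):
        if not 0.0 <= p <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
    # Determinant of the 2x2 matrix; algebraically equal to 1 - p01 - p10.
    det = (1.0 - p01) * (1.0 - p10) - p01 * p10
    # Assumed reading of "bounded away from 0 (and 1)": margin <= |det| <= 1 - margin.
    return margin <= abs(det) <= 1.0 - margin


if __name__ == "__main__":
    print(is_nonsingular_two_state(0.1, 0.1, margin=0.1))  # determinant about 0.8
    print(is_nonsingular_two_state(0.5, 0.5, margin=0.1))  # determinant 0
```

A determinant of 0 corresponds to an edge that erases all information about its parent state, and a determinant of 1 to an edge with no mutation at all.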
Obtaining Highly Accurate Topology Estimates of Evolutionary Trees From Very Short Sequences
 Proceedings of RECOMB'99
Cited by 15 (0 self)
The evolutionary history of a set of species is represented by a phylogenetic tree, in other words, by a rooted, leaf-labelled tree, where internal nodes represent ancestral species and the leaves represent modern-day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees have long been considered one of the major challenges in systematic biology. None of the polynomial-time methods developed by the theoretical computer science community has been shown to outperform the popular Neighbor-Joining method used by systematic biologists, with respect to topology estimation. (However, preliminary experiments indicate that two new variants of Neighbor-Joining, BioNJ and Weighbor, do exhibit improved performance.) In this paper, we present a simple polynomial-time method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods. We analyze the performance of DCM-boosted distance methods under the general Markov mo...
Inverting random functions
 Annals of Combinatorics
, 1999
Cited by 14 (9 self)
In this paper we study how to invert random functions under different criteria. The motivation for this study is phylogeny reconstruction, since the evolution of biomolecular sequences may be considered as a random function from the set of possible phylogenetic trees to the set of collections of biomolecular sequences of observed species. Our results may affect how we think about maximum likelihood estimation (MLE) in phylogeny. MLE is optimal for inverting random functions under a first criterion, although it is not optimal under a second criterion that is at least equally natural but more conservative. It turns out that MLE has to be used differently from how it is used in the phylogeny literature if we have a prior distribution on trees and mutation mechanisms and want to keep MLE optimal under the same first criterion. Some of the results of this paper have been known in the setting of statistical decision theory, but have never been discussed in the context of phylogeny. ∗Michael A. Steel was supported by the New Zealand Marsden Fund. László A. Székely
Better Methods for Solving Parsimony and Compatibility
 Journal of Computational Biology
, 1998
Cited by 14 (3 self)
Evolutionary tree reconstruction is a challenging problem with important applications in Biology and Linguistics. In Biology, one of the most promising approaches to tree reconstruction is to find the "maximum parsimony" tree, while in Linguistics, the use of the "maximum