Results 1-10 of 131
A few logs suffice to build (almost) all trees (I)
II. THEORETICAL COMPUTER SCIENCE, 1999
Cited by 81 (24 self)
A phylogenetic tree (also called an "evolutionary tree") is a leaf-labelled tree which represents the evolutionary history for a set of species, and the construction of such trees is a fundamental problem in biology. Here we address the issue of how many sequence sites are required in order to recover the tree with high probability when the sites evolve under standard Markov-style i.i.d. mutation models. We provide analytic upper and lower bounds for the required sequence length, by developing a new (and polynomial time) algorithm. In particular we show that when the mutation probabilities are bounded the required sequence length can grow surprisingly slowly (a power of log n) in the number n of sequences, for almost all trees.
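To make the data model in this abstract concrete, here is a small sketch of sites evolving i.i.d. under a Markov mutation model on a tree. It assumes the simplest such model, a symmetric two-state model in which each edge flips the binary state with a fixed probability. The tree, its flip probabilities, and the function names are hypothetical, and the paper's reconstruction algorithm is not reproduced.

```python
import random

# Hypothetical 4-leaf tree as (parent, child, flip_probability) edges,
# listed so that every parent appears before its children.
# Flip probabilities are illustrative assumptions in (0, 0.5).
TREE = [
    ("root", "u", 0.1),
    ("root", "v", 0.1),
    ("u", "A", 0.2),
    ("u", "B", 0.2),
    ("v", "C", 0.2),
    ("v", "D", 0.2),
]
LEAVES = ["A", "B", "C", "D"]


def simulate_site(rng):
    """Evolve one binary character from the root down every edge."""
    state = {"root": rng.randint(0, 1)}  # uniform root distribution
    for parent, child, p in TREE:
        flipped = rng.random() < p
        state[child] = state[parent] ^ int(flipped)
    return tuple(state[leaf] for leaf in LEAVES)


def simulate_sequences(k, seed=0):
    """Draw k i.i.d. sites and return one length-k sequence per leaf."""
    rng = random.Random(seed)
    sites = [simulate_site(rng) for _ in range(k)]
    return {leaf: [site[i] for site in sites] for i, leaf in enumerate(LEAVES)}
```

Under this model two leaves differ at a site with probability (1 - Π(1 - 2p_e))/2 over the edges e on their path, so leaves closer in the tree (A and B here) should mismatch less often than distant ones (A and C), which is the signal reconstruction methods exploit.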
Full reconstruction of Markov models on evolutionary trees: identifiability and consistency
Math. Biosci., 1996
Cited by 45 (0 self)
A Markov model of evolution of characters on a phylogenetic tree consists of a tree topology together with a specification of probability transition matrices on the edges of the tree. Previous work has shown that under mild conditions, the tree topology may be reconstructed, in the sense that the topology is identifiable from knowledge of the joint distribution of character states at pairs of terminal nodes of the tree. Also, the method of maximum likelihood is statistically consistent for inferring the tree topology. In this paper we answer the analogous questions for reconstruction of the full model, including the edge transition matrices: under mild conditions, such full reconstruction is achievable, not by using pairs of terminal nodes, but rather by using triples of terminal nodes. The identifiability result generalizes previous results that were restricted either to characters having two states or to transition matrices having special structure. The proof develops matrix relationships that may be exploited to identify the model. We also use the identifiability result to prove that the method of maximum likelihood is consistent for reconstructing the full model.
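The model described here (a topology plus a transition matrix on each edge) determines the joint distribution of states at any triple of terminal nodes. The sketch below computes that forward map for three leaves that meet at a common internal node, composing matrices along multi-edge paths. The function names and example matrices are hypothetical, and the paper's identification procedure (recovering the matrices from the triple distribution) is not shown.

```python
from itertools import product


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_matrix(edge_matrices):
    """Compose the transition matrices along a path, in order from the top."""
    result = edge_matrices[0]
    for m in edge_matrices[1:]:
        result = matmul(result, m)
    return result


def triple_joint(pi, pa, pb, pc):
    """Joint law of the states at leaves a, b, c below a common node.

    pi is the state distribution at that node; pa, pb, pc are the composed
    transition matrices from it to each leaf (row = state at the node).
    """
    n = len(pi)
    return {
        (i, j, k): sum(pi[s] * pa[s][i] * pb[s][j] * pc[s][k] for s in range(n))
        for i, j, k in product(range(n), repeat=3)
    }
```

For example, with a two-state node distribution `pi = [0.6, 0.4]`, leaf a two edges away (`pa = path_matrix([e1, e2])`) and leaves b and c one edge away, summing the triple distribution over one leaf recovers the pairwise distribution of the other two, which is why triples carry at least as much information as pairs.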
Efficient Algorithms for Inverting Evolution
Proceedings of the ACM Symposium on the Foundations of Computer Science, 1999
Cited by 43 (3 self)
Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection. A stochastic model...
Phylogenetic Tree Construction Using Markov Chain Monte Carlo
1999
Cited by 36 (0 self)
We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Since phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important by-product of the Markov chain Monte Carlo phylogeny building technique is that it provides estimates and corresponding measures of variability for any aspect of the phylogeny under study.
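As a toy illustration of the Markov chain idea in this abstract, the sketch below runs a Metropolis sampler over the three unrooted topologies of four taxa. The log-posterior scores are made-up placeholders standing in for a likelihood under a mutation model plus a prior; branch lengths, the paper's actual moves, and its tree diagnostics are all omitted.

```python
import math
import random

# Hypothetical unnormalised log-posterior scores for the three unrooted
# topologies of taxa A, B, C, D (placeholders, not real data).
LOG_POST = {"AB|CD": -100.0, "AC|BD": -101.0, "AD|BC": -103.0}


def metropolis(n_steps, seed=0):
    """Return visit frequencies of a Metropolis chain over LOG_POST."""
    rng = random.Random(seed)
    topologies = list(LOG_POST)
    current = topologies[0]
    counts = dict.fromkeys(topologies, 0)
    for _ in range(n_steps):
        # Symmetric proposal: pick one of the other two topologies uniformly.
        proposal = rng.choice([t for t in topologies if t != current])
        log_ratio = LOG_POST[proposal] - LOG_POST[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        counts[current] += 1
    return {t: c / n_steps for t, c in counts.items()}


def exact_posterior():
    """Normalise the scores directly, for comparison with the chain."""
    top = max(LOG_POST.values())
    weights = {t: math.exp(v - top) for t, v in LOG_POST.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}
```

Because the proposal is symmetric, the chain's stationary distribution is proportional to `exp(LOG_POST)`, so long-run visit frequencies approximate the posterior probability of each topology. With real phylogenies the topology space is far too large to normalise directly, which is what makes the sampling approach useful.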
Constructing Big Trees from Short Sequences
Proceedings of the 24th International Colloquium on Automata, Languages, and Programming, 1997
Cited by 32 (6 self)
The construction of evolutionary trees is a fundamental problem in biology, and yet methods for reconstructing evolutionary trees are not reliable when it comes to inferring accurate topologies of large divergent evolutionary trees from realistic-length sequences. We address this problem and present a new polynomial-time algorithm for reconstructing evolutionary trees, called the Short Quartets Method, which is consistent and has greater statistical power than other polynomial-time methods, such as Neighbor-Joining and the 3-approximation algorithm by Agarwala et al. (and the "Double Pivot" variant of the Agarwala et al. algorithm by Cohen and Farach) for the L1-nearest tree problem. Our study indicates that our method will produce the correct topology from shorter sequences than can be guaranteed using these other methods.
1 Introduction
Evolutionary trees indicate how species evolved from a common ancestor and are of fundamental concern to biologists. There are many methods f...
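A quartet-based method needs a rule for deciding the topology of each four-taxon subset. The sketch below shows one standard rule, the four-point method, applied to hypothetical additive distances; how the Short Quartets Method selects short quartets and combines them into a full tree is not reproduced here, and the names `four_point_split` and `dist` are illustrative.

```python
# Hypothetical additive distances for the tree ((A,B),(C,D)) with
# leaf edges of length 1 and an internal edge of length 2.
PAIRS = {
    ("A", "B"): 2.0, ("C", "D"): 2.0,
    ("A", "C"): 4.0, ("A", "D"): 4.0,
    ("B", "C"): 4.0, ("B", "D"): 4.0,
}
D = {frozenset(pair): value for pair, value in PAIRS.items()}


def dist(u, v):
    """Symmetric lookup of the distance between two leaves."""
    return D[frozenset((u, v))]


def four_point_split(dist, w, x, y, z):
    """Infer the split of quartet {w, x, y, z} by the four-point method.

    Under a tree metric the true split uv|st minimises
    dist(u, v) + dist(s, t) over the three possible pairings.
    """
    candidates = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(candidates, key=lambda s: dist(*s[0]) + dist(*s[1]))
```

Here `four_point_split(dist, "A", "C", "B", "D")` returns `(("A", "B"), ("C", "D"))`: the AB|CD pairing sums to 4 while the other two sum to 8. With distances estimated from finite sequences, the margin between these sums shrinks for quartets containing long paths, which is the motivation for restricting attention to short quartets.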
Learning Nonsingular Phylogenies and Hidden Markov Models
Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, Baltimore (STOC05), 2005
Cited by 18 (6 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
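The nonsingularity condition is a bound on determinants, so it is easy to check for a given set of transition matrices. The sketch below tests eps ≤ |det P| ≤ 1 - eps for each matrix, with `eps` a hypothetical bound of our choosing, and uses Jukes-Cantor matrices as examples. It illustrates only the condition, not the learning algorithm.

```python
def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_nonsingular(matrices, eps):
    """Check eps <= |det P| <= 1 - eps for every transition matrix P."""
    return all(eps <= abs(determinant(p)) <= 1 - eps for p in matrices)


def jukes_cantor(p):
    """4x4 Jukes-Cantor transition matrix with total change probability p."""
    return [[1 - p if i == j else p / 3 for j in range(4)] for i in range(4)]
```

For a Jukes-Cantor matrix the determinant is (1 - 4p/3)^3, so `jukes_cantor(0.3)` gives 0.216, the identity (no mutation) sits at the upper boundary 1, and p = 3/4 makes the matrix singular because every row becomes uniform and the parent state is forgotten entirely.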
General Time-Reversible Distances with Unequal Rates across Sites: Mixing Γ and Inverse Gaussian Distributions with Invariant Sites
1997
Cited by 18 (2 self)
This paper aims to explain to biologists the assumptions of these distances and to clarify some earlier misconceptions. Importantly, nearly all of the currently used distance estimates (including those of Tamura, 1992; Tamura and Nei, 1994) are special cases (restrictions) of the general time-reversible distance (see Zharkikh, 1994; Swofford et al., 1996).
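The Jukes-Cantor distance is the most restricted special case of the general time-reversible family (equal base frequencies, equal substitution rates). The sketch below computes it, together with its gamma-rates version, from the proportion p of differing sites; it does not implement the full GTR distance, the inverse Gaussian mixture, or invariant sites, and the function names are illustrative.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("distance is undefined for p >= 3/4")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    alpha is the gamma shape parameter; small alpha means strong rate
    variation, and the formula tends to jc_distance as alpha grows.
    """
    if not 0 <= p < 0.75:
        raise ValueError("distance is undefined for p >= 3/4")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)
```

For p = 0.3 the plain distance is about 0.383, while the gamma version with alpha = 1 gives 0.5: allowing unequal rates across sites implies that more substitutions are hidden at the fast-evolving sites, so the corrected distance is larger.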
Recovering Evolutionary Trees Through Harmonic Greedy Triplets (Extended Abstract)
1999
Cited by 17 (4 self)
We give a greedy learning algorithm for reconstructing an evolutionary tree based on a harmonic average on triplets of taxa. This algorithm runs in time polynomial in the input size. Using the Jukes-Cantor model of evolution, our algorithm is mathematically proven to require sample sequences of only polynomial length in the number of taxa in order to recover the correct tree topology with high probability. In addition to recovering the topology, the algorithm also estimates the tree edge lengths with high accuracy. Our theoretical analysis is supported by simulation experiments, in which the algorithm has demonstrated high success rates in reconstructing a large tree from short sequences.
1. Introduction. Algorithms for reconstructing evolutionary trees are principal tools in biology [11]. These algorithms usually compare aligned character sequences for the taxa in question to infer their evolutionary relationships [19]. In the past, such characters were often categorical variables o...
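The sketch below illustrates one way a triplet of taxa can yield edge lengths under the Jukes-Cantor model. It assumes the quantity s = 1 - (4/3)p, which multiplies along tree paths, so the similarity between leaf a and the centre o of triplet (a, b, c) solves s_ab · s_ac / s_bc = s_ao². The names are hypothetical, the inputs are assumed exact, and the harmonic-average selection and greedy tree assembly of the paper are not shown.

```python
import math


def jc_similarity(p):
    """Jukes-Cantor similarity 1 - (4/3)p for a mismatch fraction p."""
    return 1 - 4 * p / 3


def triplet_center_similarity(s_ab, s_ac, s_bc):
    """Similarity between leaf a and the centre of triplet (a, b, c).

    Similarities multiply along paths: s_ab = s_ao * s_bo, etc.,
    so s_ab * s_ac / s_bc = s_ao ** 2.
    """
    return math.sqrt(s_ab * s_ac / s_bc)


def edge_length(similarity):
    """Convert a similarity into an additive Jukes-Cantor edge length."""
    return -0.75 * math.log(similarity)
```

For a star with leaf-to-centre similarities 0.9, 0.8, and 0.7, the pairwise similarities are 0.72, 0.63, and 0.56, and `triplet_center_similarity(0.72, 0.63, 0.56)` recovers 0.9. Taking `edge_length` turns the multiplicative similarities into additive lengths.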
Fast Recovery of Evolutionary Trees with Thousands of Nodes
RECOMB, 2001
Cited by 17 (0 self)
We present a novel distance-based algorithm for evolutionary tree reconstruction. Our algorithm reconstructs the topology of a tree with n leaves in O(n^2) time using O(n) working space. In the general Markov model of evolution, the algorithm recovers the topology with probability 1 - o(1) from sequences whose length is polynomial in n. Moreover, for almost all trees, our algorithm achieves the same success probability with polylogarithmic sample sizes. The theoretical results are supported by simulation experiments involving trees with 500, 1895, and 3135 leaves, whose topologies are recovered with high success rates from 2000 bp DNA sequences.

