Gascuel O. Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative. Systematic Biology
Abstract. We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$ distribution.
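As a rough illustration of the limiting distribution named in this abstract, the sketch below computes a tail probability for the maximum of three draws from the $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$ mixture. It assumes the three draws are independent, which the abstract does not state, and the function names are hypothetical. It is not the paper's actual procedure.

```python
import math

def mixture_cdf(x):
    """CDF of the 1/2*chi2_0 + 1/2*chi2_1 mixture.

    chi2_0 is a point mass at 0; the chi2_1 CDF equals erf(sqrt(x / 2)).
    """
    if x < 0:
        return 0.0
    return 0.5 + 0.5 * math.erf(math.sqrt(x / 2.0))

def max_of_three_pvalue(stat):
    """P(max of three draws > stat), ASSUMING the draws are independent."""
    return 1.0 - mixture_cdf(stat) ** 3

if __name__ == "__main__":
    for stat in (0.0, 2.0, 3.841, 6.0):
        print(stat, round(max_of_three_pvalue(stat), 4))
```

Under the independence assumption, the point mass at zero means that even a statistic of 0 gives a tail probability of 1 - 0.5^3 = 0.875 rather than 1.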
Full reconstruction of Markov models on evolutionary trees: identifiability and consistency
 Math. Biosci
, 1996
A Markov model of evolution of characters on a phylogenetic tree consists of a tree topology together with a specification of probability transition matrices on the edges of the tree. Previous work has shown that under mild conditions, the tree topology may be reconstructed, in the sense that the topology is identifiable from knowledge of the joint distribution of character states at pairs of terminal nodes of the tree. Also, the method of maximum likelihood is statistically consistent for inferring the tree topology. In this paper we answer the analogous questions for reconstruction of the full model, including the edge transition matrices: under mild conditions, such full reconstruction is achievable, not by using pairs of terminal nodes, but rather by using triples of terminal nodes. The identifiability result generalizes previous results that were restricted either to characters having two states or to transition matrices having special structure. The proof develops matrix relationships that may be exploited to identify the model. We also use the identifiability result to prove that the method of maximum likelihood is consistent for reconstructing the full model.
Phylogenetic Tree Construction Using Markov Chain Monte Carlo
, 1999
We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Since phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important byproduct of the Markov chain Monte Carlo phylogeny building technique is that it provides estimates and corresponding measures of variability for any aspect of the phylogeny under study.
Protein phylogenies and signature sequences: evolutionary relationships within prokaryotes and between prokaryotes and eukaryotes. Antonie Leeuwenhoek 72:49–61
, 1997
Discordance of species trees with their most likely gene trees. PLoS Genetics 2(5)e68
 Journal of Bioinformatics and Computational Biology
, 2006
Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies. Citation: Degnan JH, Rosenberg NA (2006) Discordance of species trees with their most likely gene trees. PLoS Genet 2(5): e68. DOI: 10.1371/journal.pgen.0020068
Efficient Algorithms for Inverting Evolution
 Proceedings of the ACM Symposium on the Foundations of Computer Science
, 1999
Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection. A stochastic model...
Constructing Big Trees from Short Sequences
 Proceedings of the 24th International Colloquium on Automata, Languages, and Programming
, 1997
The construction of evolutionary trees is a fundamental problem in biology, and yet methods for reconstructing evolutionary trees are not reliable when it comes to inferring accurate topologies of large divergent evolutionary trees from realistic length sequences. We address this problem and present a new polynomial time algorithm for reconstructing evolutionary trees called the Short Quartets Method which is consistent and which has greater statistical power than other polynomial time methods, such as Neighbor-Joining and the 3-approximation algorithm by Agarwala et al. (and the "Double Pivot" variant of the Agarwala et al. algorithm by Cohen and Farach) for the $L_\infty$-nearest tree problem. Our study indicates that our method will produce the correct topology from shorter sequences than can be guaranteed using these other methods.
1 Introduction
Evolutionary trees indicate how species evolved from a common ancestor and are of fundamental concern to biologists. There are many methods f...
Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol
, 2002
Abstract. Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little benefit of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined five aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems defined by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxon-sampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at finding near-optimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although benefits of taxon sampling are apparent for all models,
Learning Nonsingular Phylogenies and Hidden Markov Models
 Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, Baltimore (STOC'05)
, 2005
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
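The nonsingularity condition in this abstract can be made concrete with a small check. The sketch below is only an illustration: it reads "bounded away from 0 (and 1)" as the absolute determinant lying in [eps, 1 - eps] for a threshold `eps`, which is an assumption and not a value from the paper. The helper names are hypothetical, and the two-state symmetric matrices are example inputs.

```python
def determinant(matrix):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0.0
    for col in range(n):
        # Minor: drop the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * determinant(minor)
    return total

def is_nonsingular(transition_matrices, eps):
    """True if every |det| lies in [eps, 1 - eps]; eps is an assumed threshold."""
    return all(eps <= abs(determinant(m)) <= 1.0 - eps
               for m in transition_matrices)

def symmetric_two_state(p):
    """Two-state symmetric transition matrix with flip probability p (det = 1 - 2p)."""
    return [[1.0 - p, p], [p, 1.0 - p]]

if __name__ == "__main__":
    edges = [symmetric_two_state(0.1), symmetric_two_state(0.3)]
    print(is_nonsingular(edges, eps=0.05))
    print(is_nonsingular([symmetric_two_state(0.5)], eps=0.05))
```

A flip probability of 0.5 gives determinant 0, so the edge carries no information about its parent state; this is the kind of degenerate case the condition excludes.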
General Time-Reversible Distances with Unequal Rates across Sites: Mixing Γ and Inverse Gaussian Distributions with Invariant Sites
, 1997
This paper aims to explain to biologists the assumptions of these distances and to clarify some earlier misconceptions. Importantly, nearly all of the currently used distance estimates (including those of Tamura, 1992; Tamura and Nei, 1994) are special cases (restrictions) of the general time-reversible distance (see Zharkikh, 1994; Swofford et al., 1996).