Slicing Hyperdimensional Oranges: The Geometry of Phylogenetic Estimation
 Mol. Phylo. Evol
, 2000
Cited by 9 (0 self)
A new view of phylogenetic estimation is presented where data sets, tree evolution models, and estimation methods are placed in a common geometric framework. Each of these objects is placed in a vector space where the character patterns are the basis vectors. This viewpoint allows intuitive understanding of various complex properties of the phylogenetic estimation problem structure. This is illustrated with examples discussing data set combinations, mixture models, consistency, and phylogenetic invariants. © 2000 Academic Press Key Words: geometry; accuracy; consistency; phylogenetic invariants; mixture models.
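Illustrative note (not from the paper): the idea of placing a data set in a vector space whose basis vectors are the character patterns can be sketched as follows. The function name `pattern_frequency_vector` and the toy alignment are assumptions made for this sketch only; with m taxa and a 4-letter alphabet the vector has 4^m coordinates, one per possible site pattern.

```python
from collections import Counter
from itertools import product


def pattern_frequency_vector(alignment, alphabet="ACGT"):
    """Map an alignment (one equal-length string per taxon) to a point
    in the space whose coordinates are indexed by site patterns."""
    n_taxa = len(alignment)
    n_sites = len(alignment[0])
    # One coordinate per possible column of the alignment.
    patterns = ["".join(p) for p in product(alphabet, repeat=n_taxa)]
    # Count how often each column (site pattern) is observed.
    counts = Counter("".join(column) for column in zip(*alignment))
    return patterns, [counts[p] / n_sites for p in patterns]


# Hypothetical 3-taxon, 4-site alignment.
toy_alignment = ["ACGA", "ACGT", "ACTA"]
patterns, freqs = pattern_frequency_vector(toy_alignment)
observed = {p: f for p, f in zip(patterns, freqs) if f > 0}
print(len(freqs), observed)
```

In this picture a data set is a single point whose coordinates are the observed pattern frequencies, and a tree model corresponds to the set of pattern probability vectors it can generate.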
Phylogeny of mixture models: Robustness of maximum likelihood and nonidentifiable distributions
, 2006
Cited by 8 (2 self)
We address phylogenetic reconstruction when the data is generated from a mixture distribution. Such topics have gained considerable attention in the biological community with the clear evidence of heterogeneity of mutation rates. In our work we consider data coming from a mixture of trees which share a common topology, but differ in their edge weights (i.e., branch lengths). We first show the pitfalls of popular methods, including maximum likelihood and Markov chain Monte Carlo algorithms. We then determine in which evolutionary models reconstructing the tree topology under a mixture distribution is (im)possible. We prove that every model whose transition matrices can be parameterized by an open set of multilinear polynomials either has nonidentifiable mixture distributions, in which case reconstruction is impossible in general, or there exist linear tests which identify the topology. This duality theorem relies on our notion of linear tests and uses ideas from convex programming duality. Linear tests are closely related to linear invariants, which were first introduced by Lake, and are natural from an algebraic geometry perspective.
Comparative Genomics via Phylogenetic Invariants for Jukes-Cantor Semigroups
, 1999
Cited by 7 (4 self)
We review the theory of invariants as it has been developed for comparing the DNA sequences of homologous genes from phylogenetically related species, with particular attention to the semigroups used to model sequence evolution. We also outline the computational theory of genome rearrangements, including the optimization problems in calculating edit distances between genomes and the simpler notion of breakpoint distance. The combinatorics of rearrangements, involving nonlocal changes in the relative order of genes in the genome, are more complex than the base substitutions responsible for gene sequence evolution. Nevertheless we can construct a partial model of gene order evolution through symmetry assumptions about disruptions of gene adjacencies. Based on the extended Jukes-Cantor semigroup that results, we can derive a complete set of linear phylogenetic invariants. We use these invariants to relate mitochondrial genomes from a number of animal phyla.
Phylogenetic Invariants for Genome Rearrangements
 Journal of Computational Biology
, 1999
Cited by 7 (0 self)
We review the combinatorial optimization problems in calculating edit distances between genomes and phylogenetic inference based on minimizing gene order changes. With a view to avoiding the computational cost and the "long branches attract" artifact of some tree-building methods, we explore the probabilization of genome rearrangement models prior to developing a methodology based on branch-length invariants. We characterize probabilistically the evolution of the structure of the gene adjacency set for reversals on unsigned circular genomes and, using a nontrivial recurrence relation, reversals on signed genomes. Concepts from the theory of invariants developed for the phylogenetics of homologous gene sequences can be used to derive a complete set of linear invariants for unsigned reversals, as well as for a mixed rearrangement model for signed genomes, though not for pure transposition or pure signed reversal models. The invariants are based on an extended Jukes-Cantor semigroup. We i...
Constructing and counting phylogenetic invariants
 Journal of Computational Biology
, 1998
Cited by 6 (1 self)
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of m taxa using nucleotide sequence data. Models for the respective probabilities of the 4^m possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free Z-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.
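Illustrative note (not from the paper): the definition of an invariant given above can be checked on a small case. The sketch below does not use the lattice construction of Evans and Speed; it swaps in a simpler, well-known example, assuming a 2-state symmetric substitution model with a uniform root on the four-taxon tree 12|34. For that tree the 4x4 "flattening" of the pattern probabilities along the split 12|34 has rank at most 2, so its determinant is a polynomial in the pattern probabilities that vanishes for every choice of branch parameters, while the same determinant for the split 13|24 is typically non-zero. All function names and branch values are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product


def quartet_distribution(p):
    """Pattern probabilities on the tree 12|34 under a 2-state symmetric
    model. p = (p1, p2, p3, p4, p5): flip probabilities on the four
    pendant edges and on the internal edge (assumed toy values)."""
    def step(flip, a, b):
        return 1 - flip if a == b else flip

    dist = {}
    for x in product((0, 1), repeat=4):
        # Sum over the states u, v of the two internal nodes.
        dist[x] = sum(
            Fraction(1, 2) * step(p[4], u, v)
            * step(p[0], u, x[0]) * step(p[1], u, x[1])
            * step(p[2], v, x[2]) * step(p[3], v, x[3])
            for u, v in product((0, 1), repeat=2)
        )
    return dist


def flattening(dist, row_taxa):
    """4x4 matrix with rows indexed by the states of row_taxa and
    columns by the states of the remaining two taxa."""
    col_taxa = [i for i in range(4) if i not in row_taxa]
    pairs = list(product((0, 1), repeat=2))
    matrix = []
    for r in pairs:
        row = []
        for c in pairs:
            x = [0, 0, 0, 0]
            x[row_taxa[0]], x[row_taxa[1]] = r
            x[col_taxa[0]], x[col_taxa[1]] = c
            row.append(dist[tuple(x)])
        matrix.append(row)
    return matrix


def det(m):
    """Determinant by cofactor expansion (exact with Fractions)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * entry * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j, entry in enumerate(m[0])
    )


branch_flips = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 20),
                Fraction(1, 4), Fraction(1, 8))
dist = quartet_distribution(branch_flips)
invariant_true_split = det(flattening(dist, (0, 1)))   # split 12|34
invariant_other_split = det(flattening(dist, (0, 2)))  # split 13|24
print(invariant_true_split, invariant_other_split != 0)
```

Exact rational arithmetic is used so that the invariant evaluates to exactly zero on the generating tree; with observed frequencies in place of exact probabilities one would only expect a value near zero.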
Computational advances in maximum likelihood methods for molecular phylogeny
 Genome Research
, 1998
Pitfalls of heterogeneous processes for phylogenetic reconstruction
 Systematic Biology
, 2006
Cited by 6 (2 self)
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004) and Chang (1996), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible, despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) Nonidentifiability: two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) Linear tests: there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.
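Illustrative note (not from the paper): the setting described above, one topology with branch lengths that vary across characters, can be sketched as a convex combination of pattern distributions. The sketch assumes a 2-state symmetric model with a uniform root on the tree 12|34 and two hypothetical branch-length classes; all names and values are assumptions. It shows only that a nonlinear quantity which vanishes on each single-tree distribution (here the determinant of the 12|34 flattening) need not vanish on their mixture, which is one way to see why a homogeneous model can misdescribe such data. It does not reproduce the paper's inconsistency or nonidentifiability examples.

```python
from fractions import Fraction
from itertools import product

PAIRS = list(product((0, 1), repeat=2))


def quartet_distribution(p):
    """Pattern probabilities on 12|34; p holds flip probabilities for the
    four pendant edges and the internal edge (assumed toy values)."""
    def step(flip, a, b):
        return 1 - flip if a == b else flip

    return {
        x: sum(
            Fraction(1, 2) * step(p[4], u, v)
            * step(p[0], u, x[0]) * step(p[1], u, x[1])
            * step(p[2], v, x[2]) * step(p[3], v, x[3])
            for u, v in PAIRS
        )
        for x in product((0, 1), repeat=4)
    }


def mix(dist_a, dist_b, weight):
    """Mixture distribution: weight * dist_a + (1 - weight) * dist_b."""
    return {x: weight * dist_a[x] + (1 - weight) * dist_b[x] for x in dist_a}


def det(m):
    """Determinant by cofactor expansion (exact with Fractions)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * entry * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j, entry in enumerate(m[0])
    )


def det_flattening_12_34(dist):
    """Determinant of the matrix with rows (x1, x2) and columns (x3, x4)."""
    return det([[dist[r + c] for c in PAIRS] for r in PAIRS])


# Two hypothetical branch-length classes on the same topology 12|34.
class_a = quartet_distribution(
    (Fraction(1, 10), Fraction(1, 5), Fraction(1, 10), Fraction(1, 5), Fraction(1, 8)))
class_b = quartet_distribution(
    (Fraction(3, 10), Fraction(1, 10), Fraction(3, 10), Fraction(1, 10), Fraction(1, 8)))
mixture = mix(class_a, class_b, Fraction(1, 2))

print(det_flattening_12_34(class_a), det_flattening_12_34(class_b))
print(det_flattening_12_34(mixture) != 0)
```

A linear function of the pattern probabilities behaves differently: if it vanishes on every single-tree distribution for a topology, it also vanishes on any mixture of them, which is consistent with the role linear tests play in the abstract.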
Phylogenetic Invariants for Metazoan Mitochondrial Genome Evolution
 GENOME INFORMATICS
, 1998
Cited by 4 (3 self)
The method of phylogenetic invariants was developed to apply to aligned sequence data generated, according to a stochastic substitution model, for N species related through an unknown phylogenetic tree. The invariants are functions of the probabilities of the observable N-tuples, which are identically zero, over all choices of branch length, for some trees. Evaluating the invariants associated with all possible trees, using observed N-tuple frequencies over all sequence positions, enables us to rapidly infer the generating tree. An aspect of evolution at the genomic level much studied recently is the rearrangements of gene order along the chromosome from one species to another. Instead of the substitutions responsible for sequence evolution, we examine the nonlocal processes responsible for genome rearrangements such as inversion of arbitrarily long segments of chromosomes. By treating the potential adjacency of each possible pair of genes as a "position", an appropriate "substitution" model can be recognized as governing the rearrangement process, and a probabilistically principled phylogenetic inference can be set up. We calculate the invariants for this process for N = 5, and apply them to mitochondrial genome data from coelomate metazoans, showing how they resolve key aspects of branching order.
Metric learning for phylogenetic invariants
, 2008
Cited by 4 (1 self)
We introduce new methods for phylogenetic tree quartet construction by using machine learning to optimize the power of phylogenetic invariants. Phylogenetic invariants are polynomials in the joint probabilities which vanish under a model of evolution on a phylogenetic tree. We give algorithms for selecting a good set of invariants and for learning a metric on this set of invariants which optimally distinguishes the different models. Our learning algorithms involve linear and semidefinite programming on data simulated over a wide range of parameters. We provide extensive tests of the learned metrics on simulated data from phylogenetic trees with four leaves under the Jukes-Cantor and Kimura 3-parameter models of DNA evolution. Our method greatly improves on other uses of invariants and is competitive with or better than neighbor-joining. In particular, we obtain metrics trained on trees with short internal branches which perform much better than neighbor-joining on this region of parameter space.
PHYLOGENETIC ALGEBRAIC GEOMETRY
, 2004
Cited by 4 (1 self)
Phylogenetic algebraic geometry is concerned with certain complex projective algebraic varieties derived from finite trees. Real positive points on these varieties represent probabilistic models of evolution. For small trees, we recover classical geometric objects, such as toric and determinantal varieties and their secant varieties, but larger trees lead to new and largely unexplored territory. This paper gives a self-contained introduction to this subject and offers numerous open problems for algebraic geometers.