Constructing and counting phylogenetic invariants
Journal of Computational Biology, 1998
Cited by 6 (1 self)
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of m taxa using nucleotide sequence data. Models for the respective probabilities of the 4^m possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free ℤ-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.
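To make the definition of an invariant concrete, here is a minimal sketch in Python. It swaps in the simplest setting that can be checked exactly: a 2-state symmetric (Cavender-Farris-Neyman) model on a four-taxon tree with split 12|34. This is not one of the 4-state nucleotide models or the lattice-basis algorithm discussed in the paper. The sketch builds the 2^4 = 16 exact site-pattern probabilities. It then evaluates the 3x3 minors of the pattern matrix "flattened" along a split, and each of those minors is a cubic polynomial in the pattern probabilities. The 12|34 flattening factors through the two hidden internal states, so it has rank at most 2 and all its 3x3 minors vanish for every choice of branch parameters. Along the wrong split 13|24 they are typically non-zero. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def flip_matrix(p):
    # 2-state symmetric model: the state flips with probability p along an edge.
    return [[1 - p, p], [p, 1 - p]]


def quartet_distribution(p1, p2, p3, p4, p_int):
    """Exact probabilities of the 16 leaf patterns (x1, x2, x3, x4) on tree 12|34.

    Hidden state a sits at the node joining leaves 1 and 2 (uniform root);
    hidden state b sits at the node joining leaves 3 and 4.
    """
    m1, m2, m3, m4, mi = map(flip_matrix, (p1, p2, p3, p4, p_int))
    return {
        (x1, x2, x3, x4): sum(
            Fraction(1, 2) * m1[a][x1] * m2[a][x2] * mi[a][b] * m3[b][x3] * m4[b][x4]
            for a, b in product((0, 1), repeat=2)
        )
        for x1, x2, x3, x4 in product((0, 1), repeat=4)
    }


def flattening(dist, row_leaves, col_leaves):
    # 4x4 matrix: rows index the joint states of row_leaves, columns those of col_leaves.
    matrix = []
    for r in product((0, 1), repeat=2):
        row = []
        for c in product((0, 1), repeat=2):
            x = [0, 0, 0, 0]
            x[row_leaves[0]], x[row_leaves[1]] = r
            x[col_leaves[0]], x[col_leaves[1]] = c
            row.append(dist[tuple(x)])
        matrix.append(row)
    return matrix


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def minors3(matrix):
    # All sixteen 3x3 minors; each is a cubic polynomial in the pattern probabilities.
    return [det3([[matrix[i][j] for j in cols] for i in rows])
            for rows in combinations(range(4), 3)
            for cols in combinations(range(4), 3)]


dist = quartet_distribution(Fraction(1, 10), Fraction(1, 5), Fraction(3, 10),
                            Fraction(1, 8), Fraction(1, 7))
true_split = minors3(flattening(dist, (0, 1), (2, 3)))   # leaves 1,2 | 3,4
wrong_split = minors3(flattening(dist, (0, 2), (1, 3)))  # leaves 1,3 | 2,4
print(all(m == 0 for m in true_split))   # True
print(any(m != 0 for m in wrong_split))  # True
```

Exact `Fraction` arithmetic makes "is zero" an exact test rather than a floating-point tolerance. The rank bound depends on the 2-state simplification. For 4-state nucleotide models the analogous flattening has rank at most 4, and the lattice-based generating sets of Evans and Speed are a different construction.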
Phylogenetic Invariants for Genome Rearrangements
Journal of Computational Biology, 1999
Cited by 6 (0 self)
We review the combinatorial optimization problems in calculating edit distances between genomes and in phylogenetic inference based on minimizing gene order changes. With a view to avoiding the computational cost and the "long branches attract" artifact of some tree-building methods, we explore the probabilization of genome rearrangement models prior to developing a methodology based on branch-length invariants. We characterize probabilistically how the structure of the gene adjacency set evolves under reversals on unsigned circular genomes and, using a nontrivial recurrence relation, under reversals on signed genomes. Concepts from the theory of invariants developed for the phylogenetics of homologous gene sequences can be used to derive a complete set of linear invariants for unsigned reversals, as well as for a mixed rearrangement model for signed genomes, though not for pure transposition or pure signed reversal models. The invariants are based on an extended Jukes-Cantor semigroup. We i...
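The adjacency-set representation behind these models is easy to sketch. The code below assumes an unsigned circular genome stored as a Python list of gene labels. The helpers `adjacencies` and `reverse_segment` are hypothetical, and this is not the paper's probabilistic derivation. It shows that reversing a segment of length between 2 and n - 2 removes exactly two adjacencies and creates two new ones.

```python
def adjacencies(genome):
    # Unordered gene adjacencies of an unsigned circular genome (last gene touches first).
    n = len(genome)
    return {frozenset((genome[i], genome[(i + 1) % n])) for i in range(n)}


def reverse_segment(genome, i, j):
    # Reverse genome[i..j] inclusive, for 0 <= i <= j < len(genome).
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


identity = list(range(1, 9))                  # circular genome 1 2 ... 8
rearranged = reverse_segment(identity, 2, 5)  # reverses the block 3 4 5 6
shared = adjacencies(identity) & adjacencies(rearranged)
print(rearranged)   # [1, 2, 6, 5, 4, 3, 7, 8]
print(len(shared))  # 6: the adjacencies {2, 3} and {6, 7} were broken
```

Reversals that wrap past the end of the list are omitted. On an unsigned circular genome, reversing a wrapping segment yields the same adjacency set as reversing its complement.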
Slicing Hyperdimensional Oranges: The Geometry of Phylogenetic Estimation
Molecular Phylogenetics and Evolution, 2000
Cited by 6 (0 self)
A new view of phylogenetic estimation is presented in which data sets, tree evolution models, and estimation methods are placed in a common geometric framework. Each of these objects is placed in a vector space whose basis vectors are the character patterns. This viewpoint gives an intuitive understanding of several complex properties of the structure of the phylogenetic estimation problem. This is illustrated with examples discussing data set combinations, mixture models, consistency, and phylogenetic invariants. © 2000 Academic Press. Key Words: geometry; accuracy; consistency; phylogenetic invariants; mixture models.
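The following sketch places a data set in the pattern vector space, with one coordinate per possible character pattern. It is a minimal illustration assuming binary characters, and the helper names and toy alignments are hypothetical. Combining two data sets then becomes a site-weighted average of their vectors, and the expected pattern vector of a mixture model is likewise a convex combination of its components' vectors.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pattern_frequencies(alignment, alphabet="01"):
    """Vector of site-pattern frequencies, one coordinate per possible pattern.

    alignment: one equal-length string per taxon; a site pattern is a column.
    """
    columns = Counter(zip(*alignment))
    n_sites = len(alignment[0])
    return [Fraction(columns[p], n_sites)
            for p in product(alphabet, repeat=len(alignment))]


def combine(freq_a, n_a, freq_b, n_b):
    # Concatenating two data sets gives the site-weighted average of their vectors.
    total = n_a + n_b
    return [(n_a * a + n_b * b) / total for a, b in zip(freq_a, freq_b)]


# Toy binary data for four taxa (illustrative only).
data_a = ["0011", "0101", "0110", "0000"]
data_b = ["01", "01", "10", "11"]
concatenated = [a + b for a, b in zip(data_a, data_b)]
combined = combine(pattern_frequencies(data_a), 4, pattern_frequencies(data_b), 2)
print(combined == pattern_frequencies(concatenated))  # True
```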
Phylogeny of mixture models: Robustness of maximum likelihood and nonidentifiable distributions
2006
Cited by 5 (2 self)
We address phylogenetic reconstruction when the data are generated from a mixture distribution. Such topics have gained considerable attention in the biological community given the clear evidence of heterogeneity of mutation rates. In our work we consider data coming from a mixture of trees that share a common topology but differ in their edge weights (i.e., branch lengths). We first show the pitfalls of popular methods, including maximum likelihood and Markov chain Monte Carlo algorithms. We then determine for which evolutionary models reconstructing the tree topology under a mixture distribution is (im)possible. We prove that every model whose transition matrices can be parameterized by an open set of multilinear polynomials either has non-identifiable mixture distributions, in which case reconstruction is impossible in general, or admits linear tests that identify the topology. This duality theorem relies on our notion of linear tests and uses ideas from convex programming duality. Linear tests are closely related to linear invariants, which were first introduced by Lake, and are natural from an algebraic geometry perspective.
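Because a mixture distribution is a weighted average of component distributions, polynomial invariants of degree two or more can fail on mixture data even when every component shares the topology. The sketch below illustrates this under an assumed 2-state symmetric model, and its helper names are hypothetical. Both components lie on tree 12|34 with different branch lengths, and each satisfies the rank-2 flattening condition exactly, but their 50/50 mixture does not.

```python
from fractions import Fraction
from itertools import combinations, product


def quartet_distribution(flips):
    """Exact leaf-pattern probabilities on tree 12|34, 2-state symmetric model.

    flips = (p1, p2, p3, p4, p_internal): per-edge probability that the state flips.
    """
    m = [[[1 - p, p], [p, 1 - p]] for p in flips]
    return {x: sum(Fraction(1, 2) * m[0][a][x[0]] * m[1][a][x[1]] * m[4][a][b]
                   * m[2][b][x[2]] * m[3][b][x[3]]
                   for a, b in product((0, 1), repeat=2))
            for x in product((0, 1), repeat=4)}


def mix(dists, weights):
    # A mixture distribution is the weighted average of its components.
    return {x: sum(w * d[x] for d, w in zip(dists, weights)) for x in dists[0]}


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def split_12_34_rank_at_most_two(dist):
    # All 3x3 minors of the 12|34 flattening vanish exactly when its rank is <= 2.
    mat = [[dist[r + c] for c in product((0, 1), repeat=2)]
           for r in product((0, 1), repeat=2)]
    return all(det3([[mat[i][j] for j in cols] for i in rows]) == 0
               for rows in combinations(range(4), 3)
               for cols in combinations(range(4), 3))


slow = quartet_distribution((Fraction(1, 10),) * 5)
fast = quartet_distribution((Fraction(3, 10), Fraction(1, 20), Fraction(3, 10),
                             Fraction(1, 20), Fraction(1, 5)))
mixture = mix([slow, fast], [Fraction(1, 2), Fraction(1, 2)])
print(split_12_34_rank_at_most_two(slow), split_12_34_rank_at_most_two(fast))  # True True
print(split_12_34_rank_at_most_two(mixture))  # False
```

By contrast, any linear equation satisfied by every component is also satisfied by the mixture, because the mixture is a convex combination of the components. That is why linear constraints are natural in the mixture setting.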
Phylogenetic Invariants for Metazoan Mitochondrial Genome Evolution
Genome Informatics, 1998
Cited by 4 (3 self)
The method of phylogenetic invariants was developed for aligned sequence data generated, according to a stochastic substitution model, for N species related through an unknown phylogenetic tree. The invariants are functions of the probabilities of the observable N-tuples that are identically zero, over all choices of branch lengths, for some trees. Evaluating the invariants associated with all possible trees, using observed N-tuple frequencies over all sequence positions, enables us to rapidly infer the generating tree. One aspect of genome-level evolution much studied recently is the rearrangement of gene order along the chromosome from one species to another. Instead of the substitutions responsible for sequence evolution, we examine the non-local processes responsible for genome rearrangements, such as inversion of arbitrarily long segments of chromosomes. By treating the potential adjacency of each possible pair of genes as a "position", an appropriate "substitution" model can be recognized as governing the rearrangement process, and a probabilistically principled phylogenetic inference can be set up. We calculate the invariants for this process for N = 5 and apply them to mitochondrial genome data from coelomate metazoans, showing how they resolve key aspects of branching order.
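The key move here treats each possible gene pair as a "position" whose state in a genome is whether the two genes are adjacent. The sketch below shows that encoding and the N-tuple tally on which the invariants would be evaluated. It assumes unsigned circular genomes over the same gene set, and its helper names are hypothetical. The rearrangement model and the N = 5 invariants themselves are not reproduced.

```python
from collections import Counter
from itertools import combinations


def adjacency_character(genome, gene_pairs):
    """0/1 state of every gene pair in one circular genome: 1 if the pair is adjacent.

    Each gene pair plays the role of an alignment position.
    """
    n = len(genome)
    adjacent = {frozenset((genome[i], genome[(i + 1) % n])) for i in range(n)}
    return [1 if frozenset(pair) in adjacent else 0 for pair in gene_pairs]


def ntuple_counts(genomes):
    # Observed N-tuples (one entry per genome) tallied over all gene pairs.
    pairs = list(combinations(sorted(genomes[0]), 2))
    characters = [adjacency_character(g, pairs) for g in genomes]
    return Counter(zip(*characters))


genomes = [[1, 2, 3, 4, 5], [1, 2, 4, 3, 5], [1, 3, 2, 4, 5]]
counts = ntuple_counts(genomes)
print(sum(counts.values()))  # 10: one "position" per gene pair
print(counts[(1, 1, 1)])     # 1: only {1, 5} is adjacent in all three genomes
```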
Metric learning for phylogenetic invariants
2008
Cited by 4 (1 self)
We introduce new methods for phylogenetic tree quartet construction by using machine learning to optimize the power of phylogenetic invariants. Phylogenetic invariants are polynomials in the joint probabilities which vanish under a model of evolution on a phylogenetic tree. We give algorithms for selecting a good set of invariants and for learning a metric on this set of invariants which optimally distinguishes the different models. Our learning algorithms involve linear and semidefinite programming on data simulated over a wide range of parameters. We provide extensive tests of the learned metrics on simulated data from phylogenetic trees with four leaves under the Jukes-Cantor and Kimura 3-parameter models of DNA evolution. Our method greatly improves on other uses of invariants and is competitive with or better than neighbor-joining. In particular, we obtain metrics trained on trees with short internal branches which perform much better than neighbor-joining in this region of parameter space.
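In this kind of approach, each candidate quartet topology gets a vector of invariant values, and a metric decides which vector is "closest to zero". The sketch below covers only that scoring step, using a quadratic form v^T W v. It assumes W is given and positive semidefinite and does not show learning W by linear or semidefinite programming. The invariant values and the weighted metric are made up, so the example only shows how the choice of metric can change the selected topology.

```python
def quadratic_score(values, metric):
    # v^T W v: squared length of the invariant vector v under the metric W.
    n = len(values)
    return sum(values[i] * metric[i][j] * values[j] for i in range(n) for j in range(n))


def choose_quartet(invariant_values, metric):
    # Pick the topology whose invariants are closest to zero under the metric.
    return min(invariant_values,
               key=lambda topology: quadratic_score(invariant_values[topology], metric))


# Made-up invariant values for the three quartet topologies, for illustration only.
values = {"12|34": [0.001, 0.02], "13|24": [0.015, 0.001], "14|23": [0.03, 0.03]}
euclidean = [[1, 0], [0, 1]]
weighted = [[10, 0], [0, 0.1]]  # hypothetical metric that trusts invariant 0 more
print(choose_quartet(values, euclidean))  # 13|24
print(choose_quartet(values, weighted))   # 12|34
```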
Phylogenetic Algebraic Geometry
2004
Cited by 4 (1 self)
Phylogenetic algebraic geometry is concerned with certain complex projective algebraic varieties derived from finite trees. Real positive points on these varieties represent probabilistic models of evolution. For small trees, we recover classical geometric objects, such as toric and determinantal varieties and their secant varieties, but larger trees lead to new and largely unexplored territory. This paper gives a self-contained introduction to this subject and offers numerous open problems for algebraic geometers.
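The remark that small trees yield toric varieties can be checked by hand for one model. The example below uses the 2-state symmetric model, which is our choice and not necessarily one the paper uses. After a Fourier (Hadamard) change of coordinates, each pairwise coordinate is a monomial in the edge parameters theta_e = 1 - 2 p_e, so the invariants include binomials. The sketch evaluates these coordinates on the quartet 12|34, and its edge labels and helper names are hypothetical.

```python
from fractions import Fraction

# Quartet 12|34: pendant edges e1..e4, internal edge e5 (hypothetical labels).
SIDE = {1: "left", 2: "left", 3: "right", 4: "right"}


def path_edges(i, j):
    # Edges on the path between leaves i and j.
    edges = {f"e{i}", f"e{j}"}
    if SIDE[i] != SIDE[j]:
        edges.add("e5")
    return edges


def pair_coordinate(i, j, theta):
    # Fourier coordinate q_ij: product of theta_e = 1 - 2 p_e over the path edges.
    result = Fraction(1)
    for edge in path_edges(i, j):
        result *= theta[edge]
    return result


theta = {"e1": Fraction(4, 5), "e2": Fraction(3, 5), "e3": Fraction(1, 2),
         "e4": Fraction(2, 3), "e5": Fraction(5, 7)}


def q(i, j):
    return pair_coordinate(i, j, theta)


print(q(1, 3) * q(2, 4) - q(1, 4) * q(2, 3))  # 0: a binomial invariant of 12|34
print(q(1, 2) * q(3, 4) - q(1, 3) * q(2, 4))  # 96/1225: non-zero since theta_e5 != 1
```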
Using invariants for phylogenetic tree construction
In: Emerging Applications of Algebraic Geometry, 2008
Cited by 4 (0 self)
Phylogenetic invariants are certain polynomials in the joint probability distribution of a Markov model on a phylogenetic tree. Such polynomials are of theoretical interest in the field of algebraic statistics, and they are also of practical interest: they can be used to construct phylogenetic trees. This paper is a self-contained introduction to the algebraic, statistical, and computational challenges involved in the practical use of phylogenetic invariants. We survey the relevant literature and provide some partial answers and many open problems.
Computational advances in maximum likelihood methods for molecular phylogeny
Genome Research, 1998
Pitfalls of heterogeneous processes for phylogenetic reconstruction
Systematic Biology, 2006
Cited by 3 (2 self)
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (possibly producing different tree shapes). Extending the work of Kolaczkowski and Thornton (2004) and Chang (1996), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible, despite the fact that there is a common tree topology. In particular, there are non-identifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have non-identifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) non-identifiability: two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) linear tests: there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.
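As a concrete instance of a transition-matrix parameterization, here is the standard closed form of the Kimura 2-parameter matrix P(t) = exp(Qt). It is written as a function of a transition rate, a transversion rate, and a branch length. This is a minimal sketch: the function names and the A, G, C, T ordering are our choices, and the duality theorem and linear tests are not reproduced. A mixture in the sense above would reuse one topology while drawing different branch lengths t for different site classes. The check confirms that rows are probability distributions and that the matrices compose as a semigroup.

```python
import math


def k2p_matrix(alpha, beta, t):
    """Kimura 2-parameter transition matrix P(t) = exp(Qt), bases ordered A, G, C, T.

    alpha: rate of the transition (A<->G, C<->T); beta: rate of each transversion.
    """
    e_tv = math.exp(-4 * beta * t)
    e_ts = math.exp(-2 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e_tv + 0.5 * e_ts
    transition = 0.25 + 0.25 * e_tv - 0.5 * e_ts
    transversion = 0.25 - 0.25 * e_tv
    is_purine = [True, True, False, False]  # A, G are purines; C, T are pyrimidines
    return [[same if i == j
             else transition if is_purine[i] == is_purine[j]
             else transversion
             for j in range(4)] for i in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


p_short, p_long = k2p_matrix(2.0, 0.5, 0.1), k2p_matrix(2.0, 0.5, 0.3)
composed = matmul(p_short, p_long)
target = k2p_matrix(2.0, 0.5, 0.4)
print(all(math.isclose(sum(row), 1.0) for row in target))  # True: rows sum to one
print(all(math.isclose(composed[i][j], target[i][j], abs_tol=1e-12)
          for i in range(4) for j in range(4)))  # True: P(0.1) P(0.3) = P(0.4)
```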

