Results 1-10 of 18
IDENTIFIABILITY OF A MARKOVIAN MODEL OF MOLECULAR EVOLUTION WITH GAMMA-DISTRIBUTED RATES
, 2008
Cited by 17 (4 self)
Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although nonidentifiability was proven for a semiparametric model and an incorrect proof of identifiability was published for a general parametric model (GTR+Γ+I). Here we prove that one of the most widely used models (GTR+Γ) is identifiable for generic parameters, and for all parameter choices in the case of 4-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.
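As an illustration of the rate-mixture idea in this abstract, the following minimal sketch uses the Jukes-Cantor model, a simple special case of GTR chosen here only for brevity (the paper itself treats the general GTR+Γ model). It compares the probability that a site keeps its state over a branch of length `t` with and without gamma-distributed site rates. The function names, the mean-1 normalization of the gamma distribution, and the midpoint-rule check are assumptions made for this example, not taken from the paper.

```python
import math


def jc_same_prob(t):
    """P(state unchanged) under Jukes-Cantor; t = expected substitutions per site."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def jc_gamma_same_prob(t, alpha):
    """Same probability averaged over site rates r ~ Gamma(shape=alpha, mean=1).

    Uses the gamma Laplace transform E[exp(-c * r)] = (1 + c / alpha) ** (-alpha).
    """
    c = 4.0 * t / 3.0
    return 0.25 + 0.75 * (1.0 + c / alpha) ** (-alpha)


def jc_gamma_same_prob_numeric(t, alpha, steps=20000, r_max=40.0):
    """Midpoint-rule integration over the rate density, as a sanity check.

    Intended for alpha >= 1, where the density is bounded near r = 0.
    """
    h = r_max / steps
    norm = alpha ** alpha / math.gamma(alpha)
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        density = norm * r ** (alpha - 1.0) * math.exp(-alpha * r)
        total += density * jc_same_prob(r * t) * h
    return total


if __name__ == "__main__":
    t, alpha = 0.5, 2.0
    print("no rate variation:", jc_same_prob(t))
    print("gamma rates (closed form):", jc_gamma_same_prob(t, alpha))
    print("gamma rates (numeric):", jc_gamma_same_prob_numeric(t, alpha))
```

Because the exponential is convex, averaging over rates raises the probability of no observed change relative to the homogeneous model at the same mean rate. Identifiability asks whether parameters such as `alpha` and the branch lengths can be recovered uniquely from the resulting site-pattern distribution.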
Population Recovery and Partial Identification
Cited by 7 (1 self)
We study several problems in which an unknown distribution over an unknown population of vectors needs to be recovered from partial or noisy samples, each of which nearly completely erases or obliterates the original vector. Such problems naturally arise in a variety of contexts in learning, clustering, statistics, computational biology, data mining and database privacy, where loss and error may be introduced by nature, inaccurate measurements, or on purpose. We give fairly efficient algorithms to recover the data under fairly general assumptions. Underlying our algorithms is a new structure we call a partial identification (PID) graph for an arbitrary finite set of vectors over any alphabet. This graph captures the extent to which certain subsets of coordinates in each vector distinguish it from other vectors. PID graphs yield strategies for dimension reductions and reassembly of statistical information. The quality of our algorithms (sequential and parallel runtime, as well as numerical stability) critically depends on three parameters of PID graphs: width, depth and cost. The combinatorial heart of this work is showing that every set of vectors possesses a PID graph in which all three parameters are small (we prove some limitations on their tradeoffs as well). We further give an efficient algorithm to find such near-optimal PID graphs for any set of vectors. Our efficient PID graphs imply general algorithms for these recovery problems, even when loss or noise is just below the information-theoretic limit! In the learning/clustering context this gives a new algorithm for learning mixtures of binomial distributions (with known marginals) whose running time depends only quasi-polynomially on the number of clusters. We discuss implications for privacy and coding as well.
Identifiability of 2-tree mixtures for group-based models
, 2009
Cited by 5 (2 self)
Phylogenetic data arising on two possibly different tree topologies might be mixed through several biological mechanisms, including incomplete lineage sorting or horizontal gene transfer in the case of different topologies, or simply different substitution processes on characters in the case of the same topology. Recent work on a 2-state symmetric model of character change showed such a mixture model has nonidentifiable parameters, and thus it is theoretically impossible to determine the two tree topologies from any amount of data under such circumstances. Here the question of identifiability is investigated for 2-tree mixtures of the 4-state group-based models, which are more relevant to DNA sequence data. Using algebraic techniques, we show that the tree parameters are identifiable for the JC and K2P models. We also prove that generic substitution parameters for the JC mixture models are identifiable, and for the K2P and K3P models obtain generic identifiability results for mixtures on the same tree. This indicates that the full phylogenetic signal remains in such mixtures, and that the 2-state symmetric result is thus a misleading guide to the behavior of other models.
The space of phylogenetic mixtures of equivariant models
Cited by 4 (3 self)
Abstract. Background: The selection of an evolutionary model to best fit given molecular data is usually a heuristic choice. In his seminal book, J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. It remained an open question, however, whether these equations were sufficient to fully characterize the evolutionary model under consideration. Results: Here we prove that, for most equivariant models of evolution, the space of distributions satisfying these linear equations coincides with the space of distributions arising from mixtures of trees. In other words, we prove that the evolution of an observed multiple sequence alignment can be modeled by a mixture of phylogenetic trees under an equivariant evolutionary model if and only if the distribution of patterns at its columns satisfies the linear equations mentioned above. Moreover, we provide a set of linearly independent equations defining this space of phylogenetic mixtures for each equivariant model and for any number of taxa. Lastly, we use these results to perform a study of identifiability of phylogenetic mixtures. Conclusions: The space of phylogenetic mixtures under equivariant models is a linear space that fully characterizes the evolutionary model. We provide an explicit algorithm to obtain the equations defining these spaces for a number of models and taxa. Its implementation has proved to be a powerful tool for model selection.
Identifiability of 3-class Jukes-Cantor mixtures, arXiv preprint arXiv:1406.7256
Cited by 1 (0 self)
Abstract. We prove identifiability of the tree parameters of the 3-class Jukes-Cantor mixture model. The proof uses ideas from algebraic statistics, in particular: finding phylogenetic invariants that separate the varieties associated to different triples of trees; computing dimensions of the resulting phylogenetic varieties; and using the disentangling number to reduce to trees with a small number of leaves. Symbolic computation also plays a key role in handling the many different cases and finding relevant phylogenetic invariants.
Enclose it or Lose it! Computer-aided Proofs in Statistics
, 2010
Enclosure methods are a class of computer-aided proofs used in analysis. They are used increasingly to solve open problems in mathematics. The proposed project will use enclosure methods to address two open statistical decision problems: 1. rigorous parameter estimation in a chaotic statistical experiment, and 2. rigorous point estimation and exact posterior sampling in phylogenetics. To address these problems, we will adapt and extend recent developments in contractor programming, interval constraint propagation, algebraic statistical constraints and employ a novel mapped subpaving arithmetic. A C++ class library that can harness UC’s supercomputing power for such computer-aided proofs will be made publicly available along with a database of solutions.
extinction and
, 2002
"... Neural response to emotional faces with and without awareness: ..."
TROPICAL MIXTURES OF STAR TREE METRICS
, 907
Abstract. We study tree metrics that can be realized as a mixture of two star tree metrics. We prove that the only trees admitting such a decomposition are the ones having only one internal edge and, moreover, certain relations among the weights assigned to all edges must hold. We also describe the fibers of the corresponding mixture map. In addition, we discuss the general framework of tropical secant varieties and we interpret our results within this setting. Finally, after discussing recent results on upper bounds on star tree ranks of metrics on n taxa, we show that analogous bounds for star tree metric ranks cannot exist.
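For readers unfamiliar with the objects named in this abstract, the sketch below builds a star tree metric (each leaf attached to a single center, so the distance between two leaves is the sum of their pendant edge weights) and checks the standard four-point condition that characterizes tree metrics. It does not implement the paper's mixture map. The function names and the example weights are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def star_metric(weights):
    """Distance matrix of a star tree: d(i, j) = w_i + w_j for i != j."""
    n = len(weights)
    return [[0.0 if i == j else weights[i] + weights[j] for j in range(n)]
            for i in range(n)]


def quartet_metric(weights, internal):
    """Distance matrix of the 4-leaf tree 01|23 with one internal edge."""
    d = star_metric(weights)
    for i in (0, 1):
        for j in (2, 3):
            # Paths between the two cherries also cross the internal edge.
            d[i][j] += internal
            d[j][i] += internal
    return d


def satisfies_four_point(d, tol=1e-9):
    """Four-point condition: among the three pairings of any four leaves,
    the largest sum of distances must be attained at least twice."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if sums[2] - sums[1] > tol:
            return False
    return True


if __name__ == "__main__":
    print(satisfies_four_point(star_metric([1.0, 2.0, 3.0, 4.0, 5.0])))
    print(satisfies_four_point(quartet_metric([1.0, 1.0, 2.0, 2.0], 0.5)))
```

For a star tree all three pairing sums coincide, so the condition holds trivially; a tree with one internal edge makes two of the sums exceed the third by twice the internal edge length.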
Similarities as Evidence for Common Ancestry–A Likelihood Epistemology
, 2015
"... 3 A sufficient condition for a similarity to favor common ancestry ..."