RHODES, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables
 Ann. Statist
Cited by 21 (4 self)
While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
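For reference, the condition in Kruskal's theorem that the abstract above builds on is usually stated as below. This is the standard textbook formulation, not text from the listed paper; the symbols r, M_j and I_j are notation assumed here for illustration.

```latex
% Assumed notation (not from the listing):
%   r    = number of latent classes
%   M_j  = r x kappa_j matrix whose i-th row is the distribution of
%          observed variable j given latent class i  (j = 1, 2, 3)
%   I_j  = Kruskal rank of M_j: the largest I such that every set of
%          I rows of M_j is linearly independent
%   [M_1, M_2, M_3] = \sum_{i=1}^{r} m^1_i \otimes m^2_i \otimes m^3_i,
%          where m^j_i is the i-th row of M_j
\[
  I_1 + I_2 + I_3 \;\ge\; 2r + 2
  \quad\Longrightarrow\quad
  [M_1, M_2, M_3] \text{ determines } (M_1, M_2, M_3)
  \text{ up to simultaneous permutation and rescaling of rows.}
\]
```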
Performance of a New Invariants Method on Homogeneous and Nonhomogeneous Quartet Trees
, 2006
“Using invariants for phylogenetic tree construction,” in Emerging Applications of Algebraic Geometry
, 2008
Cited by 9 (0 self)
Abstract. Phylogenetic invariants are certain polynomials in the joint probability distribution of a Markov model on a phylogenetic tree. Such polynomials are of theoretical interest in the field of algebraic statistics and they are also of practical interest: they can be used to construct phylogenetic trees. This paper is a self-contained introduction to the algebraic, statistical, and computational challenges involved in the practical use of phylogenetic invariants. We survey the relevant literature and provide some partial answers and many open problems.
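To make the idea of an invariant concrete, here is a minimal sketch using one well-known family: rank conditions on "flattenings" of the leaf distribution. It is an illustration under stated assumptions, not necessarily the method of the paper above. It assumes the two-state symmetric (CFN) model on the quartet tree 12|34 with a uniform root distribution; the function names and the edge parameter value 0.6 are hypothetical choices.

```python
import numpy as np

def cfn_matrix(theta):
    """2x2 CFN transition matrix; theta = 1 - 2 * (probability of a state change)."""
    same, diff = (1 + theta) / 2, (1 - theta) / 2
    return np.array([[same, diff], [diff, same]])

def quartet_distribution(thetas):
    """Joint distribution of leaves 1..4 on the tree 12|34 (uniform root)."""
    m1, m2, m3, m4, m5 = (cfn_matrix(t) for t in thetas)
    root = np.array([0.5, 0.5])
    # a, b: hidden states at the two internal nodes; i, j, k, l: leaf states.
    # m5 is the internal edge joining the two internal nodes.
    return np.einsum("a,ai,aj,ab,bk,bl->ijkl", root, m1, m2, m5, m3, m4)

def flattening_rank(p, split):
    """Rank of the 4x4 matrix with rows indexed by one side of the split."""
    left, right = split
    mat = np.transpose(p, left + right).reshape(4, 4)
    # Explicit tolerance so round-off does not inflate the numerical rank.
    return np.linalg.matrix_rank(mat, tol=1e-10)

p = quartet_distribution([0.6, 0.6, 0.6, 0.6, 0.6])
rank_true = flattening_rank(p, ((0, 1), (2, 3)))   # split of the true tree
rank_wrong = flattening_rank(p, ((0, 2), (1, 3)))  # split of a different tree
print(rank_true, rank_wrong)
```

The flattening along the true split has rank at most 2 (the number of states), so all of its 3x3 minors vanish; those minors are polynomial invariants of the tree. Along a wrong split the rank is larger for these parameters, which is what lets such polynomials discriminate between topologies.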
2006. Phylogeny of mixture models: Robustness of maximum likelihood and nonidentifiable distributions
Cited by 8 (2 self)
We address phylogenetic reconstruction when the data is generated from a mixture distribution. Such topics have gained considerable attention in the biological community with the clear evidence of heterogeneity of mutation rates. In our work we consider data coming from a mixture of trees which share a common topology, but differ in their edge weights (i.e., branch lengths). We first show the pitfalls of popular methods, including maximum likelihood and Markov chain Monte Carlo algorithms. We then determine for which evolutionary models reconstructing the tree topology under a mixture distribution is (im)possible. We prove that every model whose transition matrices can be parameterized by an open set of multilinear polynomials either has nonidentifiable mixture distributions, in which case reconstruction is impossible in general, or admits linear tests which identify the topology. This duality theorem relies on our notion of linear tests and uses ideas from convex programming duality. Linear tests are closely related to linear invariants, which were first introduced by Lake, and are natural from an algebraic geometry perspective.
Mixed-up trees: the structure of phylogenetic mixtures
 Bull. Math. Biol
Cited by 7 (1 self)
In this paper we apply new geometric and combinatorial methods to the study of phylogenetic mixtures. The focus of the geometric approach is to describe the geometry of phylogenetic mixture distributions for the two-state random cluster model, which is a generalization of the two-state symmetric (CFN) model. In particular, we show that the set of mixture distributions forms a convex polytope and we calculate its dimension; corollaries include a simple criterion for when a mixture of branch lengths on the star tree can mimic the site pattern frequency vector of a resolved quartet tree. Furthermore, by computing volumes of polytopes we can clarify how “common” nonidentifiable mixtures are under the CFN model. We also present a new combinatorial result which extends any identifiability result for a specific pair of trees of size six to arbitrary pairs of trees. Next we present a positive result showing identifiability of rates-across-sites models. Finally, we answer a question raised in a previous paper concerning “mixed branch repulsion” on trees larger than quartet trees under the CFN model.
Pitfalls of heterogeneous processes for phylogenetic reconstruction
 Systematic Biology
, 2006
Cited by 6 (2 self)
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004) and Chang (1996), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible, despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) Nonidentifiability: two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) Linear tests: there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura’s 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake’s linear invariants.
Identifiability of latent class models with many observed variables
Cited by 6 (2 self)
While latent class models of various types arise in many statistical applications, it is often difficult to establish their identifiability. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability, utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models, and random graph models, and lead to a number of new results and improvements to old ones. In the parametric setting we argue that the classical definition of identifiability is too strong, and should be replaced by the concept of generic identifiability. Generic identifiability implies that the set of nonidentifiable parameters has zero measure, so that the model remains useful for inference. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to be nonidentifiable models. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
IDENTIFIABILITY OF A MARKOVIAN MODEL OF MOLECULAR EVOLUTION WITH GAMMA-DISTRIBUTED RATES
, 2008
Cited by 5 (2 self)
Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although nonidentifiability was proven for a semiparametric model and an incorrect proof of identifiability was published for a general parametric model (GTR+Γ+I). Here we prove that one of the most widely used models (GTR+Γ) is identifiable for generic parameters, and for all parameter choices in the case of 4-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.
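As a small illustration of what "gamma-distributed rates" means for a single edge, the sketch below averages the transition matrix exp(Q r t) over a rate r drawn from a Gamma distribution with shape alpha and mean 1, using the Gamma moment generating function on each eigenvalue of Q. It is a simplified sketch, not the paper's construction: it assumes a symmetric rate matrix (the Jukes-Cantor matrix is used as the example), whereas a general GTR matrix is reversible but not symmetric. The function name and parameter values are hypothetical.

```python
import numpy as np

def gamma_averaged_transition(Q, t, alpha):
    """E[exp(Q * r * t)] for r ~ Gamma(shape=alpha, mean=1); Q must be symmetric."""
    eigvals, eigvecs = np.linalg.eigh(Q)
    # Gamma MGF with rate alpha: E[exp(s r)] = (1 - s/alpha)^(-alpha), here s = eigval * t <= 0.
    mgf = (1.0 - eigvals * t / alpha) ** (-alpha)
    return eigvecs @ np.diag(mgf) @ eigvecs.T

# Jukes-Cantor rate matrix: off-diagonal entries 1/3, diagonal entries -1.
Q_jc = (np.ones((4, 4)) - 4 * np.eye(4)) / 3.0
P = gamma_averaged_transition(Q_jc, t=0.5, alpha=2.0)
print(P.round(4))
```

For Jukes-Cantor the result can be checked against the closed form: each diagonal entry equals 1/4 + (3/4)(1 + 4t/(3 alpha))^(-alpha).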
Identifiability of 2-tree mixtures for group-based models
, 909
Cited by 1 (0 self)
Abstract. Phylogenetic data arising on two possibly different tree topologies might be mixed through several biological mechanisms, including incomplete lineage sorting or horizontal gene transfer in the case of different topologies, or simply different substitution processes on characters in the case of the same topology. Recent work on a 2-state symmetric model of character change showed such a mixture model has nonidentifiable parameters, and thus it is theoretically impossible to determine the two tree topologies from any amount of data under such circumstances. Here the question of identifiability is investigated for 2-tree mixtures of the 4-state group-based models, which are more relevant to DNA sequence data. Using algebraic techniques, we show that the tree parameters are identifiable for the JC and K2P models. We also prove that generic substitution parameters for the JC mixture models are identifiable, and for the K2P and K3P models obtain generic identifiability results for mixtures on the same tree. This indicates that the full phylogenetic signal remains in such mixtures, and that the 2-state symmetric result is thus a misleading guide to the behavior of other models.
Mixed-up Trees: the Structure of Phylogenetic Mixtures
Abstract. In this paper, we apply new geometric and combinatorial methods to the study of phylogenetic mixtures. The focus of the geometric approach is to describe the geometry of phylogenetic mixture distributions for the two-state random cluster model, which is a generalization of the two-state symmetric (CFN) model. In particular, we show that the set of mixture distributions forms a convex polytope and we calculate its dimension; corollaries include a simple criterion for when a mixture of branch lengths on the star tree can mimic the site pattern frequency vector of a resolved quartet tree. Furthermore, by computing volumes of polytopes we can clarify how “common” nonidentifiable mixtures are under the CFN model. We also present a new combinatorial result which extends any identifiability result for a specific pair of trees of size six to arbitrary pairs of trees. Next we present a positive result showing identifiability of rates-across-sites models. Finally, we answer a question raised in a previous paper concerning “mixed branch repulsion” on trees larger than quartet trees under the CFN model.