Results 1-10 of 39
Statistical challenges with high dimensionality: Feature selection in knowledge discovery
Proceedings of the International Congress of Mathematicians, 2006
Cited by 35 (9 self)
Abstract. Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high dimensionality arise in diverse fields of science and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model were known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high dimensionality are also discussed.
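How the choice of penalty matters can be sketched with one assumed example that the abstract does not name: the SCAD (smoothly clipped absolute deviation) penalty. Under an orthonormal design, penalized least squares decouples into a thresholding rule applied to each ordinary least-squares coefficient z. The sketch below compares soft thresholding (the L1/lasso penalty) with the SCAD rule, using the conventional a = 3.7. The function names are hypothetical and the code is illustrative, not the paper's implementation.

```python
import math


def soft_threshold(z, lam):
    """Lasso (L1) solution for one coefficient: every estimate is shrunk by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z, lam, a=3.7):
    """SCAD solution for one coefficient under an orthonormal design.

    Small |z| is soft-thresholded, intermediate |z| is shrunk progressively
    less, and |z| > a * lam is returned unchanged, so large effects stay unbiased.
    """
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    if abs(z) <= 2 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z
```

With lam = 1, a large coefficient z = 5 is shrunk to 4 by the lasso rule but left at 5 by SCAD. This difference in bias for large coefficients is one reason the choice of penalty function matters.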
The identifiability of tree topology for phylogenetic models, including covarion and mixture models, arXiv q-bio.PE/0511009
Cited by 23 (7 self)
Abstract. For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.
Performance of a New Invariants Method on Homogeneous and Nonhomogeneous Quartet Trees, 2006
Algebraic statistical models
 Statistica Sinica
Cited by 13 (4 self)
Abstract: Many statistical models are algebraic in that they are defined in terms of polynomial constraints, or in terms of polynomial or rational parametrizations. The parameter spaces of such models are typically semialgebraic subsets of the parameter space of a reference model with nice properties, such as a regular exponential family. This observation leads to the definition of an ‘algebraic exponential family’. This new definition provides a unified framework for the study of statistical models with algebraic structure. In this paper we review the ingredients of this definition and illustrate with examples how computational algebraic geometry can be used to solve problems arising in statistical inference in algebraic models. Key words and phrases: Algebraic statistics, computational algebraic geometry, exponential family, maximum likelihood estimation, model invariants, singularities.
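A small model that is defined both by a polynomial parametrization and by a polynomial constraint is the independence model for a 2 x 2 contingency table. This is our choice of example and is not necessarily one from the paper. It is parametrized by p_ij = r_i c_j and cut out by the single invariant p00 p11 - p01 p10 = 0. The sketch below computes its maximum likelihood estimate, the outer product of the observed margins, and checks it against the invariant in exact rational arithmetic. The function names and counts are hypothetical.

```python
from fractions import Fraction


def independence_mle(counts):
    """MLE of a 2 x 2 table under independence: outer product of the margins."""
    n = sum(sum(row) for row in counts)
    rows = [Fraction(sum(row), n) for row in counts]
    cols = [Fraction(counts[0][j] + counts[1][j], n) for j in range(2)]
    return [[rows[i] * cols[j] for j in range(2)] for i in range(2)]


def independence_invariant(p):
    """Determinant p00*p11 - p01*p10; it vanishes exactly on the model."""
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]


counts = [[10, 20], [30, 40]]
p_hat = independence_mle(counts)
```

The fitted table satisfies the invariant exactly. The raw empirical frequencies do not, which is what places them outside the model.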
Toric geometry of cuts and splits
 Michigan Math. J
Cited by 12 (1 self)
Abstract. Associated to any graph is a toric ideal whose generators record relations among the cuts of the graph. We study these ideals and the geometry of the corresponding toric varieties. Our theorems and conjectures relate the combinatorial structure of the graph and the corresponding cut polytope to algebraic properties of the ideal. Cut ideals generalize toric ideals arising in phylogenetics and the study of contingency tables.
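The relations among cuts can be made concrete under an assumed standard parametrization. In it, each cut variable q_{A|B} maps to a monomial with a variable s_e for every edge e that the split separates and a variable t_e for every edge it does not. Two degree-two monomials q_A q_B and q_C q_D then have the same image, and their difference lies in the cut ideal, exactly when the cut vectors satisfy δ(A) + δ(B) = δ(C) + δ(D). The sketch below lists all such quadratic binomials for a small graph. This is generally more than a minimal generating set, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, combinations_with_replacement


def splits(n):
    """Unordered splits A|B of {0, ..., n-1}, each named by the block A holding 0."""
    for k in range(n):
        for extra in combinations(range(1, n), k):
            yield frozenset((0,) + extra)


def cut_vector(block, edges):
    """0/1 vector over the edges: 1 where the edge crosses the split."""
    return tuple(int((i in block) != (j in block)) for i, j in edges)


def quadratic_binomials(n, edges):
    """All pairs of degree-two monomials q_A q_B, q_C q_D with equal images."""
    cuts = [(block, cut_vector(block, edges)) for block in splits(n)]
    by_image = defaultdict(list)
    for (a, u), (b, v) in combinations_with_replacement(cuts, 2):
        # For a fixed degree the t-exponents are determined by the s-exponents,
        # so equal cut-vector sums mean equal images under the toric map.
        by_image[tuple(x + y for x, y in zip(u, v))].append((a, b))
    return [pair for monomials in by_image.values()
            for pair in combinations(monomials, 2)]
```

For the path 0-1-2 this finds a single quadric. It relates the trivial cut and the cut {0, 2}|{1} to the cuts {0}|{1, 2} and {0, 1}|{2}. For the triangle it finds none.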
Using invariants for phylogenetic tree construction, in Emerging Applications of Algebraic Geometry, 2008
Cited by 9 (0 self)
Abstract. Phylogenetic invariants are certain polynomials in the joint probability distribution of a Markov model on a phylogenetic tree. Such polynomials are of theoretical interest in the field of algebraic statistics, and they are also of practical interest: they can be used to construct phylogenetic trees. This paper is a self-contained introduction to the algebraic, statistical, and computational challenges involved in the practical use of phylogenetic invariants. We survey the relevant literature and provide some partial answers and many open problems.
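Tree construction with invariants can be made concrete in the simplest assumed setting: the two-state Cavender-Farris-Neyman (CFN) model with a uniform root, on a tree with leaves 0 to 3. In Fourier coordinates q_g = Σ_x (-1)^(g·x) p(x), the quadratic q_1111 - q_A q_B vanishes when A|B is the true split and generically does not vanish otherwise. Evaluating it on the three quartet splits therefore picks out the tree. This is one illustrative invariant, not the collection surveyed in the paper, and all names and branch parameters are hypothetical.

```python
from itertools import product

# Quartet splits A|B of the leaves 0..3; the first is the tree simulated below.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def edge(p, a, b):
    """CFN transition probability: the state flips with probability p."""
    return p if a != b else 1.0 - p


def joint_distribution(p_leaf, p_internal):
    """Joint law of the four leaf states on the tree 01|23 under CFN.

    Internal node u (the root, uniform on {0, 1}) carries leaves 0 and 1;
    internal node v carries leaves 2 and 3; p_internal labels edge u-v.
    """
    dist = {}
    for x in product((0, 1), repeat=4):
        dist[x] = sum(
            0.5 * edge(p_internal, u, v)
            * edge(p_leaf[0], u, x[0]) * edge(p_leaf[1], u, x[1])
            * edge(p_leaf[2], v, x[2]) * edge(p_leaf[3], v, x[3])
            for u, v in product((0, 1), repeat=2))
    return dist


def fourier(dist, g):
    """Fourier coordinate q_g = sum over x of (-1)^(g.x) p(x)."""
    return sum(p * (-1) ** sum(gi * xi for gi, xi in zip(g, x))
               for x, p in dist.items())


def indicator(leaves):
    """0/1 vector of length 4 marking the given leaves."""
    return tuple(1 if i in leaves else 0 for i in range(4))


def split_invariant(dist, split):
    """q_1111 - q_A * q_B; vanishes on CFN distributions from a tree with split A|B."""
    a, b = split
    return (fourier(dist, (1, 1, 1, 1))
            - fourier(dist, indicator(a)) * fourier(dist, indicator(b)))


def infer_split(dist):
    """Pick the quartet split whose invariant is closest to zero."""
    return min(SPLITS, key=lambda s: abs(split_invariant(dist, s)))
```

With exact model probabilities the true split gives zero up to rounding. With sampled data the invariants are only approximately zero, which is one of the statistical challenges the abstract alludes to.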
Catalog of small trees, 2005
Cited by 9 (2 self)
This chapter is concerned with the description of the Small Trees website which can be found at the following web address:
Conjunctive Bayesian networks
Bernoulli, 2007
Cited by 8 (2 self)
Conjunctive Bayesian networks (CBNs) are graphical models that describe the accumulation of events which are constrained in the order of their occurrence. A CBN is given by a partial order on a (finite) set of events. CBNs generalize the oncogenetic tree models of Desper et al. (1999) by allowing the occurrence of an event to depend on more than one predecessor event. The present paper studies the statistical and algebraic properties of CBNs. We determine the maximum likelihood parameters and present a combinatorial solution to the model selection problem. Our method performs well on two datasets where the events are HIV mutations associated with drug resistance. Concluding with a study of the algebraic properties of CBNs, we show that CBNs are toric varieties after a coordinate transformation and that their ideals possess a quadratic Gröbner basis.
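The CBN likelihood can be sketched under two assumptions. First, each event is given by the set of its immediate predecessors in the partial order. Second, every observed genotype is compatible, that is, downward closed. A compatible genotype g then has probability ∏_{e∈g} θ_e ∏ (1 - θ_e), where the second product runs over events not in g whose predecessors all lie in g. For each event the likelihood factors as θ_e^a (1 - θ_e)^b, so the maximizer is the fraction of genotypes containing e among those in which all predecessors of e are present. Names and toy data are hypothetical.

```python
from itertools import combinations


def is_compatible(genotype, parents):
    """A genotype (set of occurred events) is compatible if it is downward closed."""
    return all(parents[e] <= genotype for e in genotype)


def genotype_probability(genotype, parents, theta):
    """CBN probability of a genotype; incompatible genotypes get probability 0."""
    if not is_compatible(genotype, parents):
        return 0.0
    prob = 1.0
    for e in parents:
        if e in genotype:
            prob *= theta[e]
        elif parents[e] <= genotype:  # e was possible next but did not occur
            prob *= 1.0 - theta[e]
    return prob


def mle(data, parents):
    """theta_e = #(genotypes with e) / #(genotypes with all predecessors of e)."""
    theta = {}
    for e in parents:
        eligible = sum(1 for g in data if parents[e] <= g)
        occurred = sum(1 for g in data if e in g)
        theta[e] = occurred / eligible if eligible else float("nan")
    return theta


def genotypes(events):
    """All subsets of the event set, compatible or not."""
    return [frozenset(c) for k in range(len(events) + 1)
            for c in combinations(events, k)]
```

For the poset in which event 3 requires both 1 and 2, the probabilities over all eight subsets sum to one, and genotypes such as {3} receive probability zero.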
The strand symmetric model, 2005
Cited by 8 (4 self)
This chapter is devoted to the study of strand symmetric Markov models on trees from the standpoint of algebraic statistics. By a strand symmetric Markov model, we mean one whose mutation structure reflects the symmetry induced by the double-stranded structure of DNA. In particular, a strand