Results 1 – 8 of 8
Polynomial Learning of Distribution Families
Abstract

Cited by 46 (0 self)
Abstract—The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions still remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. Specifically, we show that parameters of a Gaussian mixture distribution with a fixed number of components can be learned using a sample whose size is polynomial in dimension and all other parameters. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call “polynomial families” in low dimension. These families are characterized by their moments being polynomial in parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions. Index Terms—Gaussian mixture learning, polynomial learnability
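As a toy illustration (not taken from the paper), a single Gaussian is the simplest instance of such a “polynomial family”: its moments E[X] = μ and E[X²] = μ² + σ² are polynomials in the parameters, so inverting the first two empirical moments recovers (μ, σ²).

```python
import numpy as np

def gaussian_from_moments(x):
    """Method-of-moments estimate for a single Gaussian.

    Illustrative sketch only: moments are polynomial in (mu, sigma^2),
    the simplest case of the paper's "polynomial families".
      E[X]   = mu
      E[X^2] = mu^2 + sigma^2
    """
    m1 = np.mean(x)             # estimates mu
    m2 = np.mean(x ** 2)        # estimates mu^2 + sigma^2
    return m1, m2 - m1 ** 2     # (mu_hat, sigma2_hat)
```

For mixtures, the same idea requires higher moments and the algebraic-geometric machinery the abstract describes; this two-moment inversion is the degenerate one-component case.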
Spectral Methods for Learning Multivariate Latent Tree Structure
Abstract

Cited by 18 (4 self)
This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables, such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
High dimensional expectation-maximization algorithm: Statistical optimization and asymptotic normality. arXiv preprint arXiv:1412.8729
, 2014
Abstract

Cited by 2 (0 self)
We provide a general theory of the expectation-maximization (EM) algorithm for inferring high-dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high-dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based on the obtained estimator, we propose new inferential procedures for testing hypotheses and constructing confidence intervals for low-dimensional components of high-dimensional parameters. For a broad family of statistical models, our framework establishes the first computationally feasible approach for optimal estimation and asymptotic inference in high dimensions. Our theory is supported by thorough numerical results.
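For reference, the classical one-dimensional EM updates that such high-dimensional variants build on can be sketched as follows. This is a minimal illustration of vanilla EM for a k-component Gaussian mixture, not the paper's sparsity-aware algorithm; the quantile-based initialization is an assumption chosen for determinism.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """Classical EM for a one-dimensional k-component Gaussian mixture.

    Minimal sketch of the vanilla algorithm; the paper's contribution
    (sparsity structure, high-dimensional guarantees) is not shown here.
    """
    n = len(x)
    # Deterministic initialization: spread means over data quantiles,
    # shared broad variance, uniform mixing weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i).
        dens = (w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted re-estimates.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var
```

With an appropriate initialization (here, quantiles), the iterates converge quickly on well-separated mixtures, matching the geometric-rate behavior the abstract describes for its structured variant.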
Near-optimal-sample estimators for spherical Gaussian mixtures
, 2014
Abstract

Cited by 2 (1 self)
Many important distributions are high-dimensional, and often they can be modeled as Gaussian mixtures. We derive the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. Based on intuitive spectral reasoning, it approximates mixtures of k spherical Gaussians in d dimensions to within ℓ1 distance ε using O(dk^9 (log^2 d)/ε^4) samples and O_{k,ε}(d^3 log^5 d) computation time. Conversely, we show that any estimator requires Ω(dk/ε^2) samples, hence the algorithm’s sample complexity is nearly optimal in the dimension. The implied time-complexity factor O_{k,ε} is exponential in k, but much smaller than previously known. We also construct a simple estimator for one-dimensional Gaussian mixtures that uses Õ(k/ε^2) samples and Õ((k/ε)^{3k+1}) computation time.
Convex Relaxations of Bregman Divergence Clustering
Abstract

Cited by 1 (1 self)
Although many convex relaxations of clustering have been proposed in the past decade, current formulations remain restricted to spherical Gaussian or discriminative models and are susceptible to imbalanced clusters. To address these shortcomings, we propose a new class of convex relaxations that can be flexibly applied to more general forms of Bregman divergence clustering. By basing these new formulations on normalized equivalence relations, we retain additional control on relaxation quality, which allows improvement in clustering quality. We furthermore develop optimization methods that improve scalability by exploiting recent implicit matrix norm methods. In practice, we find that the new formulations are able to efficiently produce tighter clusterings that improve the accuracy of state-of-the-art methods.
A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data
Abstract
Learning a statistical model for high-dimensional data is an important topic in machine learning. Although this problem has been well studied in the supervised setting, little is known about its unsupervised counterpart. In this work, we focus on the problem of clustering high-dimensional data with sparse centers. In particular, we address the following open question in unsupervised learning: “is it possible to reliably cluster high-dimensional data when the number of samples is smaller than the data dimensionality?” We develop an efficient clustering algorithm that is able to estimate sparse cluster centers with a single pass over the data. Our theoretical analysis shows that the proposed algorithm is able to accurately recover cluster centers with only O(s log d) samples (data points), provided all the cluster centers are s-sparse vectors in a d-dimensional space. Experimental results verify both the effectiveness and efficiency of the proposed clustering algorithm compared to state-of-the-art algorithms on several benchmark datasets.
Detection and Feature Selection in Sparse Mixture Models
Abstract
We consider Gaussian mixture models in high dimensions and concentrate on the twin tasks of detection and feature selection. Under sparsity assumptions on the difference in means, we derive information bounds and establish the performance of various procedures, including the top sparse eigenvalue of the sample covariance matrix and other projection tests based on moments, such as the skewness and kurtosis tests of Malkovich and Afifi (1973), and other variants which we were better able to control under the null.