Tensor decompositions for learning latent variable models
, 2014
Cited by 83 (7 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation), which exploits a certain tensor structure in their low-order observable moments (typically of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
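The power-iteration idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes an exactly orthogonally decomposable symmetric third-order tensor T = sum_i w_i v_i ⊗ v_i ⊗ v_i with positive weights, and it runs plain power iteration from a single random start. It is not the robust variant analyzed in the paper, and it omits restarts and deflation. The function names (`make_tensor`, `apply_tensor`, `power_iteration`) are hypothetical and chosen only for this illustration.

```python
import math
import random


def make_tensor(weights, vectors):
    """Build T = sum_i w_i * (v_i outer v_i outer v_i) as nested lists."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T


def apply_tensor(T, u):
    """Return the vector T(I, u, u): entry a is sum_{b,c} T[a][b][c] u[b] u[c]."""
    d = len(u)
    return [
        sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
        for a in range(d)
    ]


def normalize(u):
    """Scale u to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def power_iteration(T, n_iters=50, seed=0):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)|| from a random unit vector.

    Returns (eigenvalue estimate, eigenvector estimate). Assumption: T is
    orthogonally decomposable with positive weights, so the iterate
    converges to one of the components v_i.
    """
    rng = random.Random(seed)
    d = len(T)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for _ in range(n_iters):
        u = normalize(apply_tensor(T, u))
    # Eigenvalue estimate is T(u, u, u) = <u, T(I, u, u)>.
    lam = sum(x * y for x, y in zip(u, apply_tensor(T, u)))
    return lam, u
```

Which component the iteration lands on depends on the starting vector, so recovering all components would require removing each recovered term from T and repeating; that step is left out here.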
A Spectral Algorithm for Latent Dirichlet Allocation
Cited by 49 (11 self)
Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third-order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space.
Fast and robust recursive algorithms for separable nonnegative matrix factorization. arXiv preprint arXiv:1208.1237
, 2012
Experiments with Spectral Learning of Latent-Variable PCFGs
Cited by 22 (8 self)
Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which, unlike the EM algorithm, is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. We show that the algorithm provides models with the same accuracy as EM, but is an order of magnitude more efficient. We describe a number of key steps used to obtain this level of performance; these should be relevant to other work on the application of spectral learning algorithms. We view our results as strong empirical evidence for the viability of spectral methods as an alternative to EM.
Fast conical hull algorithms for near-separable nonnegative matrix factorization
 In ACM/IEEE conference on Supercomputing
, 2009
Cited by 21 (1 self)
The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012a) turns nonnegative matrix factorization (NMF) into a tractable problem. Recently, a new class of provably correct NMF algorithms has emerged under this assumption. In this paper, we reformulate the separable NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors. From this geometric perspective, we derive new separable NMF algorithms that are highly scalable and empirically noise robust, and have several other favorable properties in relation to existing methods. A parallel implementation of our algorithm demonstrates high scalability on shared- and distributed-memory machines.
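To make the geometric picture concrete, the following minimal sketch picks out "anchor" columns of a separable nonnegative matrix. It uses a greedy successive-projection selection, a different and simpler technique than the conical hull algorithms this abstract proposes. It assumes noiseless data in which every column is a nonnegative combination of r linearly independent anchor columns that themselves appear in the data. The name `successive_projection` is hypothetical.

```python
def dot(x, y):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))


def successive_projection(columns, r):
    """Greedily select r anchor column indices from a nonnegative matrix.

    `columns` is a list of columns, each a list of nonnegative floats with
    a positive sum. Assumption: the data are exactly separable, so after
    l1-normalization every column lies in the convex hull of the anchors.
    """
    # Scale each column to sum to one, turning the cone into a convex hull.
    resid = [[x / sum(col) for x in col] for col in columns]
    anchors = []
    for _ in range(r):
        # The largest residual norm is attained at a vertex of the hull.
        norms = [dot(c, c) for c in resid]
        j = max(range(len(resid)), key=norms.__getitem__)
        anchors.append(j)
        p, pp = resid[j], norms[j]
        # Project every residual onto the orthogonal complement of p.
        resid = [
            [x - (dot(c, p) / pp) * y for x, y in zip(c, p)]
            for c in resid
        ]
    return anchors
```

For example, with columns `[2, 2, 0]`, `[1, 0, 0]`, `[1, 1, 3]`, `[0, 4, 0]`, `[0, 0, 5]`, the three axis-aligned columns are the extreme rays and the other two are mixtures of them.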
New Algorithms for Learning Incoherent and Overcomplete Dictionaries
, 2014
Cited by 19 (2 self)
In sparse recovery we are given a matrix $A \in \mathbb{R}^{n \times m}$ (“the dictionary”) and a vector of the form $AX$ where $X$ is sparse, and the goal is to recover $X$. This is a central notion in signal processing, statistics and machine learning. But in applications such as sparse coding, edge detection, compression and super resolution, the dictionary $A$ is unknown and has to be learned from random examples of the form $Y = AX$ where $X$ is drawn from an appropriate distribution; this is the dictionary learning problem. In most settings, $A$ is overcomplete: it has more columns than rows. This paper presents a polynomial-time algorithm for learning overcomplete dictionaries; the only previously known algorithm with provable guarantees is the recent work of Spielman et al. (2012), who gave an algorithm for the undercomplete case, which is rarely the case in applications. Our algorithm applies to incoherent dictionaries, which have been a central object of study since they were introduced in seminal work of Donoho and Huo (1999). In particular, a dictionary is $\mu$-incoherent if each pair of columns has inner product at most $\mu/\sqrt{n}$. The algorithm makes natural stochastic assumptions about the unknown sparse vector $X$, which can contain $k \le c \min(\sqrt{n}/(\mu \log n),\, m^{1/2-\eta})$ nonzero entries (for any $\eta > 0$). This is close to the best $k$ allowable by the best sparse recovery algorithms even if one knows the dictionary $A$ exactly. Moreover, both the running time and sample complexity depend on $\log(1/\epsilon)$, where $\epsilon$ is the target accuracy, and so our algorithms converge very quickly to the true dictionary. Our algorithm can also tolerate substantial amounts of noise provided it is incoherent with respect to the dictionary (e.g., Gaussian). In the noisy setting, our running time and sample complexity depend polynomially on $1/\epsilon$, and this is necessary.
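The incoherence condition in this abstract can be checked numerically with a short sketch. It computes the smallest µ for which a given dictionary satisfies the stated bound, under two assumptions the abstract does not spell out: columns are rescaled to unit Euclidean norm first, and inner products are compared in absolute value. The function name `incoherence` is hypothetical.

```python
import math


def incoherence(columns):
    """Return the smallest mu such that every pair of distinct columns has
    |inner product| <= mu / sqrt(n), where n is the column length.

    Assumptions (not stated in the abstract): columns are normalized to
    unit Euclidean norm, and the absolute inner product is used.
    """
    n = len(columns[0])
    unit = []
    for col in columns:
        norm = math.sqrt(sum(x * x for x in col))
        unit.append([x / norm for x in col])
    worst = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            inner = abs(sum(a * b for a, b in zip(unit[i], unit[j])))
            worst = max(worst, inner)
    return math.sqrt(n) * worst
```

An orthonormal set of columns gives µ = 0, while an overcomplete dictionary (more columns than rows) necessarily gives µ > 0, since more than n vectors in n dimensions cannot all be mutually orthogonal.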
R.: Robust near-separable nonnegative matrix factorization using linear optimization
 Journal of Machine Learning Research
, 2014
Robustness analysis of Hottopixx, a linear programming model for factoring nonnegative matrices
 SIAM Journal on Matrix Analysis and Applications
, 2013
Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization
 IEEE TVCG
Cited by 10 (1 self)
Provable bounds for learning some deep representations.
 arXiv:1310.6343
, 2013
Cited by 8 (1 self)
We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$-node multilayer network that has degree at most $n^{\gamma}$ for some $\gamma < 1$ and each edge has a random edge weight in [−1, 1]. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural nets with random edge weights.