Results 1 - 10
of
21
A practical algorithm for topic modeling with . . .
, 2013
"... Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model learning have been based on a maximum likelihood objective. Efficient algorithms exist that attempt to approximate this objective, but they have no pr ..."
Abstract
-
Cited by 45 (1 self)
- Add to MetaCart
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model learning have been based on a maximum likelihood objective. Efficient algorithms exist that attempt to approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for learning topic models that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.
R.: Robust near-separable nonnegative matrix factorization using linear optimization
- Journal of Machine Learning Research
, 2014
"... ar ..."
(Show Context)
Robustness analysis of Hottopixx, a linear programming model for factoring nonnegative matrices
- SIAM Journal on Matrix Analysis and Applications
, 2013
"... ar ..."
(Show Context)
The why and how of nonnegative matrix factorization
- REGULARIZATION, OPTIMIZATION, KERNELS, AND SUPPORT VECTOR MACHINES. CHAPMAN & HALL/CRC
, 2014
"... ..."
(Show Context)
Efficient Distributed Topic Modeling with Provable Guarantees
"... Topic modeling for large-scale distributed web-collections requires distributed tech-niques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statisti ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
(Show Context)
Topic modeling for large-scale distributed web-collections requires distributed tech-niques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statisti-cal performance of the state-of-the-art cen-tralized approaches while requiring insignifi-cant communication between the distributed document collections. We achieve trade-offs between communication and computa-tion without actually transmitting the doc-uments. Our scheme is based on exploiting the geometry of normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct dis-tributed schemes to identify topics. 1
Necessary and sufficient conditions for novel word detection in separable topic models
- In Advances in on Neural Information Processing Systems (NIPS), Workshop on Topic Models: Computation, Application, Lake Tahoe
, 2013
"... ar ..."
(Show Context)
A Vavasis, “Semidefinite programming based preconditioning for more robust nearseparable nonnegative matrix factorization,” arXiv preprint arXiv:1310.2273
, 2013
"... ar ..."
Ellipsoidal Rounding for Nonnegative Matrix Factorization Under Noisy Separability
, 2013
"... We present a numerical algorithm for nonnegative matrix factorization (NMF) problems under noisy separability. An NMF problem under separability can be stated as one of finding all vertices of the convex hull of data points. The research interest of this paper is to find the vectors as close to the ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
We present a numerical algorithm for nonnegative matrix factorization (NMF) problems under noisy separability. An NMF problem under separability can be stated as one of finding all vertices of the convex hull of data points. The research interest of this paper is to find the vectors as close to the vertices as possible in a situation in which noise is added to the data points. Our algorithm is designed to capture the shape of the convex hull of data points by using its enclosing ellipsoid. We show that the algorithm has correctness and robustness properties from theoretical and practical perspectives; correctness here means that if the data points do not contain any noise, the algorithm can find the vertices of their convex hull; robustness means that if the data points contain noise, the algorithm can find the near-vertices. Finally, we apply the algorithm to document clustering, and report the experimental results.
Random projections for non-negative matrix factorization. arXiv preprint arXiv:1405.4275
, 2014
"... Non-negative matrix factorization (NMF) is a widely used tool for exploratory data analysis in many disciplines. In this paper, we describe an approach to NMF based on random projections and give a geometric analysis of a prototypical algorithm. Our main result shows the proto-algorithm requires κ̄k ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Non-negative matrix factorization (NMF) is a widely used tool for exploratory data analysis in many disciplines. In this paper, we describe an approach to NMF based on random projections and give a geometric analysis of a prototypical algorithm. Our main result shows the proto-algorithm requires κ̄k log k optimizations to find all the extreme columns of the matrix, where k is the number of extreme columns, and κ ̄ is a geometric condition number. We show empirically that the proto-algorithm is robust to noise and well-suited to modern distributed computing architectures.
Provable Algorithms for Machine Learning Problems
, 2013
"... Modern machine learning algorithms can extract useful information from text, images and videos. All these applications involve solving NP-hard problems in average case using heuris-tics. What properties of the input allow it to be solved efficiently? Theoretically analyzing the heuristics is often v ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Modern machine learning algorithms can extract useful information from text, images and videos. All these applications involve solving NP-hard problems in average case using heuris-tics. What properties of the input allow it to be solved efficiently? Theoretically analyzing the heuristics is often very challenging. Few results were known. This thesis takes a different approach: we identify natural properties of the input, then design new algorithms that provably works assuming the input has these properties. We are able to give new, provable and sometimes practical algorithms for learning tasks related to text corpus, images and social networks. The first part of the thesis presents new algorithms for learning thematic structure in documents. We show under a reasonable assumption, it is possible to provably learn many topic models, including the famous Latent Dirichlet Allocation. Our algorithm is the first provable algorithms for topic modeling. An implementation runs 50 times faster than latest MCMC implementation and produces comparable results. The second part of the thesis provides ideas for provably learning deep, sparse representa-tions. We start with sparse linear representations, and give the first algorithm for dictionary learning problem with provable guarantees. Then we apply similar ideas to deep learning: under reasonable assumptions our algorithms can learn a deep network built by denoising autoencoders. The final part of the thesis develops a framework for learning latent variable models. We demonstrate how various latent variable models can be reduced to orthogonal tensor decomposition, and then be solved using tensor power method. We give a tight perturbation analysis for tensor power method, which reduces the number of samples required to learn many latent variable models. In theory, the assumptions in this thesis help us understand why intractable problems in machine learning can often be solved; in practice, the results suggest inherently new approaches for machine learning. We hope the assumptions and algorithms inspire new research problems and learning algorithms. iii