Results 11-20 of 37
The why and how of nonnegative matrix factorization
 Regularization, Optimization, Kernels, and Support Vector Machines, Chapman & Hall/CRC, 2014
"... ..."
(Show Context)
Spectral Methods for Supervised Topic Models
"... Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral dec ..."
Abstract

Cited by 5 (2 self)
Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. The Spectral-sLDA algorithm is provably correct and computationally efficient. We prove a sample complexity bound and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on a diverse range of synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the algorithm.
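The moment-based recovery the abstract alludes to can be illustrated in miniature. The sketch below is not the paper's Spectral-sLDA procedure; it only shows the standard first step of spectral topic recovery (forming the empirical second-order word co-occurrence moment and whitening it), with a hypothetical `whiten_second_moment` helper and a toy corpus of disjoint-support "topics":

```python
import numpy as np

def whiten_second_moment(docs, vocab, k):
    # empirical second-order moment M2: average outer product over ordered
    # pairs of distinct word positions within each document
    M2 = np.zeros((vocab, vocab))
    count = 0
    for doc in docs:
        for i in range(len(doc)):
            for j in range(len(doc)):
                if i != j:
                    M2[doc[i], doc[j]] += 1.0
                    count += 1
    M2 /= count
    vals, vecs = np.linalg.eigh(M2)              # eigenvalues in ascending order
    top = np.argsort(vals)[-k:]                  # keep the top-k eigenpairs
    W = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
    return M2, W                                 # whitening map: W.T @ M2 @ W ~ I_k

# toy corpus drawn from 3 disjoint-support "topics" over a 9-word vocabulary
rng = np.random.default_rng(5)
docs = []
for _ in range(200):
    t = rng.integers(3)                          # pick a topic
    docs.append(rng.integers(3 * t, 3 * t + 3, size=20).tolist())
M2, W = whiten_second_moment(docs, vocab=9, k=3)
I_approx = W.T @ M2 @ W                          # should be close to the identity
```

In full spectral methods this whitening is followed by a third-moment tensor decomposition and, for sLDA, moments that couple words with the response variable; those steps are omitted here.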
Large-Scale Distributed Nonnegative Sparse Coding and Sparse Dictionary Learning
"... We consider the problem of building compact, unsupervised representationsoflarge,highdimensional,nonnegativedata using sparse coding and dictionary learning schemes, with an emphasis on executing the algorithm in a MapReduce environment. The proposed algorithms may be seen as parallel optimizatio ..."
Abstract

Cited by 3 (0 self)
We consider the problem of building compact, unsupervised representations of large, high-dimensional, nonnegative data using sparse coding and dictionary learning schemes, with an emphasis on executing the algorithm in a MapReduce environment. The proposed algorithms may be seen as parallel optimization procedures for constructing sparse nonnegative factorizations of large, sparse matrices. Our approach alternates between a parallel sparse coding phase implemented using greedy or convex (ℓ1) regularized risk minimization procedures, and a sequential dictionary learning phase where we solve a set of ℓ0 optimization problems exactly. These twofold sparsity constraints lead to better statistical performance on text analysis tasks and at the same time make it possible to implement each iteration in a single MapReduce job. We detail our implementations and optimizations that lead to the ability to factor matrices with more than 100 million rows and billions of nonzero entries in just a few hours on a small commodity cluster.
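The alternation the abstract describes (a coding phase, then a dictionary phase) can be sketched serially. The code below is a minimal single-machine stand-in, not the authors' MapReduce implementation: `sparse_code` uses projected gradient for the nonnegative ℓ1-regularized coding step rather than their exact greedy/ℓ0 procedures, and all names and parameters are illustrative:

```python
import numpy as np

def sparse_code(X, D, lam=0.1, steps=200, lr=0.01):
    # nonnegative l1-regularized coding: min_{H >= 0} ||X - D H||^2 + lam ||H||_1
    # solved by projected gradient descent
    H = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(steps):
        grad = D.T @ (D @ H - X) + lam
        H = np.maximum(H - lr * grad, 0.0)   # project onto the nonnegative orthant
    return H

def update_dictionary(X, H, eps=1e-9):
    # ridge-regularized least-squares dictionary update, clipped to be
    # nonnegative and column-normalized
    D = np.maximum(X @ H.T @ np.linalg.pinv(H @ H.T + eps * np.eye(H.shape[0])), 0.0)
    return D / (np.linalg.norm(D, axis=0) + eps)

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((20, 50)))    # nonnegative data matrix
D = np.abs(rng.standard_normal((20, 5)))     # initial dictionary with 5 atoms
for _ in range(10):                          # alternate the two phases
    H = sparse_code(X, D)
    D = update_dictionary(X, H)
err = np.linalg.norm(X - D @ H) / np.linalg.norm(X)
```

In the distributed setting described above, the coding step parallelizes trivially over columns of X (the map phase), while the dictionary step aggregates sufficient statistics (the reduce phase).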
GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity
"... We study the problem of online subspace learning in the context of sequential observations involving structured perturbations. In online subspace learning, the observations are an unknown mixture of two components presented to the model sequentially — the main effect which pertains to the subspace a ..."
Abstract

Cited by 3 (1 self)
We study the problem of online subspace learning in the context of sequential observations involving structured perturbations. In online subspace learning, the observations are an unknown mixture of two components presented to the model sequentially — the main effect which pertains to the subspace and a residual/error term. If no additional requirement is imposed on the residual, it often corresponds to noise terms in the signal which were unaccounted for by the main effect. To remedy this, one may impose ‘structural’ contiguity, which has the intended effect of leveraging the secondary terms as a covariate that helps the estimation of the subspace itself, instead of merely serving as a noise residual. We show that the corresponding online estimation procedure can be written as an approximate optimization process on a Grassmannian. We propose an efficient numerical solution, GOSUS, Grassmannian Online Subspace Updates with Structured-Sparsity, for this problem. GOSUS is expressive enough in modeling both homogeneous perturbations of the subspace and structural contiguities of outliers, and after certain manipulations, solvable via an alternating direction method of multipliers (ADMM). We evaluate the empirical performance of this algorithm on two problems of interest: online background subtraction and online multiple face tracking, and demonstrate that it achieves competitive performance with the state-of-the-art in near real time.
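The core online update can be sketched without the structured-sparsity machinery. Below is a simplified Grassmannian stochastic-gradient step with a QR retraction (a GROUSE-style stand-in under the assumption of noiseless streaming data; GOSUS additionally penalizes structured sparsity of the residual and solves each step with ADMM, which is omitted here):

```python
import numpy as np

def online_subspace_step(U, x, step=0.1):
    # one gradient step on the Grassmannian from a single observation x,
    # followed by a QR retraction back to an orthonormal basis
    w = U.T @ x                  # coefficients of x in the current subspace
    r = x - U @ w                # residual left unexplained by the subspace
    U = U + step * np.outer(r, w)
    Q, _ = np.linalg.qr(U)
    return Q

rng = np.random.default_rng(1)
truth = np.linalg.qr(rng.standard_normal((30, 3)))[0]   # ground-truth subspace
U = np.linalg.qr(rng.standard_normal((30, 3)))[0]       # random initial estimate
for _ in range(2000):
    x = truth @ rng.standard_normal(3)                   # noiseless sample stream
    U = online_subspace_step(U, x)
# cosines of the principal angles between estimate and truth (1.0 = aligned)
overlap = np.linalg.svd(truth.T @ U, compute_uv=False)
```

The residual `r` is exactly the term that GOSUS treats as a structured covariate (e.g., a contiguous foreground region in background subtraction) rather than as unstructured noise.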
Hierarchical Clustering of Hyperspectral Images Using Rank-Two Nonnegative Matrix Factorization
 IEEE Transactions on Geoscience and Remote Sensing, 2015
"... In this paper, we design a hierarchical clustering algorithm for highresolution hyperspectral images. At the core of the algorithm, a new ranktwo nonnegative matrix factorizations (NMF) algorithm is used to split the clusters, which is motivated by convex geometry concepts. The method starts with ..."
Abstract

Cited by 3 (2 self)
In this paper, we design a hierarchical clustering algorithm for high-resolution hyperspectral images. At the core of the algorithm, a new rank-two nonnegative matrix factorization (NMF) algorithm, motivated by convex geometry concepts, is used to split the clusters. The method starts with a single cluster containing all pixels, and, at each step, (i) selects a cluster in such a way that the error at the next step is minimized, and (ii) splits the selected cluster into two disjoint clusters using rank-two NMF in such a way that the clusters are well balanced and stable. The proposed method can also be used as an endmember extraction algorithm in the presence of pure pixels. The effectiveness of this approach is illustrated on several synthetic and real-world hyperspectral images, and shown to outperform standard clustering techniques such as k-means, spherical k-means and standard NMF.
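The splitting primitive at the heart of step (ii) can be sketched with a generic rank-2 NMF. This is only a stand-in: the paper uses a specialized exact rank-two algorithm, whereas the sketch below uses plain Lee-Seung multiplicative updates, and the toy "pixels" and helper names are illustrative:

```python
import numpy as np

def rank2_nmf(X, iters=300, eps=1e-9):
    # generic rank-2 NMF X ~ W H via multiplicative updates
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, 2)) + eps
    H = rng.random((2, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def split_cluster(X):
    # assign each pixel (column of X) to the factor that dominates it
    W, H = rank2_nmf(X)
    return np.argmax(H, axis=0)

# toy "image": 30 pixel spectra drawn from two endmembers with disjoint support
rng = np.random.default_rng(2)
a = np.concatenate([np.ones(5), np.zeros(5)])
b = np.concatenate([np.zeros(5), np.ones(5)])
X = np.column_stack([(a if i < 15 else b) + 0.05 * rng.random(10) for i in range(30)])
labels = split_cluster(X)
```

Recursively applying `split_cluster` to the worst-fitting cluster, as in steps (i)-(ii) above, yields the hierarchical clustering tree.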
A Vavasis, “Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization,” arXiv preprint arXiv:1310.2273, 2013
"... ar ..."
Ellipsoidal Rounding for Nonnegative Matrix Factorization Under Noisy Separability, 2013
"... We present a numerical algorithm for nonnegative matrix factorization (NMF) problems under noisy separability. An NMF problem under separability can be stated as one of finding all vertices of the convex hull of data points. The research interest of this paper is to find the vectors as close to the ..."
Abstract

Cited by 3 (0 self)
We present a numerical algorithm for nonnegative matrix factorization (NMF) problems under noisy separability. An NMF problem under separability can be stated as one of finding all vertices of the convex hull of data points. The research interest of this paper is to find the vectors as close to the vertices as possible in a situation in which noise is added to the data points. Our algorithm is designed to capture the shape of the convex hull of data points by using its enclosing ellipsoid. We show that the algorithm has correctness and robustness properties from theoretical and practical perspectives; correctness here means that if the data points do not contain any noise, the algorithm can find the vertices of their convex hull; robustness means that if the data points contain noise, the algorithm can find the near-vertices. Finally, we apply the algorithm to document clustering, and report the experimental results.
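The geometric idea, an enclosing ellipsoid whose boundary picks out the (near-)vertices, can be sketched with Khachiyan's classical minimum-volume-enclosing-ellipsoid algorithm. This is a generic illustration of the principle, not the paper's specific rounding procedure; `mvee` and the toy data are illustrative:

```python
import numpy as np

def mvee(P, tol=1e-4, max_iter=10000):
    # minimum-volume enclosing ellipsoid (x - c)^T A (x - c) <= 1 of the rows
    # of P, via Khachiyan's barycentric coordinate-ascent algorithm
    n, d = P.shape
    Q = np.column_stack([P, np.ones(n)]).T       # lift to homogeneous coordinates
    u = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        X = Q @ np.diag(u) @ Q.T
        M = np.einsum('ij,jk,ki->i', Q.T, np.linalg.inv(X), Q)
        j = np.argmax(M)
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        done = np.linalg.norm(new_u - u) < tol
        u = new_u
        if done:
            break
    c = P.T @ u
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return A, c

# separable toy data: 3 true vertices plus 40 interior convex combinations
rng = np.random.default_rng(3)
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
P = np.vstack([V, rng.dirichlet(np.ones(3), size=40) @ V])
A, c = mvee(P)
scores = np.einsum('ij,jk,ik->i', P - c, A, P - c)   # ~1 on the boundary
near_vertices = np.argsort(scores)[-3:]              # highest scores = near-vertices
```

Points whose ellipsoidal score is close to 1 lie near the boundary and are the candidate vertices; interior convex combinations score strictly lower.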
Random projections for nonnegative matrix factorization, arXiv preprint arXiv:1405.4275, 2014
"... Nonnegative matrix factorization (NMF) is a widely used tool for exploratory data analysis in many disciplines. In this paper, we describe an approach to NMF based on random projections and give a geometric analysis of a prototypical algorithm. Our main result shows the protoalgorithm requires κ̄k ..."
Abstract

Cited by 3 (0 self)
Nonnegative matrix factorization (NMF) is a widely used tool for exploratory data analysis in many disciplines. In this paper, we describe an approach to NMF based on random projections and give a geometric analysis of a prototypical algorithm. Our main result shows the proto-algorithm requires κ̄k log k optimizations to find all the extreme columns of the matrix, where k is the number of extreme columns, and κ̄ is a geometric condition number. We show empirically that the proto-algorithm is robust to noise and well-suited to modern distributed computing architectures.
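The proto-algorithm's core step is easy to sketch: a random linear functional over the columns is always maximized at an extreme column (a vertex of the columns' convex hull), so repeated random projections collect the extreme columns. A minimal sketch under a separable-data assumption, with illustrative names:

```python
import numpy as np

def extreme_columns(X, num_projections=200, seed=0):
    # each random direction g is maximized over the columns of X by some
    # vertex of their convex hull; collecting argmaxes finds extreme columns
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    hits = set()
    for _ in range(num_projections):
        g = rng.standard_normal(d)          # one random linear functional
        hits.add(int(np.argmax(g @ X)))     # its maximizer is an extreme column
    return sorted(hits)

# separable toy data: columns 0..2 are extreme, the rest convex combinations
rng = np.random.default_rng(4)
W = np.abs(rng.standard_normal((6, 3)))
Hmix = rng.dirichlet(np.ones(3), size=20).T          # convex mixing weights
X = np.column_stack([W, W @ Hmix])
found = extreme_columns(X)
```

The κ̄k log k bound quoted above controls how many such projections are needed; poorly conditioned hulls (nearly colinear vertices) need more draws.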
Tripartite graph clustering for dynamic sentiment analysis on social media, 2014
"... The growing popularity of social media (e.g., Twitter) allows users to easily share information with each other and influence others by expressing their own sentiments on various subjects. In this work, we propose an unsupervised triclustering framework, which analyzes both userlevel and tweetlev ..."
Abstract

Cited by 3 (3 self)
The growing popularity of social media (e.g., Twitter) allows users to easily share information with each other and influence others by expressing their own sentiments on various subjects. In this work, we propose an unsupervised tri-clustering framework, which analyzes both user-level and tweet-level sentiments through co-clustering of a tripartite graph. A compelling feature of the proposed framework is that the quality of sentiment clustering of tweets, users, and features can be mutually improved by joint clustering. We further investigate the evolution of user-level sentiments and latent feature vectors in an online framework and devise an efficient online algorithm to sequentially update the clustering of tweets, users and features with newly arrived data. The online framework not only provides better quality of both dynamic user-level and tweet-level sentiment analysis, but also improves the computational and storage efficiency. We verified the effectiveness and efficiency of the proposed approaches on the November 2012 California ballot Twitter data.
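The co-clustering idea can be illustrated on a single bipartite slice with a generic nonnegative tri-factorization A ≈ U S V^T, where argmax over U rows clusters one side and argmax over V rows clusters the other. This is only a stand-in for the paper's framework, which jointly couples the user-tweet and tweet-feature relations of the tripartite graph; the toy block matrix and names are illustrative:

```python
import numpy as np

def tri_factorize(A, k1, k2, iters=300, eps=1e-9):
    # nonnegative tri-factorization A ~ U S V^T via multiplicative updates;
    # U clusters rows, V clusters columns, S links the two cluster spaces
    rng = np.random.default_rng(0)
    m, n = A.shape
    U = rng.random((m, k1)) + eps
    S = rng.random((k1, k2)) + eps
    V = rng.random((n, k2)) + eps
    for _ in range(iters):
        U *= (A @ V @ S.T) / (U @ S @ V.T @ V @ S.T + eps)
        V *= (A.T @ U @ S) / (V @ S.T @ U.T @ U @ S + eps)
        S *= (U.T @ A @ V) / (U.T @ U @ S @ V.T @ V + eps)
    return U, S, V

# toy user-by-feature matrix with two blocks (say, positive vs negative lexicon)
A = np.zeros((8, 6))
A[:4, :3] = 1.0
A[4:, 3:] = 1.0
U, S, V = tri_factorize(A, 2, 2)
rows = np.argmax(U, axis=1)   # row (user) cluster labels
cols = np.argmax(V, axis=1)   # column (feature) cluster labels
```

The mutual-improvement property noted above corresponds to U and V being estimated jointly through the shared core S, so each side's clustering constrains the other's.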
Towards quantifying vertex similarity in networks
 Internet Mathematics, 2014
"... Abstract. Vertex similarity is a major problem in network science with a wide range of applications. In this work we provide novel perspectives on finding (dis)similar vertices within a network and across two networks with the same number of vertices (graph matching). With respect to the former pro ..."
Abstract

Cited by 2 (1 self)
Vertex similarity is a major problem in network science with a wide range of applications. In this work we provide novel perspectives on finding (dis)similar vertices within a network and across two networks with the same number of vertices (graph matching). With respect to the former problem, we propose to optimize a geometric objective which allows us to express each vertex uniquely as a convex combination of a few extreme types of vertices. Our method has the important advantage of supporting efficiently several types of queries such as “which other vertices are most similar to this vertex?” by the use of the appropriate data structures and of mining interesting patterns in the network. With respect to the latter problem (graph matching), we propose the generalized condition number κ(LG, LH) – a quantity widely used in numerical analysis – of the Laplacian matrix representations of G and H as a measure of graph similarity, where G and H are the graphs of interest. We show that this objective has a solid theoretical basis and propose a deterministic and a randomized graph alignment algorithm. We evaluate our algorithms on both synthetic and real data. We observe that our proposed methods achieve high-quality results and provide us with significant insights into the network structure.
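The generalized condition number κ(LG, LH) can be computed directly as the ratio of the extreme generalized eigenvalues of the pencil (LG, LH), restricted to the complement of the shared all-ones nullspace of the Laplacians. A minimal sketch assuming both graphs are connected; the helper names and the path-vs-cycle example are illustrative:

```python
import numpy as np

def laplacian(adj):
    # combinatorial Laplacian L = D - A of an undirected graph
    return np.diag(adj.sum(axis=1)) - adj

def generalized_condition_number(LG, LH):
    # kappa(LG, LH) = lambda_max / lambda_min of the pencil (LG, LH), computed
    # on the orthogonal complement of span{1}, where both Laplacians are PD
    n = LG.shape[0]
    Q = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, 1:]]))[0][:, 1:]
    A = Q.T @ LG @ Q
    B = Q.T @ LH @ Q
    C = np.linalg.cholesky(B)                 # reduce (A, B) to a standard problem
    Ci = np.linalg.inv(C)
    vals = np.linalg.eigvalsh(Ci @ A @ Ci.T)
    return vals.max() / vals.min()

# path graph vs cycle graph on 5 vertices
path = np.zeros((5, 5))
cycle = np.zeros((5, 5))
for i in range(4):
    path[i, i + 1] = path[i + 1, i] = 1.0
for i in range(5):
    cycle[i, (i + 1) % 5] = cycle[(i + 1) % 5, i] = 1.0
kappa_same = generalized_condition_number(laplacian(cycle), laplacian(cycle))
kappa_diff = generalized_condition_number(laplacian(path), laplacian(cycle))
```

Identical graphs give κ = 1, and κ grows as the two Laplacian spectra (hence the graphs' structures) diverge, which is what makes it usable as a graph-similarity score.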