Results 1–10 of 68
Guaranteed rank minimization via singular value projection
In NIPS, 2010
Cited by 100 (7 self)
Abstract:
Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm, SVP (Singular Value Projection), for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum-rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees a geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than existing methods. We also introduce a Newton step into our SVP framework to speed up convergence, with substantial empirical gains. Next, we address a practically important application of ARMP: the problem of low-rank matrix completion, for which the defining affine constraints do not directly obey RIP, so the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property, and we observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude, and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly more robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem.
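The SVP iteration this abstract describes is simple enough to sketch for the matrix completion special case: take a gradient step on the observed entries, then project back onto the set of rank-k matrices via a truncated SVD. The step size, iteration count, and use of a full SVD below are illustrative choices, not the paper's tuned algorithm:

```python
import numpy as np

def svp_complete(M_obs, mask, rank, step=1.0, iters=300):
    """SVP-style iteration for low-rank matrix completion (sketch).

    M_obs: matrix holding the observed entries (arbitrary values elsewhere)
    mask:  boolean array, True where an entry is observed
    rank:  target rank k
    """
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(iters):
        # Gradient step: correct X on the observed entries only.
        G = X + step * mask * (M_obs - X)
        # Projection step: keep only the top-k singular triplets.
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return X
```

With enough observed entries of an incoherent low-rank matrix, the iterates typically converge to the true matrix.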
A scalable collaborative filtering framework based on co-clustering
In Fifth IEEE International Conference on Data Mining, 2005
Cited by 69 (1 self)
Abstract:
Collaborative filtering-based recommender systems, which automatically predict preferred products of a user using the known preferences of other users, have become extremely popular in recent years due to the increase in web-based activities such as e-commerce and online content distribution. Current collaborative filtering techniques such as correlation- and SVD-based methods provide good accuracy, but are computationally very expensive and can only be deployed in static offline settings where the known preference information does not change with time. However, a number of practical scenarios require dynamic real-time collaborative filtering that can allow new users, items and ratings to enter the system at a rapid rate. In this paper, we consider a novel collaborative filtering approach based on a recently proposed weighted co-clustering algorithm [3] that involves simultaneous clustering of users and items. We design incremental and parallel versions of the co-clustering algorithm and use them to build an efficient real-time collaborative filtering framework. Empirical evaluation of our approach on large movie and book rating datasets demonstrates that it is possible to obtain accuracy comparable to that of correlation and matrix factorization based approaches at a much lower computational cost.
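A co-clustering prediction rule of the kind this framework builds on can be sketched as a co-cluster average plus the user's and item's offsets from their respective cluster averages. The helper below is hypothetical (its name, signature, and exact combination of averages are illustrative, not the paper's API):

```python
import numpy as np

def cocluster_predict(R, mask, user_cl, item_cl, u, i):
    """Hypothetical co-clustering CF prediction: co-cluster mean plus
    the user's and item's offsets from their cluster means.

    R: ratings matrix; mask: boolean, True where a rating is observed;
    user_cl/item_cl: integer cluster assignments per user/item.
    """
    in_g = user_cl == user_cl[u]            # users in u's cluster
    in_h = item_cl == item_cl[i]            # items in i's cluster
    only_u = np.arange(len(user_cl)) == u
    only_i = np.arange(len(item_cl)) == i
    all_u = np.ones(len(user_cl), dtype=bool)
    all_i = np.ones(len(item_cl), dtype=bool)

    def avg(rows, cols):
        m = mask[np.ix_(rows, cols)]
        return R[np.ix_(rows, cols)][m].mean() if m.any() else 0.0

    return (avg(in_g, in_h)                        # co-cluster mean
            + avg(only_u, all_i) - avg(in_g, all_i)    # user offset
            + avg(all_u, only_i) - avg(all_u, in_h))   # item offset
```

Because each term is a running average, all of them can be maintained incrementally as new ratings arrive, which is what makes this family of methods attractive for real-time settings.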
Intelligent Techniques for Web Personalization
In post-proceedings of the Second Workshop on Intelligent Techniques in Web Personalization, 2005
Recovering the missing components in a large noisy low-rank matrix: application to SFM
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 48 (4 self)
Abstract:
In computer vision, it is common to require operations on matrices with “missing data”, for example because of occlusion or tracking failures in the Structure from Motion (SFM) problem. Such a problem can be tackled, allowing the recovery of the missing values, if the matrix is known to be of low rank (when noise-free). The filling-in of missing values is known as imputation. Imputation can also be applied in the various subspace techniques for face and shape classification, online “recommender” systems, and a wide variety of other applications. However, iterative imputation can lead to the “recovery” of data that is seriously in error. In this paper we provide a method to recover the most reliable imputation, in terms of deciding when the inclusion of extra rows or columns containing significant numbers of missing entries is likely to lead to poor recovery of the missing parts. Although the proposed approach can be equally applied to a wide range of imputation methods, this paper addresses only the SFM problem. The performance of the proposed method is compared with Jacobs’ and Shum’s methods for SFM.
Fast Incremental and Personalized PageRank
Cited by 36 (3 self)
Abstract:
In this paper, we analyze the efficiency of Monte Carlo methods for incremental computation of PageRank, personalized PageRank, and similar random walk based methods (with focus on SALSA), on large-scale dynamically evolving social networks. We assume that the graph of friendships is stored in distributed shared memory, as is the case for large social networks such as Twitter. For global PageRank, we assume that the social network has n nodes, and m adversarially chosen edges arrive in a random order. We show that with a reset probability of ε, the expected total work needed to maintain an accurate estimate (using the Monte Carlo method) of the PageRank of every node at all times is O(n ln m / ε²). This is significantly better than all known bounds for incremental PageRank. For instance, if we naively recompute the PageRanks as each edge arrives, the simple power iteration method needs
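The Monte Carlo estimator the abstract refers to can be sketched as follows: run a fixed number of random walks from every node, terminating each step with the reset probability, and estimate each node's PageRank from its share of the total visits. This is a minimal batch sketch (parameter values are illustrative; the paper's contribution is maintaining the walk segments incrementally as edges arrive):

```python
import random
from collections import defaultdict

def monte_carlo_pagerank(edges, n, eps=0.15, walks_per_node=200, rng=None):
    """Estimate PageRank by random walks with reset probability eps.

    Each of n nodes starts walks_per_node walks; a walk ends at each step
    with probability eps (dangling nodes also end the walk). A node's
    PageRank is estimated as its fraction of all walk visits.
    """
    rng = rng or random.Random(0)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    visits = [0] * n
    total = 0
    for start in range(n):
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                total += 1
                if rng.random() < eps or not adj[node]:
                    break
                node = rng.choice(adj[node])
    return [c / total for c in visits]
```

Because each edge arrival only perturbs the walks passing through its endpoints, the stored walk segments can be patched locally instead of recomputed, which is the source of the work bound quoted above.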
Optimal multiscale patterns in time series streams
In SIGMOD, 2006
Cited by 19 (2 self)
Abstract:
We introduce a method to discover optimal local patterns, which concisely describe the main trends in a time series. Our approach examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each. We also introduce a criterion to select the best window sizes, which most concisely capture the key oscillatory as well as aperiodic trends. Our key insight lies in learning an optimal orthonormal transform from the data itself, as opposed to using a predetermined basis or approximating function (such as piecewise constant, short-window Fourier or wavelets), which essentially restricts us to a particular family of trends. Our method lifts that limitation, while lending itself to fast, incremental estimation in a streaming setting. Experimental evaluation shows that our method can capture meaningful patterns in a variety of settings. Our streaming approach requires an order of magnitude less time and space, while still producing concise and informative patterns.
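One illustrative reading of "learning an optimal orthonormal transform from the data itself": cut the series into windows of size w and take the top-k right singular vectors of the window matrix as data-driven patterns. This batch SVD is a stand-in for the paper's incremental streaming estimation, and the function below is a sketch under that assumption:

```python
import numpy as np

def local_patterns(x, w, k):
    """Learn k orthonormal length-w patterns from a time series x by
    SVD of the matrix of non-overlapping windows (batch sketch)."""
    n = len(x) // w
    W = np.asarray(x[: n * w], dtype=float).reshape(n, w)  # one window per row
    C = W - W.mean(axis=0)                                 # center the windows
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    patterns = Vt[:k]           # k orthonormal local patterns
    coeffs = C @ patterns.T     # projection of each window onto the patterns
    return patterns, coeffs
```

Unlike a fixed Fourier or wavelet basis, the learned patterns adapt to whatever oscillatory or aperiodic shape dominates the windows at that scale.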
Active Spectral Clustering
In ICDM, 2010
Cited by 17 (0 self)
Abstract:
The technique of spectral clustering is widely used to segment a range of data from graphs to images. Our work marks a natural progression of spectral clustering from the original passive unsupervised formulation to our active semi-supervised formulation. We follow the widely used area of constrained clustering and allow supervision in the form of pairwise relations between two nodes: Must-Link and Cannot-Link. Unlike most previous constrained clustering work, our constraints are specified incrementally by querying an oracle (domain expert). Since in practice each query comes with a cost, our goal is to maximally improve the result with as few queries as possible. The advantages of our approach include: 1) it is principled, querying the constraints which maximally reduce the expected error; 2) it can incorporate both hard and soft constraints, which are prevalent in practice. We empirically show that our method significantly outperforms the baseline approach, namely constrained spectral clustering with randomly selected constraints, on UCI benchmark data sets.
Keywords: spectral clustering; active learning; constrained clustering
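The paper's contribution is choosing which pairs to query; the constrained spectral clustering it builds on can be sketched by folding the constraints into the affinity matrix before the usual spectral steps. This is one common simple scheme (Must-Link raised to maximum affinity, Cannot-Link zeroed), not necessarily the authors' exact formulation:

```python
import numpy as np

def constrained_spectral_clustering(A, must, cannot, k=2):
    """Spectral clustering with pairwise constraints folded into the
    affinity matrix A (symmetric, nonnegative). For k=2, returns cluster
    labels from the sign of the second eigenvector of the normalized
    Laplacian; for k>2, returns the spectral embedding."""
    A = A.astype(float).copy()
    hi = A.max()
    for i, j in must:
        A[i, j] = A[j, i] = hi      # Must-Link: maximum affinity
    for i, j in cannot:
        A[i, j] = A[j, i] = 0.0     # Cannot-Link: no affinity
    d = A.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - Dm @ A @ Dm        # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)          # ascending eigenvalues
    U = vecs[:, :k]
    if k == 2:
        return (U[:, 1] > 0).astype(int)    # sign of the Fiedler vector
    return U
```

An active method would wrap a query loop around this, asking the oracle about the pair whose answer is expected to change the partition the most.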
Parallel Algorithms for Mining Large-Scale Rich-Media Data
In Proc. 17th ACM Int'l Conf. Multimedia (MM '09), 2009
Matrix Completion from Power-Law Distributed Samples
Cited by 12 (1 self)
Abstract:
The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, [4], [13] and [5] obtained the first non-trivial theoretical results for the problem, assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper, we propose a graph theoretic approach to matrix completion that solves the problem for more realistic sampling models. Our method is simpler to analyze than previous methods, with the analysis reducing to computing the threshold for complete cascades in random graphs, a problem of independent interest. By analyzing the graph theoretic problem, we show that our method achieves exact recovery when the observed entries are sampled from the Chung-Lu-Vu model, which can generate power-law distributed graphs. We also hypothesize that our algorithm solves the matrix completion problem from an optimal number of entries for the popular preferential attachment model, and provide strong empirical evidence for the claim. Furthermore, our method is easy to implement and is substantially faster than existing methods. We demonstrate the effectiveness of our method on random instances where the low-rank matrix is sampled according to the prevalent random graph models for complex networks, and present promising preliminary results on the Netflix challenge dataset.
Collaborative filtering via Euclidean embedding
In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10), 2010
Cited by 10 (2 self)
Abstract:
Recommendation systems suggest items based on user preferences. Collaborative filtering is a popular approach in which recommendations are based on the rating history of the system. One of the most accurate and scalable collaborative filtering algorithms is matrix factorization, which is based on a latent factor model. We propose a novel Euclidean embedding method as an alternative latent factor model for implementing collaborative filtering. In this method, users and items are embedded in a unified Euclidean space, where the distance between a user and an item is inversely proportional to the rating. This model is comparable to matrix factorization in terms of both scalability and accuracy, while providing several advantages. First, the result of Euclidean embedding is more intuitively understandable for humans, allowing useful visualizations. Second, the neighborhood structure of the unified Euclidean space allows very efficient recommendation queries. Finally, the method facilitates online implementation requirements such as mapping new users or items into an existing model. Our experimental results confirm these advantages and show that collaborative filtering via Euclidean embedding is a promising approach for online recommender systems.
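The latent-space idea can be sketched with an assumed model of the form r̂ = μ + b_u + b_i − ||p_u − q_i||², trained by stochastic gradient descent on squared error; the exact model, gradients and hyperparameters below are illustrative, not the paper's:

```python
import numpy as np

def fit_euclidean_embedding(ratings, n_users, n_items, d=2, lr=0.05,
                            reg=0.02, epochs=200, seed=0):
    """Embed users and items as points so that smaller user-item distance
    means a higher predicted rating. ratings: list of (u, i, r) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, d))  # user points
    Q = 0.1 * rng.standard_normal((n_items, d))  # item points
    bu = np.zeros(n_users)
    bi = np.zeros(n_items)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(epochs):
        for u, i, r in ratings:
            diff = P[u] - Q[i]
            err = r - (mu + bu[u] + bi[i] - diff @ diff)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Underprediction (err > 0) pulls the user point toward the item.
            P[u] -= lr * (4 * err * diff + reg * P[u])
            Q[i] += lr * (4 * err * diff - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(model, u, i):
    mu, bu, bi, P, Q = model
    diff = P[u] - Q[i]
    return mu + bu[u] + bi[i] - diff @ diff
```

Because users and items live in one metric space, "items near this user" is an ordinary nearest-neighbor query, which is what enables the efficient recommendation queries the abstract mentions.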