Results 1-10 of 34
A Survey of Collaborative Filtering Techniques
2009
Cited by 205 (0 self)
As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, and privacy protection, and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (which combine CF with other recommendation techniques), with examples of representative algorithms in each category and an analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state of the art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
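As an illustration of the memory-based category, the following is a minimal sketch of a user-based neighbourhood predictor: it weights the mean-centred ratings of the most similar users by their Pearson correlation with the active user. The function names, the toy ratings, and the choice of k are hypothetical, and this is one common memory-based variant rather than a specific algorithm taken from the survey.

```python
from math import sqrt

def pearson(ra: dict, rb: dict) -> float:
    """Pearson correlation of two users over the items both have rated."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict_user_based(ratings: dict, user: str, item: str, k: int = 2) -> float:
    """Predict ratings[user][item] from the k most similar users who rated item."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    # (similarity, neighbour's mean rating, neighbour's rating of item)
    neighbours = [
        (pearson(ratings[user], r), sum(r.values()) / len(r), r[item])
        for v, r in ratings.items() if v != user and item in r
    ]
    # keep the k neighbours with the largest absolute similarity
    neighbours.sort(key=lambda t: abs(t[0]), reverse=True)
    top = neighbours[:k]
    norm = sum(abs(s) for s, _, _ in top)
    if norm == 0:
        return mean_u  # no usable neighbours: fall back to the user's mean
    # user's mean plus the similarity-weighted, mean-centred neighbour ratings
    return mean_u + sum(s * (r - m) for s, m, r in top) / norm

# hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 3, "m2": 1, "m3": 2, "m4": 3},
    "carol": {"m1": 4, "m2": 3, "m3": 4, "m4": 5},
    "dave":  {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
print(predict_user_based(ratings, "alice", "m4"))
```

On the toy data, Alice's two nearest neighbours are Bob (correlation close to +1) and Dave (close to -1); both push the prediction for m4 above her mean of 4, to just under 5, which shows how a strongly negative correlation can still carry information.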
Spectral Regularization Algorithms for Learning Large Incomplete Matrices
2009
Cited by 104 (5 self)
We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm, Soft-Impute, iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts, this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that this task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example, it can obtain a rank-80 approximation of a 10^6 × 10^6 incomplete matrix with 10^5 observed entries in 2.5 hours, and can fit a rank-40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance in both training and test error when compared to other competitive state-of-the-art techniques.
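The core Soft-Impute iteration described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: it computes a full dense SVD with `numpy.linalg.svd` at every step, so it omits the paper's structure-exploiting low-rank SVD that gives linear complexity, and the function names, default tolerance, relative-change stopping rule, and toy matrix are hypothetical. The warm-started path over a decreasing grid of `lam` values mirrors the regularization path described in the abstract.

```python
import numpy as np

def soft_impute(X, mask, lam, n_iters=200, tol=1e-6, Z0=None):
    """Soft-Impute sketch: fill missing entries from a soft-thresholded SVD.

    X    : (m, n) float array; entries where mask is False are ignored (may be NaN).
    mask : (m, n) boolean array, True where an entry is observed.
    lam  : nuclear-norm weight, used as the singular-value threshold.
    Z0   : optional warm start (e.g. the solution for a larger lam).
    """
    Z = np.zeros(X.shape) if Z0 is None else Z0.copy()
    for _ in range(n_iters):
        # observed entries come from X, missing entries from the current estimate
        filled = np.where(mask, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        Z_new = (U * s) @ Vt
        # assumed stopping rule: small relative change between iterates
        change = np.sum((Z_new - Z) ** 2) / max(np.sum(Z ** 2), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z

def soft_impute_path(X, mask, lams, **kwargs):
    """Fit a decreasing grid of lam values, warm-starting each fit from the last."""
    Z, path = None, []
    for lam in sorted(lams, reverse=True):
        Z = soft_impute(X, mask, lam, Z0=Z, **kwargs)
        path.append((lam, Z))
    return path

# hypothetical toy matrix with three missing entries
X = np.array([[5.0, 4.0, np.nan], [4.0, np.nan, 2.0], [np.nan, 2.0, 1.0]])
mask = ~np.isnan(X)
for lam, Z in soft_impute_path(X, mask, [2.0, 1.0, 0.5]):
    print(lam, np.round(Z, 2))
```

Each fit on the path starts from the previous, more heavily regularized solution, which typically needs far fewer iterations than starting every fit from zero.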
Nonlinear Matrix Factorization with Gaussian Processes
Cited by 72 (1 self)
A popular approach to collaborative filtering is matrix factorization. In this paper we develop a nonlinear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better performance than the previous state of the art.
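The Gaussian-process model itself is not reproduced here. For orientation, the sketch below shows the linear case that such a model generalizes: regularized matrix factorization fitted with SGD, updating one user factor vector and one item factor vector per observed rating. The function names, learning rate, regularization weight, rank, and epoch count are arbitrary assumptions.

```python
import numpy as np

def mf_sgd(triples, n_users, n_items, rank=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Plain (linear) regularized matrix factorization trained with SGD.

    triples: list of (user, item, rating) observations.
    Returns U (n_users x rank) and V (n_items x rank) with U[u] @ V[i] ~ rating.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    order = np.arange(len(triples))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in a random order each epoch
        for idx in order:
            u, i, r = triples[idx]
            err = r - U[u] @ V[i]  # residual on this single rating
            # simultaneous gradient step on squared error plus L2 penalty
            U[u], V[i] = (U[u] + lr * (err * V[i] - reg * U[u]),
                          V[i] + lr * (err * U[u] - reg * V[i]))
    return U, V

def train_rmse(U, V, triples):
    """Root-mean-square error of the factor model on the given triples."""
    errs = [(r - U[u] @ V[i]) ** 2 for u, i, r in triples]
    return float(np.sqrt(np.mean(errs)))

# hypothetical toy data: a fully observed rank-1 rating matrix
a, b = [1.0, 2.0, 1.0, 2.0], [1.0, 2.0, 2.5]
data = [(u, i, a[u] * b[i]) for u in range(4) for i in range(3)]
U, V = mf_sgd(data, 4, 3)
print(train_rmse(U, V, data))
```

SGD touches only one rating per update, which is the property the abstract relies on to scale to data sets with millions of observations.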
A Simple Algorithm for Nuclear Norm Regularized Problems
Cited by 48 (3 self)
Optimization problems with nuclear norm regularization, such as low-norm matrix factorizations, have seen many applications recently. We propose a new approximation algorithm building upon the recent sparse approximate SDP solver of Hazan (2008). The experimental efficiency of our method is demonstrated on large matrix completion problems such as the Netflix dataset. The algorithm comes with strong convergence guarantees and can be interpreted as a first theoretically justified variant of Simon Funk-type SVD heuristics. The method is free of tuning parameters and very easy to parallelize.
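A close relative of this approach is the Frank-Wolfe (conditional-gradient) method over a nuclear-norm ball: each iteration needs only the top singular-vector pair of the gradient and mixes in one rank-1 term, which is why such methods resemble greedy SVD heuristics. The sketch below is that generic method, not the paper's SDP-based algorithm. It calls a full `numpy.linalg.svd` for simplicity where a practical implementation would use a power or Lanczos iteration, and the function name, the step size 2/(k+2), and the radius `t` are assumptions.

```python
import numpy as np

def frank_wolfe_completion(X, mask, t, n_iters=100):
    """Minimize 0.5 * ||P_mask(Z - X)||_F^2 subject to ||Z||_* <= t.

    Each iteration mixes in one rank-1 atom, so after k steps rank(Z) <= k.
    """
    Z = np.zeros(X.shape)
    for k in range(n_iters):
        G = np.where(mask, Z - X, 0.0)  # gradient of the loss, zero off the mask
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        # minimizer of <G, S> over the nuclear-norm ball: a scaled rank-1 atom
        S = -t * np.outer(U[:, 0], Vt[0])
        gamma = 2.0 / (k + 2.0)  # standard Frank-Wolfe step size
        Z = (1.0 - gamma) * Z + gamma * S
    return Z
```

Because `Z` is a convex combination of at most k rank-1 atoms of nuclear norm `t`, every iterate stays inside the ball and has rank at most k after k steps.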
Collaborative filtering and the missing at random assumption
In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), 2007
Cited by 41 (4 self)
Rating prediction is an important application and a popular research topic in collaborative filtering. However, both the validity of learning algorithms and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.
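The bias described above can be reproduced in a toy simulation in which the chance that a rating is observed depends on the rating itself. The observation probabilities below are invented for illustration, not estimates from the study, and the inverse-propensity-weighted mean is a much simpler correction than the paper's explicit missing-data model; it assumes the observation probabilities are known, which in practice they are not.

```python
import random

def simulate_mnar(n=100_000, seed=0):
    """Toy not-missing-at-random simulation: higher ratings are observed more often.

    Returns (true mean, naive observed mean, inverse-propensity-weighted mean).
    """
    rng = random.Random(seed)
    # hypothetical observation probability per rating value (NOT from the study)
    p_obs = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.60}
    true, observed = [], []
    for _ in range(n):
        r = rng.randint(1, 5)       # every rating value equally likely
        true.append(r)
        if rng.random() < p_obs[r]:  # missingness depends on the rating itself
            observed.append(r)
    naive = sum(observed) / len(observed)
    # weight each observed rating by 1 / P(observed | rating)
    w = [1.0 / p_obs[r] for r in observed]
    ipw = sum(wi * r for wi, r in zip(w, observed)) / sum(w)
    return sum(true) / n, naive, ipw

print(simulate_mnar())
```

With these assumed probabilities, the naive mean of the observed ratings lands near 4 even though the true mean is near 3, while the weighted estimate recovers the true mean.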
Fast nonparametric matrix factorization for large-scale collaborative filtering
The 32nd SIGIR Conference, 2009
Cited by 32 (3 self)
With the sheer growth of online user data, it becomes challenging to develop preference learning algorithms that are sufficiently flexible in modeling but also affordable in computation. In this paper we develop nonparametric matrix factorization methods by allowing the latent factors of two low-rank matrix factorization methods, the singular value decomposition (SVD) and probabilistic principal component analysis (pPCA), to be data-driven, with the dimensionality increasing with data size. We show that the formulations of the two nonparametric models are very similar, and their optimizations share similar procedures. Compared to traditional parametric low-rank methods, nonparametric models are appealing for their flexibility in modeling complex data dependencies. However, this modeling advantage comes at a computational price: it is highly challenging to scale them to large-scale problems, hampering their use in applications such as collaborative filtering. In this paper we introduce novel optimization algorithms that are simple to implement and make learning both nonparametric matrix factorization models highly efficient on large-scale problems. Our experiments on EachMovie and Netflix, the two largest public benchmarks to date, demonstrate that the nonparametric models make more accurate predictions of user ratings and are computationally comparable, or sometimes even faster in training, compared with previous state-of-the-art parametric matrix factorization models.
Collaborative Prediction and Ranking with Non-Random Missing Data
Cited by 24 (3 self)
A fundamental aspect of rating-based recommender systems is the observation process: the process by which users choose the items they rate. Nearly all research on collaborative filtering and recommender systems is founded on the assumption that missing ratings are missing at random. The statistical theory of missing data shows that incorrect assumptions about missing data can lead to biased parameter estimation and prediction. In a recent study, we demonstrated strong evidence for violations of the missing at random condition in a real recommender system. In this paper we present the first study of the effect of non-random missing data on collaborative ranking, and extend our previous results regarding the impact of non-random missing data on collaborative prediction.
Applying collaborative filtering techniques to movie search for better ranking and browsing
In KDD, 2007
Cited by 22 (0 self)
In general-purpose web search engines, such as Google and Yahoo! Search, document relevance for the given query and item authority are two major components of the ranking system. However, many information search tools on e-commerce sites ignore item authority in their ranking systems. In part, this may stem from the relative difficulty of generating item authorities, given the different characteristics of documents (or items) on e-commerce sites and on the web. Links between documents on an e-commerce site often represent relationships rather than recommendations. For example, two documents (items) may be connected because both are produced by the same company. We propose a new ranking method that combines recommender systems with information search tools for better search and browsing. Our method uses a collaborative filtering algorithm to generate personal item authorities for each user and combines them with item proximities for better ranking. To demonstrate our approach, we build a prototype movie search engine called MAD6 (Movies, Actors and Directors; 6 degrees of separation).
Mixed Membership Matrix Factorization
Cited by 18 (1 self)
Discrete mixed membership modeling and continuous latent factor modeling (also known as matrix factorization) are two popular, complementary approaches to dyadic data analysis. In this work, we develop a fully Bayesian framework for integrating the two approaches into unified Mixed Membership Matrix Factorization (M3F) models. We introduce two M3F models, derive Gibbs sampling inference procedures, and validate our methods on the EachMovie, MovieLens, and Netflix Prize collaborative filtering datasets. We find that, even when fitting fewer parameters, the M3F models outperform state-of-the-art latent factor approaches on all benchmarks, yielding the greatest gains in accuracy on sparsely rated, high-variance items.
Collaborative filtering via ensembles of matrix . . .
2007
Cited by 17 (0 self)
We present a matrix factorization (MF) based approach for the Netflix Prize competition. Currently, MF-based algorithms are popular and have proved successful for collaborative filtering tasks. For the Netflix Prize competition, we adopt three different types of MF algorithms: regularized MF, maximum-margin MF, and nonnegative MF. Furthermore, for each MF algorithm, instead of selecting the optimal parameters, we combine the results obtained with several parameter settings. With this method, we achieve a performance that is more than 6% better than Netflix’s own system.
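A minimal sketch of the combination idea, restricted to one of the three MF variants: masked nonnegative MF fitted with multiplicative updates (in the style of Lee and Seung, weighted so that only observed entries count), run at several ranks, with the reconstructions averaged. The abstract does not specify how results are combined, so the plain unweighted average, the choice of rank as the varied parameter, and all function names here are assumptions.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iters=500, seed=0, eps=1e-9):
    """Nonnegative MF fitted only on observed entries (mask == True)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    Mf = mask.astype(float)
    Xo = np.where(mask, X, 0.0)  # unobserved entries contribute nothing
    for _ in range(n_iters):
        # multiplicative updates keep W and H nonnegative
        W *= (Xo @ H.T) / ((Mf * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xo) / (W.T @ (Mf * (W @ H)) + eps)
    return W @ H

def ensemble_predict(X, mask, ranks):
    """Average the reconstructions from several NMF settings (here: ranks)."""
    preds = [masked_nmf(X, mask, r) for r in ranks]
    return np.mean(preds, axis=0), preds

def rmse(pred, X, sel):
    """RMSE of pred against X on the entries selected by the boolean array sel."""
    return float(np.sqrt(np.mean((pred[sel] - X[sel]) ** 2)))
```

Because RMSE is a scaled Euclidean norm, the triangle inequality guarantees that the averaged prediction's RMSE is never worse than the mean of the individual RMSEs on the same entries; whether it beats the best single model depends on the data.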