Results 1–10 of 36
Relational Learning via Collective Matrix Factorization
, 2008
Abstract

Cited by 130 (4 self)
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users × #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a …
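The core idea, jointly factoring two relations that share an entity, can be sketched with squared loss (the simplest of the paper's Bregman divergences) and alternating ridge-regression updates. The matrix names, dimensions, and regularizer below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def collective_mf(X, Y, k=5, iters=100, lam=0.1, seed=0):
    """Jointly factor X ~ U V^T (users x movies) and Y ~ V W^T
    (movies x genres), sharing the movie factor V across both
    relations. Squared loss, solved by alternating regularized
    least-squares updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    _, g = Y.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(m, k))
    W = rng.normal(scale=0.1, size=(g, k))
    I = lam * np.eye(k)
    for _ in range(iters):
        U = X @ V @ np.linalg.inv(V.T @ V + I)
        # V couples the relations: it must fit X and Y simultaneously.
        V = (X.T @ U + Y @ W) @ np.linalg.inv(U.T @ U + W.T @ W + I)
        W = Y.T @ V @ np.linalg.inv(V.T @ V + I)
    return U, V, W
```

Because each update solves its subproblem exactly, the joint objective decreases monotonically, which is the same structure the paper's Newton projection generalizes to other divergences.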
Distributed nonnegative matrix factorization for web-scale dyadic data analysis on MapReduce
 In WWW ’10: Proceedings of the 19th international conference on World wide web
, 2010
Abstract

Cited by 39 (1 self)
The Web abounds with dyadic data that keeps increasing by every single second. Previous work has repeatedly shown the usefulness of extracting the interaction structure inside dyadic data [21, 9, 8]. A commonly used tool in extracting the underlying structure is matrix factorization, whose fame was further boosted in the Netflix challenge [26]. When we were trying to replicate the same success on real-world Web dyadic data, we were seriously challenged by the scalability of available tools. We therefore in this paper report our efforts on scaling up the nonnegative matrix factorization (NMF) technique. We show that by carefully partitioning the data and arranging the computations to maximize data locality and parallelism, factorizing a tens-of-millions by hundreds-of-millions matrix with billions of nonzero cells can be accomplished within tens of hours. This result effectively assures practitioners of the scalability of NMF on Web-scale dyadic data.
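The arithmetic being distributed is, in essence, the classical multiplicative update rule for squared-error NMF (assumed here; the paper's MapReduce jobs compute the same numerator and denominator matrix products block by block). A single-machine sketch of that core:

```python
import numpy as np

def nmf(A, k=4, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for A ~ W H with W, H >= 0.
    Each update multiplies the current factor by a ratio of matrix
    products; a distributed version need only parallelize those
    products over partitions of A."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # fix W, improve H
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # fix H, improve W
    return W, H
```

The updates preserve nonnegativity by construction and never increase the squared reconstruction error, which is why they are safe to iterate at scale.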
Scalable distributed inference of dynamic user interests for behavioral targeting
 In KDD
, 2011
Abstract

Cited by 37 (7 self)
Historical user activity is key for building user profiles to predict user behavior and affinities in many web applications, such as targeting of online advertising, content personalization and social recommendations. User profiles are temporal, and changes in a user's activity patterns are particularly useful for improved prediction and recommendation. For instance, an increased interest in car-related web pages may well suggest that the user might be shopping for a new vehicle. In this paper we present a comprehensive statistical framework for user profiling based on topic models which is able to capture such effects in a fully unsupervised fashion. Our method models topical interests of a user dynamically, where both the user's association with the topics and the topics themselves are allowed to vary over time, thus ensuring that the profiles remain current. We describe a streaming, distributed inference algorithm which is able to handle tens of millions of users. Our results show that our model contributes towards improved behavioral targeting of display advertising relative to baseline models that do not incorporate topical and/or temporal dependencies. As a side-effect, our model yields human-understandable results which can be used in an intuitive fashion by advertisers.
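The "profiles remain current" requirement can be illustrated with a much simpler stand-in for the paper's dynamic topic model: an exponentially decayed topic-count profile. The decay factor and the two-topic example are illustrative assumptions, not part of the paper's inference machinery:

```python
import numpy as np

def update_profile(counts, day_counts, decay=0.8):
    """Decay yesterday's topic counts before adding today's activity,
    so recent interests dominate the profile. (Toy stand-in for the
    paper's dynamic Bayesian model.)"""
    return decay * counts + day_counts

# A user who drifts from topic 0 ("sports") to topic 1 ("autos"):
profile = np.zeros(2)
for day_counts in [np.array([5., 0.]), np.array([4., 1.]),
                   np.array([1., 4.]), np.array([0., 5.])]:
    profile = update_profile(profile, day_counts)
```

After the four days above, topic 1 dominates even though the cumulative raw counts of the two topics are equal, which is exactly the behavior a static profile would miss.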
Bayesian co-clustering
 In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM)
, 2008
Abstract

Cited by 33 (4 self)
In recent years, co-clustering has emerged as a powerful data mining tool that can analyze dyadic data connecting two entities. However, almost all existing co-clustering techniques are partitional, and allow individual rows and columns of a data matrix to belong to only one cluster. Several current applications, such as recommendation systems and market basket analysis, can substantially benefit from a mixed membership of rows and columns. In this paper, we present Bayesian co-clustering (BCC) models that allow a mixed membership in row and column clusters. BCC maintains separate Dirichlet priors for rows and columns over the mixed membership, and assumes each observation to be generated by an exponential family distribution corresponding to its row and column clusters. We propose a fast variational algorithm for inference and parameter estimation. The model is designed to naturally handle sparse matrices, as the inference is based only on the non-missing entries. In addition to finding a co-cluster structure in observations, the model outputs a low-dimensional co-embedding, and accurately predicts missing values in the original matrix. We demonstrate the efficacy of the model through experiments on both simulated and real data.
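The generative process described above can be sketched directly. The Gaussian emission below stands in for the general exponential-family component, and all sizes and hyperparameters are illustrative:

```python
import numpy as np

def sample_bcc(n_rows=20, n_cols=15, K=3, L=2, alpha=1.0, seed=0):
    """Generative sketch of Bayesian co-clustering: each row and each
    column gets its own Dirichlet-distributed mixed membership; every
    cell draws a row cluster and a column cluster, then emits from the
    distribution of that co-cluster (Gaussian here, standing in for
    the general exponential family)."""
    rng = np.random.default_rng(seed)
    pi_r = rng.dirichlet(alpha * np.ones(K), size=n_rows)  # row memberships
    pi_c = rng.dirichlet(alpha * np.ones(L), size=n_cols)  # column memberships
    mu = rng.normal(scale=3.0, size=(K, L))                # co-cluster means
    X = np.empty((n_rows, n_cols))
    for u in range(n_rows):
        for v in range(n_cols):
            z1 = rng.choice(K, p=pi_r[u])   # row cluster for this cell
            z2 = rng.choice(L, p=pi_c[v])   # column cluster for this cell
            X[u, v] = rng.normal(mu[z1, z2], 0.5)
    return X, pi_r, pi_c, mu
```

Note that cluster assignments are drawn per observation rather than per row or column, which is precisely what gives rows and columns their mixed memberships.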
Generalized probabilistic matrix factorizations for collaborative filtering
, 2010
Abstract

Cited by 21 (4 self)
Probabilistic matrix factorization (PMF) methods have shown great promise in collaborative filtering. In this paper, we consider several variants and generalizations of the PMF framework, inspired by three broad questions: Are the prior distributions used in existing PMF models suitable, or can one get better predictive performance with different priors? Are there suitable extensions to leverage side information? Are there benefits to taking into account row and column biases? We develop new families of PMF models to address these questions, along with efficient approximate inference algorithms for learning and prediction. Through extensive experiments on movie recommendation datasets, we illustrate that simpler models directly capturing correlations among latent factors can outperform existing PMF models, that side information can benefit prediction accuracy, and that accounting for row/column biases leads to improvements in predictive performance. Keywords: probabilistic matrix factorization, topic models, variational inference
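One concrete instance of "row and column biases" is a Gaussian PMF model with per-row and per-column offsets, trained to a MAP estimate. The model form is standard; the training loop, step size, and regularizer below are illustrative assumptions and not the paper's (variational) inference algorithm:

```python
import numpy as np

def pmf_with_biases(R, mask, k=5, iters=300, lr=0.02, lam=0.05, seed=0):
    """MAP training of r_uv ~ mu + b_u + c_v + p_u . q_v with Gaussian
    noise and zero-mean Gaussian priors on all parameters, by
    full-batch gradient descent on the observed entries only."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    mu = R[mask].mean()
    b = np.zeros(n); c = np.zeros(m)
    P = rng.normal(scale=0.1, size=(n, k))
    Q = rng.normal(scale=0.1, size=(m, k))
    for _ in range(iters):
        # Residuals on observed entries only; mask zeroes out the rest.
        E = mask * (mu + b[:, None] + c[None, :] + P @ Q.T - R)
        b -= lr * (E.sum(axis=1) + lam * b)
        c -= lr * (E.sum(axis=0) + lam * c)
        P, Q = P - lr * (E @ Q + lam * P), Q - lr * (E.T @ P + lam * Q)
    return mu, b, c, P, Q
```

Dropping b and c recovers plain PMF, so this sketch also shows what the paper's bias question is asking: whether those two extra vectors are worth their cost.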
A Spatio-Temporal Approach to Collaborative Filtering
Abstract

Cited by 19 (2 self)
In this paper, we propose a novel spatio-temporal model for collaborative filtering applications. Our model is based on low-rank matrix factorization that uses a spatio-temporal filtering approach to estimate user and item factors. The spatial component regularizes the factors by exploiting correlation across users and/or items, modeled as a function of some implicit feedback (e.g., who rated what) and/or some side information (e.g., user demographics, browsing history). In particular, we incorporate correlation in factors through a Markov random field prior in a probabilistic framework, whereby the neighborhood weights are functions of user and item covariates. The temporal component ensures that the user/item factors adapt to process changes that occur through time, and is implemented in a state-space framework with fast estimation through Kalman filtering. Our spatio-temporal filtering (STKF hereafter) approach provides a single joint model to simultaneously incorporate both spatial and temporal structure in ratings, and therefore provides an accurate method to predict future ratings. To ensure scalability of STKF, we employ a mean-field approximation for inference. Incorporating user/item covariates in estimating neighborhood weights also helps in dealing with both cold-start and warm-start problems seamlessly in a single unified modeling framework; covariates predict factors for new users and items through the neighborhood. We illustrate our method on simulated data, benchmark data and data obtained from a relatively new recommender system application arising in the context of Yahoo! Front Page.
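In one dimension, the temporal component reduces to the textbook Kalman recursions: a factor follows a random walk and is observed noisily. The scalar model and the noise variances below are illustrative; the paper runs the state-space machinery over full factor vectors:

```python
import numpy as np

def kalman_track(y, q=0.1, r=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t (process variance q)
    observed as y_t = x_t + v_t (observation variance r). Returns the
    filtered estimate of x_t after each observation."""
    x, p = 0.0, 1.0           # state mean and variance
    est = []
    for yt in y:
        p = p + q             # predict: random-walk drift widens uncertainty
        k = p / (p + r)       # Kalman gain
        x = x + k * (yt - x)  # update with the new observation
        p = (1 - k) * p
        est.append(x)
    return np.array(est)
```

The filtered estimate tracks a drifting factor far better than the raw noisy observations do, which is the point of putting user/item factors in a state-space model.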
Approximation algorithms for co-clustering
 In Proceedings PODS 2008
, 2008
Abstract

Cited by 19 (0 self)
Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted considerable attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature. In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and obtain constant-factor approximation solutions to the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.
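A minimal sketch of the objective being approximated: partition rows into k groups and columns into l groups, and score the partition by how well each induced block is summarized by its mean. Clustering the two sides independently with k-means, as below, is in the spirit of the paper's reduction to one-sided clustering, but the initialization and details are illustrative assumptions:

```python
import numpy as np

def kmeans_labels(X, k, iters=20):
    """Lloyd's k-means on the rows of X, with deterministic
    farthest-first initialization; returns a cluster label per row."""
    C = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[int(d.argmax())])   # next center: farthest point so far
    C = np.array(C, dtype=float)
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return lab

def cocluster(A, k, l):
    """Cluster rows and columns independently, then approximate each
    induced block by its mean value."""
    rows = kmeans_labels(A, k)
    cols = kmeans_labels(A.T, l)
    M = np.zeros((k, l))
    for i in range(k):
        for j in range(l):
            block = A[np.ix_(rows == i, cols == j)]
            if block.size:
                M[i, j] = block.mean()
    return rows, cols, M
```

On a matrix that truly is blockwise constant, this one-sided scheme recovers the blocks exactly; the paper's contribution is proving that it is within a constant factor of optimal in general.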
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
 In KDD
, 2009
Abstract

Cited by 15 (1 self)
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a collaborative filtering (CF) solution to the Netflix Prize problem [1] based on weighted co-clustering [5]. The dataflow library we use facilitates the development of sophisticated parallel programs designed to fully utilize commodity multi-core hardware, while hiding traditional difficulties such as queuing, threading, memory management, and deadlocks. The dataflow CF implementation first compresses the large, sparse training dataset into co-clusters. Then it generates recommendations by combining the average ratings of the co-clusters with the biases of the users and movies. When configured to identify 20×20 co-clusters in the Netflix training dataset, the implementation predicted over 100 million ratings in 16.31 minutes and achieved an RMSE of 0.88846 without any fine-tuning or domain knowledge. This is an effective real-time prediction runtime of 9.7 µs per rating, which is far superior to previously reported results. Moreover, the implemented co-clustering framework supports a wide variety of other large-scale data mining applications and forms the basis for predictive modeling on large, dyadic datasets [4, 7].
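"Combining the average ratings of the co-clusters with the biases of the users and movies" is usually the additive rule from co-clustering-based CF. A dense-matrix sketch of that rule (the exact weighting in [5] may differ, and the paper's implementation works on the sparse training set):

```python
import numpy as np

def predict(u, v, R, rows, cols):
    """Co-clustering CF prediction: the co-cluster's average rating,
    offset by how far the user's and the movie's own averages sit
    from their respective cluster averages. rows/cols are the cluster
    labels produced by the co-clustering step."""
    g, h = rows[u], cols[v]
    coc = R[np.ix_(rows == g, cols == h)].mean()        # co-cluster average
    user_off = R[u].mean() - R[rows == g].mean()        # user bias vs. row cluster
    item_off = R[:, v].mean() - R[:, cols == h].mean()  # movie bias vs. column cluster
    return coc + user_off + item_off
```

The rule is exact for purely additive ratings, and it only needs a handful of precomputed averages per cluster, which is what makes microsecond-scale prediction possible.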
Residual Bayesian Co-clustering for Matrix Approximation
Abstract

Cited by 5 (1 self)
In recent years, matrix approximation for missing value prediction has emerged as an important problem in a variety of domains, such as recommendation systems, e-commerce and online advertisement. While matrix factorization based algorithms typically have good approximation accuracy, such algorithms can be slow, especially for large matrices. Further, such algorithms cannot naturally make predictions on new rows or columns. In this paper, we propose residual Bayesian co-clustering (RBC), which learns a generative model corresponding to the matrix from the non-missing entries, and uses the model to predict the missing entries. RBC is an extension of Bayesian co-clustering that takes row and column biases into consideration. The model allows mixed memberships of rows and columns to multiple clusters, and can naturally handle prediction on new rows and columns which are not used in the training process, given only a few non-missing entries in them. We propose two variational inference based algorithms for learning the model and predicting missing entries. One of the proposed algorithms leads to a parallel RBC which can achieve significant speedups. The efficacy of RBC is demonstrated by extensive experimental comparisons with state-of-the-art algorithms on real-world datasets.
CoBaFi: Collaborative Bayesian Filtering
 In World Wide Web Conference
, 2014
Abstract

Cited by 4 (4 self)
Given a large dataset of users' ratings of movies, what is the best model to accurately predict which movies a person will like? And how can we prevent spammers from tricking our algorithms into suggesting a bad movie? Is it possible to infer structure between movies simultaneously? In this paper we describe a unified Bayesian approach to collaborative filtering that accomplishes all of these goals. It models the discrete structure of ratings and is flexible to the often non-Gaussian shape of the distribution. Additionally, our method finds a co-clustering of the users and items, which improves the model's accuracy and makes the model robust to fraud. We offer three main contributions: (1) We provide a novel model and Gibbs sampling algorithm that accurately models the quirks of real-world ratings, such as convex rating distributions. (2) We provide proof of our model's robustness to spam and anomalous behavior. (3) We use several real-world datasets to demonstrate the model's effectiveness in accurately predicting users' ratings, avoiding prediction skew in the face of injected spam, and finding interesting patterns in real-world ratings data.