Results 1  10
of
133
Convex multitask feature learning
 Machine Learning
, 2007
"... Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We p ..."
Abstract

Cited by 139 (15 self)
 Add to MetaCart
Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns taskspecific functions and in the latter step it learns commonacrosstasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks 3.
Fast maximum margin matrix factorization for collaborative prediction
 In Proceedings of the 22nd International Conference on Machine Learning (ICML
, 2005
"... Maximum Margin Matrix Factorization (MMMF) was recently suggested (Srebro et al., 2005) as a convex, infinite dimensional alternative to lowrank approximations and standard factor models. MMMF can be formulated as a semidefinite programming (SDP) and learned using standard SDP solvers. However, cu ..."
Abstract

Cited by 133 (7 self)
 Add to MetaCart
Maximum Margin Matrix Factorization (MMMF) was recently suggested (Srebro et al., 2005) as a convex, infinite dimensional alternative to lowrank approximations and standard factor models. MMMF can be formulated as a semidefinite programming (SDP) and learned using standard SDP solvers. However, current SDP solvers can only handle MMMF problems on matrices of dimensionality up to a few hundred. Here, we investigate a direct gradientbased optimization method for MMMF and demonstrate it on large collaborative prediction problems. We compare against results obtained by Marlin (2004) and find that MMMF substantially outperforms all nine methods he tested. 1.
Probabilistic Matrix Factorization
"... Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, pe ..."
Abstract

Cited by 132 (4 self)
 Add to MetaCart
Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7 % better than the score of Netflix’s own system. 1
Multitask feature learning
 Advances in Neural Information Processing Systems 19
, 2007
"... We present a method for learning a lowdimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that th ..."
Abstract

Cited by 131 (7 self)
 Add to MetaCart
We present a method for learning a lowdimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn commonacrosstasks representations and in the former step we learn taskspecific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks.
Restricted Boltzmann machines for collaborative filtering
 In Machine Learning, Proceedings of the Twentyfourth International Conference (ICML 2004). ACM
, 2007
"... Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of twolayer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. We present eff ..."
Abstract

Cited by 118 (12 self)
 Add to MetaCart
Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of twolayer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBM’s can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBM’s slightly outperform carefullytuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6 % better than the score of Netflix’s own system. 1.
Uncovering shared structures in multiclass classification
 In Proceedings of the Twentyfourth International Conference on Machine Learning
, 2007
"... This paper suggests a method for multiclass learning with many classes by simultaneously learning shared characteristics common to the classes, and predictors for the classes in terms of these characteristics. We cast this as a convex optimization problem, using tracenorm regularization and study g ..."
Abstract

Cited by 60 (0 self)
 Add to MetaCart
This paper suggests a method for multiclass learning with many classes by simultaneously learning shared characteristics common to the classes, and predictors for the classes in terms of these characteristics. We cast this as a convex optimization problem, using tracenorm regularization and study gradientbased optimization both for the linear case and the kernelized setting. 1.
Relational Learning via Collective Matrix Factorization
, 2008
"... Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode ..."
Abstract

Cited by 60 (3 self)
 Add to MetaCart
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode users ’ ratings of movies, movies ’ genres, and actors ’ roles in movies. A common prediction technique given one pairwise relation, for example a #users × #movies ratings matrix, is lowrank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new largescale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a
A new approach to collaborative filtering: Operator estimation with spectral regularization
 Journal of Machine Learning Research
"... We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our p ..."
Abstract

Cited by 50 (3 self)
 Add to MetaCart
We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users ” to a set of possibly desired “objects”. In particular, several recent lowrank type matrixcompletion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularizationbased CF, our approach can be used to incorporate additional information such as attributes of the users/objects—a feature currently lacking in existing regularizationbased CF approaches—using popular and wellknown kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning algorithms based on lowrank decompositions and test them on a standard CF data set. The experiments indicate the advantages of generalizing the existing regularizationbased CF methods to incorporate related information about users and objects. Finally, we show that certain multitask learning methods can be also seen as special cases of our proposed approach.
A spectral regularization framework for multitask structure learning
 In NIPS
, 2008
"... Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, whi ..."
Abstract

Cited by 48 (8 self)
 Add to MetaCart
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state of the art statistical performance. 1
Convex and SemiNonnegative Matrix Factorizations
, 2008
"... We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = F GT, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable ra ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = F GT, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.