## Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Citations: 17 (1 self)

### BibTeX

@MISC{Yu_large-scalecollaborative,

author = {Kai Yu and John Lafferty and Shenghuo Zhu and Yihong Gong},

title = {Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model},

year = {}

}

### Abstract

A nonparametric model is introduced that allows multiple related regression tasks to take inputs from a common data space. Traditional transfer learning models can be inappropriate if the dependence among the outputs cannot be fully resolved by known input-specific and task-specific predictors. The proposed model treats such output responses as conditionally independent, given known predictors and appropriate unobserved random effects. The model is nonparametric in the sense that the dimensionality of the random effects is not specified a priori but is instead determined from data. An approach to estimating the model is presented that uses an EM algorithm, which is efficient on a very large-scale collaborative prediction problem. The obtained prediction accuracy is competitive with state-of-the-art results.
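The generative setup sketched in the abstract, responses that are conditionally independent given known predictors and unobserved random effects, can be illustrated with a toy simulation. The bilinear form of m, the low-rank form of f, and all sizes and names below are our assumptions for this sketch, not the paper's exact specification:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d, k = 20, 30, 5, 3  # tasks, inputs, predictor dim, latent rank (all arbitrary)

# Known task-specific predictors x_i and input-specific predictors z_j.
X = rng.normal(size=(M, d))
Z = rng.normal(size=(N, d))

# Mean structure m(x_i, z_j) built from the known predictors (a bilinear form, for illustration).
W = rng.normal(size=(d, d))
m = X @ W @ Z.T

# Unobserved random effects f absorb residual dependence between outputs.
U = rng.normal(size=(M, k))
V = rng.normal(size=(N, k))
f = U @ V.T

# Given m and f, the M x N responses are conditionally independent Gaussians.
Y = m + f + 0.1 * rng.normal(size=(M, N))
```

The point of the decomposition is that once m and f are conditioned on, each Y_ij is an independent noisy observation, which is what makes EM-style estimation tractable.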

### Citations

502 | Probabilistic principal component analysis
- Tipping, Bishop
- 1999
Citation Context ...atrix factorization (Rennie & Srebro, 2005), a low-rank method with ℓ2 norm regularization on the factors, optimized by conjugate gradient descent. • PPCA: probabilistic principal component analysis (Tipping & Bishop, 1999), which is a probabilistic low-rank matrix factorization method optimized by the EM algorithm. • BSRM: Bayesian stochastic relational model (Zhu et al., 2009), a semi-parametric model implemented by Gibb...
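The closed-form EM updates for PPCA that this entry refers to can be sketched in a few lines (a minimal version of the Tipping & Bishop updates; variable names and the toy data are ours):

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """Closed-form EM for probabilistic PCA on centred data X (n x d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    S = X.T @ X / n                        # sample covariance of the centred data
    W = rng.normal(size=(d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # M = W^T W + sigma2 I summarises the posterior over latent coordinates.
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        SW = S @ W
        # M-step in closed form (Tipping & Bishop, 1999):
        W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + Minv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ Minv @ W_new.T) / d
        W = W_new
    return W, sigma2

# Toy check: 5-dimensional data generated from a 2-dimensional latent space.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
X = X - X.mean(axis=0)
W, sigma2 = ppca_em(X, q=2)   # sigma2 should land near the true noise variance 0.01
```

Because the updates operate on the sample covariance rather than on individual points, each EM iteration costs only O(d²q) after S is formed once.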

149 | Probabilistic matrix factorization
- Salakhutdinov, Mnih
- 2008
Citation Context ...nd the other from GP(0, Ω), namely, Y becomes a low-rank random function. Low-rank matrix factorization is perhaps the most popular and the state-of-the-art method for collaborative filtering, e.g., (Salakhutdinov & Mnih, 2008a). Our model generalizes them in the sense that it models an infinite-dimensional relational function. A simplification of our work leads to a nonparametric principal...

132 | Restricted Boltzmann machines for collaborative filtering
- SALAKHUTDINOV, MNIH, et al.
Citation Context ...eral other methods: • SVD: a method almost the same as FMMMF, using a gradient-based method for optimization (Kurucz et al., 2007). • RBM: Restricted Boltzmann Machine trained by contrastive divergence (Salakhutdinov et al., 2007). • PMF and BPMF: probabilistic matrix factorization (Salakhutdinov & Mnih, 2008b), and its Bayesian version (Salakhutdinov & Mnih, 2008a). • PMF-VB: probabilistic matrix factorization using a variat...
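As a point of reference for the PMF baseline listed here, a minimal MAP sketch trained by stochastic gradient descent on the observed entries (hyperparameters, names, and the toy data are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def pmf_map(R, mask, k=3, lr=0.02, reg=0.02, epochs=300, seed=0):
    """MAP training for PMF: minimise the sum over observed (i, j) of
    (R_ij - u_i . v_j)^2 plus reg * (||U||^2 + ||V||^2), by SGD."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = 0.1 * rng.normal(size=(n_rows, k))
    V = 0.1 * rng.normal(size=(n_cols, k))
    obs = list(zip(*np.nonzero(mask)))
    for _ in range(epochs):
        for i, j in obs:
            err = R[i, j] - U[i] @ V[j]
            u_old = U[i].copy()        # keep the old row so both updates use it
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
    return U, V

# Toy rank-3 ratings matrix with roughly 70% of entries observed.
rng = np.random.default_rng(2)
R = rng.normal(size=(15, 3)) @ rng.normal(size=(3, 12))
mask = rng.random(R.shape) < 0.7
U, V = pmf_map(R, mask)
rmse_heldout = np.sqrt(np.mean((R - U @ V.T)[~mask] ** 2))
```

The Bayesian variants mentioned in the snippet (BPMF, PMF-VB) replace this point estimate with posterior inference over U and V, via MCMC or variational Bayes respectively.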

97 | Learning Gaussian Processes from Multiple Tasks
- Yu, Tresp, et al.
- 2005
Citation Context ... There is a large body of research on multi-task learning using Gaussian processes, including those that learn the covariance Σ shared across tasks (Lawrence & Platt, 2004; Schwaighofer et al., 2005; Yu et al., 2005), and those that additionally consider the covariance Ω between tasks (Yu et al., 2007; Bonilla et al., 2008). The methods that only use Σ have been applied to collaborative prediction (Schwaighofer ...

89 | Bayesian probabilistic matrix factorization using Markov chain Monte Carlo
- Salakhutdinov, Mnih
- 2008
Citation Context ...nd the other from GP(0, Ω), namely, Y becomes a low-rank random function. Low-rank matrix factorization is perhaps the most popular and the state-of-the-art method for collaborative filtering, e.g., (Salakhutdinov & Mnih, 2008a). Our model generalizes them in the sense that it models an infinite-dimensional relational function. A simplification of our work leads to a nonparametric principal...

83 | Multi-task Gaussian process prediction
- Bonilla, Chai, et al.
Citation Context ...w m_ij = m(x_i, z_j) is a regression function based on task-specific predictors x_i ∈ X. In order to directly model the dependency between tasks, a multi-task Gaussian process approach (Yu et al., 2007; Bonilla et al., 2008) may assume m ∼ GP(0, Ω ⊗ Σ), where Σ(z_j, z_j'; θ_Σ) ≻ 0 is a covariance function among inputs and Ω(x_i, x_i'; θ_Ω) ≻ 0 a covariance function among tasks, which means Cov(m_ij, m_i'j') = Ω_ii' Σ_jj'. Th...
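The Kronecker covariance quoted above, Cov(m_ij, m_i'j') = Ω_ii' Σ_jj', can be verified numerically: a matrix-normal draw m = L_Ω G L_Σᵀ, with G standard normal and L_Ω, L_Σ the Cholesky factors of Ω and Σ, has exactly this covariance. The SPD matrices and sizes below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 3  # tasks x inputs (kept small so the Monte Carlo check is cheap)

# Arbitrary SPD task covariance Omega and input covariance Sigma.
A = rng.normal(size=(M, M)); Omega = A @ A.T + M * np.eye(M)
B = rng.normal(size=(N, N)); Sigma = B @ B.T + N * np.eye(N)
L_Omega = np.linalg.cholesky(Omega)
L_Sigma = np.linalg.cholesky(Sigma)

# Matrix-normal draws m = L_Omega G L_Sigma^T with G iid standard normal
# have Cov(m_ij, m_i'j') = Omega_ii' * Sigma_jj'.
G = rng.normal(size=(200_000, M, N))
draws = L_Omega @ G @ L_Sigma.T          # matmul broadcasts over the sample axis

emp = np.cov(draws.reshape(len(draws), -1).T)  # empirical covariance of vec(m)
theory = np.kron(Omega, Sigma)                 # row-major vec maps (i, j) -> i*N + j
max_rel_err = np.max(np.abs(emp - theory)) / np.max(np.abs(theory))
```

With row-major flattening, the covariance of vec(m) is Ω ⊗ Σ, which is why the Kronecker notation GP(0, Ω ⊗ Σ) appears in the snippet; in practice θ_Ω and θ_Σ would be fit rather than fixed, as the cited approaches do.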

79 | Matrix Variate Distributions
- Gupta, Nagar
- 1999
Citation Context ...tribution Y ∼ MTP(κ, 0, (Ω0 + τδ), (Σ0 + λδ)), where MTP defines a matrix-variate Student-t process. That is, any subset of matrix values Y ∈ R^{M×N} follows a matrix-valued Student-t distribution (Gupta & Nagar, 1999). The proof of the equivalence is sketched in the Appendix. 2.3. Discussion The model is nonparametric at two levels. First, the covariance functions Σ and Ω are nonparametric, because they both live...

57 | Some matrix-variate distribution theory: Notational considerations and a Bayesian application
- Dawid
- 1981
Citation Context ...variance function based on input data. The inverse-Wishart process (IWP) is a nonparametric prior for random covariance functions, based on a nonstandard notation of the inverse-Wishart distribution (Dawid, 1981); a brief introduction is given in the Appendix. The deviation of Y from the population mean µ is decomposed into two parts, random effects m and f. Let Ω0(x_i, x_i') = 〈φ(x_i), φ(x_i')〉, where φ is an...

50 | Semiparametric latent factor models
- Teh, Seeger, et al.
- 2005
Citation Context ...class of related work is the so-called semiparametric models, where the observations Y_ij are linearly generated from a finite number of basis functions randomly sampled from a Gaussian process prior (Teh et al., 2005). Our approach is more related to a recent work (Zhu et al., 2009), where Y_ij is modeled by two sets of multiplicative factors, one sampled from GP(0, Σ) and the other from GP(0, Ω), namely, Y become...

49 | Learning to learn with the informative vector machine
- Lawrence, Platt
- 2004
Citation Context ...o about 10 hours of computing time. 4. Related Work There is a large body of research on multi-task learning using Gaussian processes, including those that learn the covariance Σ shared across tasks (Lawrence & Platt, 2004; Schwaighofer et al., 2005; Yu et al., 2005), and those that additionally consider the covariance Ω between tasks (Yu et al., 2007; Bonilla et al., 2008). The methods that only use Σ have been applie...

43 | Bilinear mixed-effects models for dyadic data
- Hoff
- 2005
Citation Context ...wish to choose a form of f that is appropriate to explain the data dependence, for example, introducing latent variables u_i and v_j such that f_ij has a parametric form f(u_i, v_j; θ_f), as suggested in (Hoff, 2005). A nonparametric random effects model may assume f ∼ GP(0, ∆ ⊗ Υ), where the covariances ∆ and Υ are parameters to be determined from data. While this model is natural and flexible, it is subject to...

42 | Variational Bayesian approach to movie rating prediction
- Lim, Teh
- 2007
Citation Context ...atrix factorization (Salakhutdinov & Mnih, 2008b), and its Bayesian version (Salakhutdinov & Mnih, 2008a). • PMF-VB: probabilistic matrix factorization using a variational Bayes method for inference (Lim & Teh, 2007). Note that sometimes the running time was not found in the papers. For BPMF, we gave a rough estimate by assuming it went through 300 iterations (each took 220 minutes as reported in the paper). S...

35 | Stochastic relational models for discriminative link prediction
- Yu, Chu, et al.
Citation Context ...0, σ²) where now m_ij = m(x_i, z_j) is a regression function based on task-specific predictors x_i ∈ X. In order to directly model the dependency between tasks, a multi-task Gaussian process approach (Yu et al., 2007; Bonilla et al., 2008) may assume m ∼ GP(0, Ω ⊗ Σ), where Σ(z_j, z_j'; θ_Σ) ≻ 0 is a covariance function among inputs and Ω(x_i, x_i'; θ_Ω) ≻ 0 a covariance function among tasks, which means Cov(m_ij, m_i'...

24 | Methods for large scale SVD with missing values
- Kurucz, Benczúr, et al.
- 2007
Citation Context ...h is the baseline provided by Netflix. Besides those introduced in Sec. 5.1, there are several other methods: • SVD: a method almost the same as FMMMF, using a gradient-based method for optimization (Kurucz et al., 2007). • RBM: Restricted Boltzmann Machine trained by contrastive divergence (Salakhutdinov et al., 2007). • PMF and BPMF: probabilistic matrix factorization (Salakhutdinov & Mnih, 2008b), and its Bayesian v...

21 | Fast nonparametric matrix factorization for large-scale collaborative filtering
- Yu, Zhu, et al.
- 2009
Citation Context ...[test-RMSE table residue: Movie Mean 1.3866, FMMMF 1.1552, PPCA 1.1045, BSRM-1 1.0902, BSRM-2 1.0852, NREM-1 1.0816, NREM-2 1.0758] ...principal component analysis (NPCA) model (Yu et al., 2009). A non-probabilistic nonparametric method, i.e., maximum-margin matrix factorization, was introduced in (Srebro et al., 2005). Very few matrix factorization methods use known predictors. One such a...

9 | Hierarchical Bayesian modelling with Gaussian processes
- Schwaighofer, Tresp, et al.
- 2005
Citation Context ...uting time. 4. Related Work There is a large body of research on multi-task learning using Gaussian processes, including those that learn the covariance Σ shared across tasks (Lawrence & Platt, 2004; Schwaighofer et al., 2005; Yu et al., 2005), and those that additionally consider the covariance Ω between tasks (Yu et al., 2007; Bonilla et al., 2008). The methods that only use Σ have been applied to collaborative predicti...

3 | The BellKor solution to the Netflix Prize. Technical report, AT&T Labs – Research
- Bell, Koren, et al.
- 2007
Citation Context ...rs in the Netflix competition reported better results by combining heterogeneous models. For example, the progress award winner in 2007 combined predictions from about one hundred different models (Bell et al., 2007). However, our focus here is not on developing ensemble methods. 6. Conclusion In this paper a nonparametric model for multi-task learning is introduced. The contributions are twofold: First, the mod...