## Assessing approximate inference for binary Gaussian process classification (2005)

### Download Links

- www.jmlr.org
- www.kyb.tuebingen.mpg.de
- www.kyb.mpg.de
- jmlr.csail.mit.edu
- jmlr.org
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Machine Learning Research

Citations: 40 (3 self)

### BibTeX

```bibtex
@article{Kuss05assessingapproximate,
  author  = {Malte Kuss and Carl Edward Rasmussen and Ralf Herbrich},
  title   = {Assessing approximate inference for binary Gaussian process classification},
  journal = {Journal of Machine Learning Research},
  year    = {2005},
  volume  = {6},
  pages   = {1679--1704}
}
```

### Abstract

Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately, exact Bayesian inference is analytically intractable, and various approximation techniques have been proposed. In this work we review and compare Laplace's method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates against results obtained by MCMC sampling. We explain theoretically, and corroborate empirically, the advantages of Expectation Propagation over Laplace's method.

Keywords: Gaussian process priors, probabilistic classification, Laplace's approximation, expectation propagation, marginal likelihood, evidence, MCMC
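To make the setting concrete, here is a minimal numpy sketch of Laplace's method for binary GP classification, following the standard Newton-iteration scheme (Rasmussen and Williams, Algorithm 3.1) with a logistic likelihood. The toy data and unit hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance function."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def laplace_gpc(X, y, lengthscale=1.0, variance=1.0, n_iter=30):
    """Laplace approximation for binary GP classification with a logistic
    likelihood; labels y take values in {0, 1}. Returns the posterior mode
    f_hat and the approximate log marginal likelihood."""
    n = len(y)
    K = rbf_kernel(X, X, lengthscale, variance) + 1e-8 * np.eye(n)
    f = np.zeros(n)
    for _ in range(n_iter):                      # Newton iterations
        pi = 1.0 / (1.0 + np.exp(-f))            # sigmoid(f)
        W = pi * (1.0 - pi)                      # negative Hessian of log lik
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + (y - pi)                     # Newton step in stable form
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f = K @ a
    log_lik = np.sum(y * f - np.log1p(np.exp(f)))
    log_Z = -0.5 * a @ f + log_lik - np.log(np.diag(L)).sum()
    return f, log_Z

# toy 1-D data (illustrative)
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
f_hat, log_Z = laplace_gpc(X, y)
```

The Cholesky-based update avoids inverting K directly, which is the numerically stable formulation used in practice.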

### Citations

3436 | LIBSVM: A Library for Support Vector Machines. Software available at www.csie.ntu.edu.tw/~cjlin/libsvm - Chang, Lin - 2001 |

2028 | Learning with Kernels - Schölkopf, Smola - 2002 |

> Citation context: "… symmetric costs (thresholding the predictive uncertainty at 1/2). In order to have a better absolute impression of the predictive performance we report the results of support vector machines (SVM) (Schölkopf and Smola, 2002). We use the LIBSVM implementation of C-SVM by Chang and Lin (2001) with a radial basis function kernel which is equivalent to the covariance function (33) up to the signal variance parameter. The va…"

1958 | Matrix Computations - Golub, Van Loan - 1996 |

1157 | Information Theory, Inference, and Learning Algorithms - MacKay - 2003 |

> Citation context: "…Additionally, high dimensional Gaussian distributions exhibit the property that most probability mass is contained in a thin ellipsoidal shell—depending on the covariance structure—away from the mean (MacKay, 2003, ch. 29.2). Intuitively this occurs since in high dimensions the volume grows extremely rapidly with the radius. As an effect the mode becomes less representative (typical) for the prior distribution…"
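The shell effect described in that excerpt is easy to verify numerically: draw standard-normal vectors in high dimension and inspect their norms. A small illustrative numpy check (not from the paper; dimension and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000                                   # dimension
x = rng.standard_normal((10_000, d))       # samples from N(0, I_d)
r = np.linalg.norm(x, axis=1)              # distances from the mode at 0

# norms concentrate near sqrt(d): the mass sits in a thin shell,
# so the mode at the origin is atypical of the distribution
print(r.mean() / np.sqrt(d))               # close to 1
print(r.std() / r.mean())                  # small relative spread
```

This is why a mode-centred approximation like Laplace's method can misrepresent a high-dimensional posterior even when the mode itself is found exactly.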

983 | Bayes Factors - Kass, Raftery - 1995 |

> Citation context: "… implemented by selecting θ maximising the marginal likelihood (evidence) p(D|θ) = ∫ p(y|f) p(f|X,θ) df (7), which can be understood as a measure of the agreement between the model and observed data (Kass and Raftery, 1995; MacKay, 1999). This approach is called maximum likelihood II (MLII) type hyper-parameter estimation and motivates the need for computing the marginal likelihood. Laplace's method as well as Expectat…"
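As a concrete toy illustration of the integral in Eq. (7): for very small n the marginal likelihood can be estimated by brute-force Monte Carlo, averaging the likelihood over latent vectors drawn from the GP prior. All data and hyperparameters below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])                       # binary labels

# GP prior covariance: squared-exponential with unit hyperparameters
K = np.exp(-0.5 * (X - X.T) ** 2) + 1e-8 * np.eye(4)
L = np.linalg.cholesky(K)

S = 200_000
F = L @ rng.standard_normal((4, S))              # samples f_s ~ N(0, K)
p1 = 1.0 / (1.0 + np.exp(-F))                    # P(y_i = 1 | f_i), logistic
lik = np.prod(np.where(y[:, None] == 1, p1, 1.0 - p1), axis=0)
Z_mc = lik.mean()                                # ≈ p(D | θ), Eq. (7)
```

This naive estimator scales hopelessly with n, which is exactly why the paper compares Laplace, EP, and AIS approximations of this quantity.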

651 | UCI Repository of Machine Learning Databases. URL http://www.ics.uci.edu/~mlearn/MLRepository.html - Newman, Hettich, et al. - 1998 |

562 | Probabilistic Inference Using Markov Chain Monte Carlo Methods - Neal - 1993 |

399 | Monte Carlo Strategies in Scientific Computing - Liu - 2001 |

299 | Choosing multiple parameters for support vector machines - Chapelle, Vapnik, et al. - 2002 |

265 | A family of algorithms for approximate Bayesian inference - Minka - 2001 |

147 | Annealed importance sampling - Neal - 2001 |

145 | Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling - Gelman, Meng - 1998 |

> Citation context: "…straightforward and covered in Section 6.1. Good MCMC estimates of the marginal likelihood are, however, notoriously difficult to obtain, being equivalent to the free-energy estimation problem in physics (Gelman and Meng, 1998). In Section 6.2 we explain the use of Annealed Importance Sampling (AIS), which can be seen as a sophisticated elaboration of Thermodynamic Integration, for this task. 6.1 Hybrid MCMC Sampling Hybri…"
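To make the AIS idea concrete, here is a minimal one-dimensional sketch (not the paper's implementation): anneal from a tractable N(0,1) proposal to an unnormalized target along a geometric path, accumulating importance weights whose average estimates the target's normalizing constant. The target, temperature schedule, and chain counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_p0(x):   # normalized N(0, 1) proposal, Z0 = 1
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def log_f1(x):   # unnormalized target: N(3, 1) without its constant
    return -0.5 * (x - 3.0) ** 2   # true Z1 = sqrt(2*pi)

n_chains, n_temps = 2000, 100
betas = np.linspace(0.0, 1.0, n_temps + 1)
x = rng.standard_normal(n_chains)                # x0 ~ p0
logw = np.zeros(n_chains)
for b0, b1 in zip(betas[:-1], betas[1:]):
    # weight increment along the geometric path f_b = p0^(1-b) * f1^b
    logw += (b1 - b0) * (log_f1(x) - log_p0(x))
    # one Metropolis step targeting the intermediate density f_{b1}
    def log_fb(z):
        return (1.0 - b1) * log_p0(z) + b1 * log_f1(z)
    prop = x + rng.standard_normal(n_chains)
    accept = np.log(rng.random(n_chains)) < log_fb(prop) - log_fb(x)
    x = np.where(accept, prop, x)

Z_est = np.exp(logw).mean()        # estimates Z1 = sqrt(2*pi) ≈ 2.507
```

The averaged weights give an unbiased estimate of Z1/Z0; with many intermediate temperatures the weight variance stays manageable even when proposal and target barely overlap, which is the whole point of annealing.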

123 | Fast Sparse Gaussian Process Methods: The Informative Vector Machine - Lawrence, Seeger, et al. - 2003 |

> Citation context: "…generalisation bounds, online learning schemes and sparse approximations (e.g. Neal, 1998; Williams and Barber, 1998; Gibbs and MacKay, 2000; Opper and Winther, 2000; Csató and Opper, 2002; Seeger, 2002; Lawrence et al., 2003). Despite the abundance of recent work on probabilistic GP classifiers, most experimental studies provide only anecdotal evidence, and no clear picture has yet emerged as to when and why which algor…"

120 | Sparse on-line Gaussian processes - Csató, Opper - 2002 |

109 | Probabilities for SV machines - Platt - 2000 |

71 | Gaussian processes for classification: Mean field algorithms - Opper, Winther - 2000 |

69 | Gaussian processes for ordinal regression - Chu, Ghahramani - 2005 |

68 | Comparison of Approximate Methods for Handling Hyperparameters - MacKay - 1999 |

> Citation context: "…selecting θ maximising the marginal likelihood (evidence) p(D|θ) = ∫ p(y|f) p(f|X,θ) df (7), which can be understood as a measure of the agreement between the model and observed data (Kass and Raftery, 1995; MacKay, 1999). This approach is called maximum likelihood II (MLII) type hyper-parameter estimation and motivates the need for computing the marginal likelihood. Laplace's method as well as Expectation Propagatio…"

64 | A Review of Gaussian Random Fields and Correlation Functions, Norwegian Computing Center - Abrahamsen - 1997 |

53 | Bayesian Gaussian Process Models: PAC-Bayesian Generalization Error Bounds and Sparse Approximations - Seeger - 2003 |

40 | Curve Fitting and Optimal Design for Prediction - O’Hagan, Kingman - 1978 |

28 | Variational Gaussian Process Classifiers - Gibbs, MacKay - 2000 |

8 | PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification - Seeger - 2003 |

6 | Expectation Propagation for Exponential Families, 2005. Available from http://www.cs.berkeley.edu/~mseeger/papers/epexpfam.ps.gz - Seeger |

1 | Pattern Recognition and Neural Networks - Ripley - 1996 |