## Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond (1997)

Venue: Learning and Inference in Graphical Models

Citations: 201 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Williams97predictionwith,
  author    = {C. K. I. Williams},
  title     = {Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond},
  booktitle = {Learning and Inference in Graphical Models},
  year      = {1997},
  pages     = {599--621},
  publisher = {Kluwer}
}
```

### Abstract

The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.

From section 1 (Introduction): In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of non-linear statistical techniques that can be used for...
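
The change of viewpoint the abstract describes (a prior over weights versus a prior over functions) can be checked numerically: Bayesian linear regression with basis functions phi and weight prior N(0, Sw) gives the same predictive mean as a Gaussian process with kernel k(x, x') = phi(x)^T Sw phi(x'). The basis, data, and variable names below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # simple polynomial basis [1, x, x^2] (an illustrative choice)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

x = rng.uniform(-1, 1, size=8)                     # training inputs
t = np.sin(3 * x) + 0.1 * rng.standard_normal(8)   # noisy targets
xs = np.linspace(-1, 1, 5)                         # test inputs
s2 = 0.01                                          # noise variance
Sw = np.diag([1.0, 1.0, 0.5])                      # prior covariance on weights

P, Ps = phi(x), phi(xs)

# Weight-space view: posterior mean weights solve (s2*Sw^-1 + P^T P) w = P^T t
A = s2 * np.linalg.inv(Sw) + P.T @ P
mean_w = Ps @ np.linalg.solve(A, P.T @ t)

# Function-space (GP) view: mean = k* (K + s2 I)^-1 t with K = P Sw P^T
K = P @ Sw @ P.T
Ks = Ps @ Sw @ P.T
mean_f = Ks @ np.linalg.solve(K + s2 * np.eye(len(x)), t)

print(np.allclose(mean_w, mean_f))  # True: the two viewpoints agree
```

The agreement follows from a standard matrix push-through identity, so it holds for any basis and any positive definite weight prior, not just the ones chosen here.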

### Citations

9735 | The Nature of Statistical Learning Theory - Vapnik - 1995

> Citation context: ...ficiently due to Mercer's theorem has been used in some other contexts, for example in the method of potential functions (due to Aizerman, Braverman and Rozoner, 1964) and in support vector machines (Vapnik, 1995). In support vector regression the prior over functions is as described above, but instead of using a squared error loss function (which corresponds to Gaussian noise), a modified version of the l1 ...

5246 | Neural Networks for Pattern Recognition - Bishop - 1995 |

1574 | Generalized Additive Models - Hastie, Tibshirani - 1990

> Citation context: ...f the prior is a general Gaussian process and we assume a Gaussian noise model, then the predicted y-value is just some linear combination of the t-values; the method is said to be a linear smoother (Hastie and Tibshirani, 1990) or a linear predictor. In section 3 we have seen how linear regression can be seen from a function-space viewpoint. This opens up further possibilities, as linear regression with a prior on the weig...
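
The linear-smoother property described in this excerpt can be sketched directly: the GP posterior mean at test inputs is k(X*, X)(K + s2 I)^-1 t, a fixed linear map of the targets t. The kernel and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ell=0.3):
    # squared-exponential kernel (an illustrative choice)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0, 1, 10)   # training inputs
xs = np.linspace(0, 1, 4)   # test inputs
s2 = 0.05                   # noise variance

# Smoother matrix: depends only on the inputs, never on the targets.
S = rbf(xs, x) @ np.linalg.inv(rbf(x, x) + s2 * np.eye(len(x)))

t1 = np.sin(2 * np.pi * x)
t2 = rng.standard_normal(len(x))

# Linearity in the targets: smoothing a combination equals combining smooths.
assert np.allclose(S @ (t1 + 2 * t2), S @ t1 + 2 * (S @ t2))
print(S.shape)  # (4, 10): one weight per (test point, training target) pair
```

Because S is fixed once the inputs and kernel are fixed, the prediction is "just some linear combination of the t-values", exactly as the excerpt says.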

1435 | Bayesian Data Analysis - Gelman, Carlin, et al. - 1995

> Citation context: ...ntegral in equation 33 is then approximated using samples from the Markov chain. Two standard methods for constructing MCMC methods are the Gibbs sampler and Metropolis-Hastings algorithms (see, e.g., Gelman et al, 1995). However, the conditional parameter distributions are not amenable to Gibbs sampling if the covariance function has the form given by equation 30, and the Metropolis-Hastings algorithm does not util...
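
The Metropolis-Hastings construction mentioned in this excerpt can be sketched in a few lines. This is a generic 1D sampler, not the paper's hyperparameter sampler; the target density and proposal scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    # unnormalized log density; a standard normal for illustration
    return -0.5 * x**2

x = 0.0
samples = []
for _ in range(20000):
    prop = x + 0.8 * rng.standard_normal()   # symmetric random-walk proposal
    # accept with probability min(1, target(prop)/target(x))
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    samples.append(x)

samples = np.array(samples[2000:])           # discard burn-in
print(samples.mean(), samples.std())         # close to (0, 1) for N(0, 1)
```

The same accept/reject loop applies to GP hyperparameters by substituting the log marginal likelihood plus log prior for `log_target`, which is why it is usable even when the conditionals needed for Gibbs sampling are unavailable.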

1381 | Spline Models for Observational Data - Wahba - 1990 |

1356 | Statistics for Spatial Data - Cressie - 1993

> Citation context: ...sion are discussed in Whittle (1963). ARMA models for time series are Gaussian process models. Gaussian process prediction is also well known in the geostatistics field (Journel and Huijbregts, 1978; Cressie, 1993) where it is known as "kriging", although this literature naturally has focussed mostly on two- and three-dimensional input spaces. Wahba has been influential in promoting the use of spline...

1190 | Pattern Recognition and Neural Networks - Ripley - 1996 |

686 | Networks For Approximation and Learning - POGGIO, GIROSI - 1990 |

642 | Bayesian Learning for Neural Networks - Neal - 1996

> Citation context: ...eural network models this posterior cannot usually be obtained analytically; computational methods used include approximations (MacKay, 1992) or the evaluation of integrals using Monte Carlo methods (Neal, 1996). In the Bayesian approach to neural networks, a prior on the weights of a network induces a prior over functions. An alternative method of putting a prior over functions is to use a Gaussian process...

514 | Bayesian Inference in Statistical Analysis - Box, Tiao - 1973 |

509 | Nonparametric Regression and Generalized Linear Models - Green, Silverman - 1994 |

426 | A practical Bayesian framework for backpropagation networks - MacKay - 1992

> Citation context: ...ters (rather than just a point estimate) will be induced. However, for neural network models this posterior cannot usually be obtained analytically; computational methods used include approximations (MacKay, 1992) or the evaluation of integrals using Monte Carlo methods (Neal, 1996). In the Bayesian approach to neural networks, a prior on the weights of a network induces a prior over functions. An alternative...

390 | Design and analysis of computer experiments - Sacks, Welch, et al. - 1989

> Citation context: ...aussian processes with a particular choice of covariance function. Gaussian process prediction was also suggested by O'Hagan (1978), and is widely used in the analysis of computer experiments (e.g. Sacks et al, 1989), although in this application it is assumed that the observations are noise-free. A connection to neural networks was made by Poggio and Girosi (1990) and Girosi, Jones and Poggio (1995) with their ...

330 | Mining Geostatistics - Journel, Huijbregts - 1985

> Citation context: ...cations to multivariate regression are discussed in Whittle (1963). ARMA models for time series are Gaussian process models. Gaussian process prediction is also well known in the geostatistics field (Journel and Huijbregts, 1978; Cressie, 1993) where it is known as "kriging", although this literature naturally has focussed mostly on two- and three-dimensional input spaces. Wahba has been influential in promoting the use of spl...

297 | Theoretical foundations of the potential function method in pattern recognition learning - Aizerman, Braverman, et al. - 1964 |

241 | Probabilistic interpretation of feed forward classification network outputs with relationships to statistical pattern recognition - Bridle - 1990

> Citation context: ...e inputs. An early reference to this approach is the work of Silverman (1978). For the classification problem with more than two classes, a simple extension of this idea using the "softmax" function (Bridle, 1990) gives the predicted probability for class k as p(k|x) = exp(y_k(x)) / Σ_m exp(y_m(x)) (equation 38). For the rest of this section we shall concentrate on the two-class problem; extension of the methods to the m...
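
The softmax formula in this excerpt, p(k|x) = exp(y_k(x)) / Σ_m exp(y_m(x)), is easy to implement; the latent activations below are illustrative.

```python
import numpy as np

def softmax(y):
    # subtract the max before exponentiating for numerical stability;
    # this leaves the ratio, and hence the probabilities, unchanged
    z = np.exp(y - np.max(y))
    return z / z.sum()

y = np.array([1.0, 2.0, 0.5])   # latent y_k(x) for three classes
p = softmax(y)

assert np.isclose(p.sum(), 1.0)     # a valid class-probability vector
assert np.argmax(p) == np.argmax(y) # softmax preserves the ranking
```

Subtracting the maximum activation is the standard guard against overflow when some y_k(x) is large; it does not change the resulting probabilities.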

231 | Gaussian Processes for Regression - Williams, Rasmussen - 1996

> Citation context: ...al-purpose regression method. Gaussian process priors have the advantage over neural networks that at least the lowest level of a Bayesian hierarchical model can be treated analytically. Recent work (Williams and Rasmussen, 1996, inspired by observations in Neal, 1996) has extended the use of these priors to higher dimensional problems that have been traditionally tackled with other techniques such as neural networks, decisi...

194 | Some Aspects of the Spline Smoothing Approach to Non-Parametric Regression Curve Fitting - Silverman - 1985 |

144 | Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression - Rasmussen - 1996

> Citation context: ...gression method. Gaussian process priors have the advantage over neural networks that at least the lowest level of a Bayesian hierarchical model can be treated analytically. Recent work (Williams and Rasmussen, 1996, inspired by observations in Neal, 1996) has extended the use of these priors to higher dimensional problems that have been traditionally tackled with other techniques such as neural networks, decisi...

143 | A correspondence between Bayesian estimation on stochastic processes and smoothing by splines - Kimeldorf, Wahba - 1970 |

128 | Monte Carlo implementation of Gaussian process models for Bayesian regression and classification - Neal - 1997 |

87 | Maximum Likelihood Estimation of Models for Residual Covariance in Spatial Regression - Mardia, Marshall - 1984 |

76 | Efficient Implementation of Gaussian Processes. Unpublished manuscript - Gibbs, MacKay - 1997 |

68 | Prediction and Regulation by Linear Least-Square Methods, Second Revised Edition - Whittle - 1983 |

67 | Automatic smoothing of regression functions in generalized linear models - O'Sullivan, Yandell, et al. - 1986

> Citation context: ... For the analytic approximation methods, there is also the question of what to do about the parameters θ. Maximum likelihood and GCV approaches can again be used as in the regression case (e.g. O'Sullivan et al, 1986). Barber and Williams (1997) used an approximate Bayesian scheme based on the Hybrid Monte Carlo method whereby the marginal likelihood P(t|θ) (which is not available analytically) is replaced by th...

66 | A Bayesian analysis of kriging - Handcock, Stein - 1993 |

64 | Bayesian methods for backpropagation networks - MacKay - 1994

> Citation context: ... For irrelevant inputs, the corresponding α_l will become small, and the model will ignore that input. This is closely related to the Automatic Relevance Determination (ARD) idea of MacKay and Neal (MacKay, 1993; Neal, 1996). The v_0 variable gives the overall scale of the local correlations, a_0 and a_1 are variables controlling the scale of the bias and linear contributions to the covariance. A simple exten...
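
The behaviour described in this excerpt can be sketched with a covariance of this general form (the exact parameterization of equation 30 in the paper may differ): a squared-exponential term with one scale α_l per input, plus bias (a_0) and linear (a_1) contributions. A small α_l makes the covariance insensitive to input l, so the model effectively ignores it.

```python
import numpy as np

def cov(x, xp, alpha, v0=1.0, a0=0.1, a1=0.1):
    # v0 * exp(-0.5 * sum_l alpha_l (x_l - x'_l)^2) + a0 + a1 * x . x'
    sq = np.sum(alpha * (x - xp) ** 2)
    return v0 * np.exp(-0.5 * sq) + a0 + a1 * np.dot(x, xp)

alpha = np.array([1.0, 1e-6])   # second input is nearly irrelevant
x  = np.array([0.3, -5.0])
xp = np.array([0.3,  5.0])      # differs only in the irrelevant input

# With a1 = 0, moving the irrelevant input barely changes the covariance.
c_same  = cov(x, x,  alpha, a1=0.0)
c_moved = cov(x, xp, alpha, a1=0.0)
print(abs(c_same - c_moved) < 1e-3)  # True: input 2 is effectively ignored
```

Setting a1 = 0 above isolates the ARD effect of the exponential term; with a1 > 0 the linear contribution would still respond to the second input.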

54 | A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines - Hutchinson - 1989 |

49 | Some new results on neural network approximation - Hornik - 1993 |

41 | Regression with input-dependent noise: a Gaussian process treatment - Goldberg, Williams, et al. - 1998

> Citation context: ...it is assumed that the noise process has a variance that depends on x, and that this noise-field N(x) is drawn from a prior generated from an independent Gaussian process Z(x) by N(x) = exp(Z(x)) (see Goldberg et al, 1997). In this paper I have shown how to move from simple Bayesian linear regression to regression with Gaussian processes, and have discussed some of the issues in using Gaussian process pre...
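
The construction in this excerpt, a log noise variance Z(x) with a GP prior and N(x) = exp(Z(x)), can be sampled directly. The kernel, inputs, and jitter value below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, ell=0.5):
    # squared-exponential kernel (an illustrative choice)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0, 1, 20)
K = rbf(x, x) + 1e-6 * np.eye(len(x))   # jitter so the Cholesky succeeds

# Draw Z(x) from the GP prior, then map through exp to get noise variances.
z = np.linalg.cholesky(K) @ rng.standard_normal(len(x))  # Z(x) ~ GP(0, K)
noise_var = np.exp(z)                                    # N(x) = exp(Z(x))

print(bool(np.all(noise_var > 0)))  # True: exp guarantees positive variances
```

The exponential map is what makes this a valid noise model: the GP sample Z(x) ranges over all reals, while exp(Z(x)) is always a legal (positive) variance.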

39 | Mathematical Theory of Probability and Statistics - Mises - 1997 |

38 | Curve fitting and optimal design for prediction (with Discussion) - O'Hagan - 1978 |

35 | Gaussian Processes for Bayesian classification via hybrid Monte Carlo - Barber, Williams - 1997 |

35 | Neural computation with infinite neural networks - Williams |

26 | Bayesian numerical analysis - Skilling - 1993 |

23 | Computing With Infinite Networks - Williams - 1997 |

14 | Pseudosplines - Hastie - 1996 |

9 | Density ratios, empirical likelihood and cot death - Silverman - 1978 |

4 | Variational Gaussian Process Classifiers. Draft manuscript, available via http://wol.ra.phy.cam.ac.uk/mackay/homepage.html - Gibbs, MacKay - 1997 |

2 | Nonparametric estimation of nonstationary covariance structure - Sampson, Guttorp - 1992 |

1 | A fast "Monte Carlo cross-validation" procedure for large least squares problems with noisy data - Girard - 1989 |
