Results 1  10
of
26
Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond
 Learning and Inference in Graphical Models
, 1997
"... The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. Th ..."
Abstract

Cited by 203 (4 self)
 Add to MetaCart
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems. PREDICTION WITH GAUSSIAN PROCESSES: FROM LINEAR REGRESSION TO LINEAR PREDICTION AND BEYOND 2 1 Introduction In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of nonlinear statistical techniques that can be used for...
Evaluation Of Gaussian Processes And Other Methods For NonLinear Regression
, 1996
"... This thesis develops two Bayesian learning methods relying on Gaussian processes and a rigorous statistical approach for evaluating such methods. In these experimental designs the sources of uncertainty in the estimated generalisation performances due to both variation in training and test sets are ..."
Abstract

Cited by 147 (16 self)
 Add to MetaCart
This thesis develops two Bayesian learning methods relying on Gaussian processes and a rigorous statistical approach for evaluating such methods. In these experimental designs the sources of uncertainty in the estimated generalisation performances due to both variation in training and test sets are accounted for. The framework allows for estimation of generalisation performance as well as statistical tests of significance for pairwise comparisons. Two experimental designs are recommended and supported by the DELVE software environment. Two new nonparametric Bayesian learning methods relying on Gaussian process priors over functions are developed. These priors are controlled by hyperparameters which set the characteristic length scale for each input dimension. In the simplest method, these parameters are fit from the data using optimization. In the second, fully Bayesian method, a Markov chain Monte Carlo technique is used to integrate over the hyperparameters. One advantage of these G...
A Bayesian Committee Machine
 NEURAL COMPUTATION
, 2000
"... The Bayesian committee machine (BCM) is a novel approach to combining estimators which were trained on different data sets. Although the BCM can be applied to the combination of any kind of estimators the main foci are Gaussian process regression and related systems such as regularization networks a ..."
Abstract

Cited by 78 (8 self)
 Add to MetaCart
The Bayesian committee machine (BCM) is a novel approach to combining estimators which were trained on different data sets. Although the BCM can be applied to the combination of any kind of estimators the main foci are Gaussian process regression and related systems such as regularization networks and smoothing splines for which the degrees of freedom increase with the number of training data. Somewhat surprisingly, we nd that the performance of the BCM improves if several test points are queried at the same time and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator. The BCM also provides a new solution for online learning with potential applications to data mining. We apply the BCM to systems with fixed basis functions and discuss its relationship to Gaussian process regression. Finally, we also show how the ideas behind the BCM can be applied in a nonBayesian setting to extend the input dependent combination of estimators.
Comparison of Approximate Methods for Handling Hyperparameters
 NEURAL COMPUTATION
"... I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resu ..."
Abstract

Cited by 73 (1 self)
 Add to MetaCart
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Gaussian Processes  A Replacement for Supervised Neural Networks?
"... These lecture notes are based on the work of Neal (1996), Williams and ..."
Abstract

Cited by 54 (0 self)
 Add to MetaCart
These lecture notes are based on the work of Neal (1996), Williams and
Variational Gaussian Process Classifiers
 IEEE Transactions on Neural Networks
, 1997
"... Gaussian processes are a promising nonlinear interpolation tool (Williams 1995; Williams and Rasmussen 1996), but it is not straightforward to solve classification problems with them. In this paper the variational methods of Jaakkola and Jordan (1996) are applied to Gaussian processes to produce an ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
Gaussian processes are a promising nonlinear interpolation tool (Williams 1995; Williams and Rasmussen 1996), but it is not straightforward to solve classification problems with them. In this paper the variational methods of Jaakkola and Jordan (1996) are applied to Gaussian processes to produce an efficient Bayesian binary classifier. 1 Introduction Assume that we have some data D which consists of inputs fx n g N n=1 in some space, real or discrete, and corresponding targets t n which are binary categorical variables. We shall model this data using a Bayesian conditional classifier which predicts t conditional on x. We assume the existence of a function a(x) which models the `logit' log P (t=1jx) P (t=0jx) as a function of x. Thus P (t = 1jx; a(x)) = 1 1 + exp(\Gammaa(x)) (1) To complete the model we place a prior distribution over the unknown function a(x). There are two approaches to this. In the standard parametric approach, a(x) is a parameterized function a(x; w) where the...
Hyperparameters: optimize, or integrate out?
 IN MAXIMUM ENTROPY AND BAYESIAN METHODS, SANTA BARBARA
, 1996
"... I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the `evidence framework' the model parameters are integrated over, and the resulting evidence is maximi ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the `evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative `MAP' method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution. In moderately illposed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions. General lessons are drawn concerning the distinctive properties of inference in many dimensions.
Approximation Methods for Gaussian Process Regression
, 2007
"... A wealth of computationally efficient approximation methods for Gaussian process regression have been recently proposed. We give a unifying overview of sparse approximations, following QuiñoneroCandela and Rasmussen (2005), and a brief review of approximate matrixvector multiplication methods. 1 ..."
Abstract

Cited by 18 (3 self)
 Add to MetaCart
A wealth of computationally efficient approximation methods for Gaussian process regression have been recently proposed. We give a unifying overview of sparse approximations, following QuiñoneroCandela and Rasmussen (2005), and a brief review of approximate matrixvector multiplication methods. 1
Bayesian Methods for Neural Networks: Theory and Applications
, 1995
"... this document. Before these are discussed however, perhaps we should have a tutorial on Bayesian probability theory and its application to model comparison problems. 2 Probability theory and Occam's razor ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
this document. Before these are discussed however, perhaps we should have a tutorial on Bayesian probability theory and its application to model comparison problems. 2 Probability theory and Occam's razor
Bayesian NonLinear Modelling with Neural Networks
, 1995
"... this paper is illustrated in figure 6e. If we give a probabilistic interpretation to the model, then we can evaluate the `evidence' for alternative values of the control parameters. Overcomplex models turn out to be less probable, and the quantity ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
this paper is illustrated in figure 6e. If we give a probabilistic interpretation to the model, then we can evaluate the `evidence' for alternative values of the control parameters. Overcomplex models turn out to be less probable, and the quantity