Results 1 - 10 of 16
Prediction With Gaussian Processes: From Linear Regression To Linear Prediction And Beyond
- Learning and Inference in Graphical Models
, 1997
Cited by 160 (4 self)
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression and show how, by a change of viewpoint, one can see this method as a Gaussian process predictor based on priors over functions rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models, and the use of Gaussian processes in classification problems. 1 Introduction In the last decade neural networks have been used to tackle regression and classification problems, with some notable successes. It has also been widely recognized that they form a part of a wide variety of non-linear statistical techniques that can be used for...
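The change of viewpoint described in this abstract can be checked numerically. The following is a minimal sketch under assumed values, not code from the paper: the feature map `phi`, the prior covariance `S_p`, the noise variance and the toy data are all hypothetical. It computes the predictive mean once from the posterior over the weights and once from a Gaussian process with covariance k(x, x') = phi(x)^T S_p phi(x'), and compares the two.

```python
import numpy as np

def phi(x):
    """Hypothetical feature map: a bias term and a linear term for scalar inputs."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x], axis=1)  # shape (n, 2)

def weight_space_mean(x_train, y_train, x_test, S_p, noise_var):
    """Predictive mean computed from the Gaussian posterior over the weights."""
    Phi = phi(x_train)
    A = Phi.T @ Phi / noise_var + np.linalg.inv(S_p)  # posterior precision
    w_mean = np.linalg.solve(A, Phi.T @ y_train / noise_var)
    return phi(x_test) @ w_mean

def function_space_mean(x_train, y_train, x_test, S_p, noise_var):
    """Predictive mean computed from the induced prior over functions."""
    Phi, Phi_s = phi(x_train), phi(x_test)
    K = Phi @ S_p @ Phi.T        # covariance between training outputs
    K_s = Phi_s @ S_p @ Phi.T    # covariance between test and training outputs
    n = len(x_train)
    return K_s @ np.linalg.solve(K + noise_var * np.eye(n), y_train)

# Toy data and prior settings (assumptions made for this illustration only).
x_train = np.array([-1.0, 0.0, 1.0, 2.0])
y_train = np.array([-0.9, 0.1, 1.2, 1.9])
x_test = np.array([0.5, 3.0])
S_p = np.diag([1.0, 2.0])
noise_var = 0.1

m_w = weight_space_mean(x_train, y_train, x_test, S_p, noise_var)
m_f = function_space_mean(x_train, y_train, x_test, S_p, noise_var)
print(np.allclose(m_w, m_f))
```

The weight-space route solves a system whose size is the number of basis functions, while the function-space route solves one whose size is the number of training points; the predictions agree.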
Evaluation Of Gaussian Processes And Other Methods For Non-Linear Regression
, 1996
Cited by 119 (13 self)
This thesis develops two Bayesian learning methods relying on Gaussian processes and a rigorous statistical approach for evaluating such methods. In these experimental designs the sources of uncertainty in the estimated generalisation performances due to both variation in training and test sets are accounted for. The framework allows for estimation of generalisation performance as well as statistical tests of significance for pairwise comparisons. Two experimental designs are recommended and supported by the DELVE software environment. Two new non-parametric Bayesian learning methods relying on Gaussian process priors over functions are developed. These priors are controlled by hyperparameters which set the characteristic length scale for each input dimension. In the simplest method, these parameters are fit from the data using optimization. In the second, fully Bayesian method, a Markov chain Monte Carlo technique is used to integrate over the hyperparameters. One advantage of these G...
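As an illustration of hyperparameters that set one length scale per input dimension, here is a hedged sketch. The squared-exponential form of the covariance, the fixed noise variance, the toy data and the crude grid search (standing in for whatever optimizer the thesis uses) are all assumptions of this example, not details taken from the abstract.

```python
import itertools
import numpy as np

def ard_se_kernel(X1, X2, length_scales, signal_var=1.0):
    """Squared-exponential covariance with one length scale per input dimension."""
    Z1 = X1 / length_scales
    Z2 = X2 / length_scales
    sq_dist = (
        np.sum(Z1**2, axis=1)[:, None]
        + np.sum(Z2**2, axis=1)[None, :]
        - 2.0 * Z1 @ Z2.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def neg_log_marginal_likelihood(X, y, length_scales, noise_var=0.01):
    """-log p(y | X, hyperparameters) for a zero-mean GP with Gaussian noise."""
    n = len(y)
    K = ard_se_kernel(X, X, length_scales) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)

# Toy data (assumed): the target ignores the second input dimension.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 2))
y = np.sin(2.0 * X[:, 0])

# Grid search over pairs of length scales, a stand-in for a real optimizer.
grid = [0.3, 1.0, 3.0, 10.0]
candidates = [np.array(c) for c in itertools.product(grid, repeat=2)]
scores = [neg_log_marginal_likelihood(X, y, c) for c in candidates]
best = candidates[int(np.argmin(scores))]
print("length scales with the highest marginal likelihood on the grid:", best)
```

The fully Bayesian alternative mentioned in the abstract would average predictions over hyperparameter samples instead of committing to a single best setting.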
A Bayesian Committee Machine
- NEURAL COMPUTATION
, 2000
Cited by 60 (7 self)
The Bayesian committee machine (BCM) is a novel approach to combining estimators which were trained on different data sets. Although the BCM can be applied to the combination of any kind of estimators, the main foci are Gaussian process regression and related systems such as regularization networks and smoothing splines, for which the degrees of freedom increase with the number of training data. Somewhat surprisingly, we find that the performance of the BCM improves if several test points are queried at the same time and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator. The BCM also provides a new solution for online learning with potential applications to data mining. We apply the BCM to systems with fixed basis functions and discuss its relationship to Gaussian process regression. Finally, we also show how the ideas behind the BCM can be applied in a non-Bayesian setting to extend the input dependent combination of estimators.
Comparison of Approximate Methods for Handling Hyperparameters
- NEURAL COMPUTATION
Cited by 49 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Gaussian Processes -- A Replacement for Supervised Neural Networks?
Cited by 43 (0 self)
These lecture notes are based on the work of Neal (1996), Williams and
Variational Gaussian Process Classifiers
- IEEE Transactions on Neural Networks
, 1997
Cited by 24 (0 self)
Gaussian processes are a promising non-linear interpolation tool (Williams 1995; Williams and Rasmussen 1996), but it is not straightforward to solve classification problems with them. In this paper the variational methods of Jaakkola and Jordan (1996) are applied to Gaussian processes to produce an efficient Bayesian binary classifier. 1 Introduction Assume that we have some data $D$ which consists of inputs $\{x_n\}_{n=1}^{N}$ in some space, real or discrete, and corresponding targets $t_n$ which are binary categorical variables. We shall model this data using a Bayesian conditional classifier which predicts $t$ conditional on $x$. We assume the existence of a function $a(x)$ which models the 'logit' $\log \frac{P(t=1|x)}{P(t=0|x)}$ as a function of $x$. Thus $P(t = 1 | x; a(x)) = \frac{1}{1 + \exp(-a(x))}$ (1). To complete the model we place a prior distribution over the unknown function $a(x)$. There are two approaches to this. In the standard parametric approach, $a(x)$ is a parameterized function $a(x; w)$ where the...
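Equation (1) and the logit it inverts can be sketched in a few lines. The function names `prob_t1` and `logit` are hypothetical, chosen for this illustration; the two-branch evaluation is a standard numerical precaution, not something stated in the abstract.

```python
import math

def prob_t1(a):
    """P(t = 1 | x; a(x)) = 1 / (1 + exp(-a(x))), as in equation (1)."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)  # equivalent form that avoids overflow for very negative a
    return e / (1.0 + e)

def logit(p):
    """log [P(t=1|x) / P(t=0|x)], the quantity that a(x) models."""
    return math.log(p / (1.0 - p))

for a in (-3.0, 0.0, 2.5):
    p = prob_t1(a)
    # The logit of the class probability recovers a(x) up to rounding.
    print(a, p, logit(p))
```

Because the logit is unbounded, a(x) can be given a prior over all real-valued functions, while the link keeps the class probability strictly between 0 and 1 for moderate values of a(x).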
Hyperparameters: optimize, or integrate out?
- IN MAXIMUM ENTROPY AND BAYESIAN METHODS, SANTA BARBARA
, 1996
Cited by 16 (4 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative 'MAP' method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution. In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions. General lessons are drawn concerning the distinctive properties of inference in many dimensions.
Bayesian Methods for Neural Networks: Theory and Applications
, 1995
Cited by 12 (0 self)
this document. Before these are discussed however, perhaps we should have a tutorial on Bayesian probability theory and its application to model comparison problems. 2 Probability theory and Occam's razor
Approximation Methods for Gaussian Process Regression
, 2007
Cited by 9 (2 self)
A wealth of computationally efficient approximation methods for Gaussian process regression have been recently proposed. We give a unifying overview of sparse approximations, following Quiñonero-Candela and Rasmussen (2005), and a brief review of approximate matrix-vector multiplication methods.
Efficient Covariance Matrix Methods for Bayesian Gaussian Processes and Hopfield Neural Networks
, 1999
Cited by 4 (0 self)
Covariance matrices are important in many areas of neural modelling. In Hopfield networks they are used to form the weight matrix which controls the autoassociative properties of the network. In Gaussian processes, which have been shown to be the infinite neuron limit of many regularised feedforward neural networks, covariance matrices control the form of Bayesian prior distribution over function space. This thesis examines interesting modifications to the standard covariance matrix methods to increase functionality or efficiency of these neural techniques. Firstly the problem of adapting Gaussian process priors to perform regression on switching regimes is tackled. This involves the use of block covariance matrices and Gibbs sampling methods. Then the use of Toeplitz methods is proposed for Gaussian process regression where sampling positions can be chosen. A comparison is made between Hopfield weight matrices, and sample covariances. This allows work on sample covariances to be used ...
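One reason Toeplitz methods become available when sampling positions can be chosen is that evenly spaced inputs combined with a stationary covariance function give a Toeplitz covariance matrix. The sketch below illustrates only that structure; the squared-exponential covariance, the spacing and the matrix size are assumptions of this example and are not taken from the thesis.

```python
import numpy as np

def se_cov(r, length_scale=1.0):
    """Stationary squared-exponential covariance as a function of the lag r."""
    return np.exp(-0.5 * (r / length_scale) ** 2)

n, spacing = 6, 0.5
x = spacing * np.arange(n)                 # evenly spaced sampling positions

K = se_cov(x[:, None] - x[None, :])        # full n x n covariance matrix
first_col = se_cov(x - x[0])               # only n distinct values are needed

# Entry (i, j) of a symmetric Toeplitz matrix depends only on |i - j|.
lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
K_from_col = first_col[lags]

print(np.allclose(K, K_from_col))
```

Since the whole matrix is determined by its first column, storage drops from n^2 to n values, and structured solvers for Toeplitz systems can replace a general-purpose factorization.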

