Results 1  10
of
183
Gaussian processes for machine learning
, 2003
"... We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters us ..."
Abstract

Cited by 720 (2 self)
 Add to MetaCart
(Show Context)
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Process and end with conclusions and a look at the current trends in GP work.
Gaussian process latent variable models for visualisation of high dimensional data
 Adv. in Neural Inf. Proc. Sys
, 2004
"... We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the ex ..."
Abstract

Cited by 230 (13 self)
 Add to MetaCart
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs. 1
Sparse Gaussian processes using pseudoinputs
 Advances in Neural Information Processing Systems 18
, 2006
"... We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudoinput points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O( ..."
Abstract

Cited by 229 (13 self)
 Add to MetaCart
(Show Context)
We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudoinput points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M 2 N) training cost and O(M 2) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime. 1
Sparse multinomial logistic regression: fast algorithms and generalization bounds
 IEEE Trans. on Pattern Analysis and Machine Intelligence
"... Abstract—Recently developed methods for learning sparse classifiers are among the stateoftheart in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsitypromoting priors encouraging the weight estimates to be either significantly larg ..."
Abstract

Cited by 190 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Recently developed methods for learning sparse classifiers are among the stateoftheart in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsitypromoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learningtheoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a componentwise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in highdimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsitypromoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
 Advances in Neural Information Processing Systems 15
, 2003
"... We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn dsparse predictors (which can be evaluated in O(d) rather than O(n), d ..."
Abstract

Cited by 173 (30 self)
 Add to MetaCart
(Show Context)
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn dsparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n ), and in large realworld classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (`error bars'), allows for Bayesian model selection and is less complex in implementation.
The Kernel Recursive Least Squares Algorithm
 IEEE Transactions on Signal Processing
, 2003
"... We present a nonlinear kernelbased version of the Recursive Least Squares (RLS) algorithm. Our KernelRLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared error regressor. Spars ..."
Abstract

Cited by 141 (2 self)
 Add to MetaCart
(Show Context)
We present a nonlinear kernelbased version of the Recursive Least Squares (RLS) algorithm. Our KernelRLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared error regressor. Sparsity of the solution is achieved by a sequential sparsification process that admits into the kernel representation a new input sample only if its feature space image cannot be suffciently well approximated by combining the images of previously admitted samples. This sparsification procedure is crucial to the operation of KRLS, as it allows it to operate online, and by effectively regularizing its solutions. A theoretical analysis of the sparsification method reveals its close affinity to kernel PCA, and a datadependent loss bound is presented, quantifying the generalization performance of the KRLS algorithm. We demonstrate the performance and scaling properties of KRLS and compare it to a stateof theart Support Vector Regression algorithm, using both synthetic and real data. We additionally test KRLS on two signal processing problems in which the use of traditional leastsquares methods is commonplace: Time series prediction and channel equalization.
Fast Forward Selection to Speed Up Sparse Gaussian Process Regression
 IN WORKSHOP ON AI AND STATISTICS 9
, 2003
"... We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random ..."
Abstract

Cited by 110 (7 self)
 Add to MetaCart
We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically.
Gaussian processes for machine learning
 International Journal of Neural Systems
, 2004
"... Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. ..."
Abstract

Cited by 92 (14 self)
 Add to MetaCart
(Show Context)
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible nonparametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep
Assessing approximate inference for binary Gaussian process classification
 Journal of Machine Learning Research
, 2005
"... Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace’s method and Expectation Propagation for ..."
Abstract

Cited by 76 (3 self)
 Add to MetaCart
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace’s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace’s method. Keywords: Gaussian process priors, probabilistic classification, Laplace’s approximation, expectation propagation, marginal likelihood, evidence, MCMC
Preference learning with gaussian processes
 In Proc. ICML*2005
, 2005
"... In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to tackle many multiclass problems. The ov ..."
Abstract

Cited by 65 (5 self)
 Add to MetaCart
(Show Context)
In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to tackle many multiclass problems. The overall approach has the advantages of Bayesian methods for model selection and probabilistic prediction. Experimental results compared against the constraint classification approach on several benchmark datasets verify the usefulness of this algorithm. 1.