Results 1 - 10 of 131
Sparse Gaussian processes using pseudo-inputs
- Advances in Neural Information Processing Systems 18, 2006
"... We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudo-input points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O( ..."
Abstract
-
Cited by 229 (13 self)
- Add to MetaCart
(Show Context)
We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by a gradient-based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M²N) training cost and O(M²) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input-dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
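To make the covariance structure described above concrete, the sketch below builds a pseudo-input-style covariance (a rank-M Nyström term plus an input-dependent diagonal correction plus observation noise) with NumPy. The RBF kernel, the variable names, and the jitter value are placeholders of mine, not code from the paper.

    import numpy as np

    def rbf(A, B, lengthscale=1.0, variance=1.0):
        # Squared-exponential kernel between row-wise input matrices A and B.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * sq / lengthscale ** 2)

    def spgp_covariance(X, Xm, noise=0.1, variance=1.0):
        # X: (N, D) real inputs; Xm: (M, D) pseudo-inputs with M << N.
        # Returns Knm Km^{-1} Kmn + diag(Knn - Knm Km^{-1} Kmn) + noise^2 I,
        # i.e. a rank-M term plus an input-dependent diagonal correction.
        N, M = len(X), len(Xm)
        Knm = rbf(X, Xm, variance=variance)                      # N x M cross-covariance
        Km = rbf(Xm, Xm, variance=variance) + 1e-8 * np.eye(M)   # jitter for stability
        Q = Knm @ np.linalg.solve(Km, Knm.T)                     # Nystrom approximation of Knn
        knn_diag = np.full(N, variance)                          # k(x, x) for this stationary kernel
        return Q + np.diag(knn_diag - np.diag(Q)) + noise ** 2 * np.eye(N)

The O(M²N) training cost quoted in the abstract comes from never forming this N x N matrix explicitly and instead applying the matrix inversion lemma to the rank-M-plus-diagonal structure; the sketch only shows what the covariance looks like.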
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
- Advances in Neural Information Processing Systems 15, 2003
"... We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn d--sparse predictors (which can be evaluated in O(d) rather than O(n), d ..."
Abstract
-
Cited by 173 (30 self)
- Add to MetaCart
(Show Context)
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, where n is the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world classification experiments we show that it can match the prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.
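As a rough illustration of the forward-selection loop, here is a much-simplified sketch for the regression case with Gaussian noise, where the information-theoretic score for a candidate reduces to the log of its current marginal posterior variance relative to the noise level. The real IVM works with ADF/EP site updates and incremental factorizations, which are omitted here; the function name and scoring shortcut are my own simplifications.

    import numpy as np

    def ivm_style_selection(K, noise_var, d):
        # K: full n x n kernel matrix (acceptable for a small demo only).
        # Greedily picks d points, each time taking the candidate whose
        # inclusion gives the largest information gain under Gaussian noise.
        n = K.shape[0]
        active, candidates = [], list(range(n))
        for _ in range(d):
            if active:
                Kaa = K[np.ix_(active, active)] + noise_var * np.eye(len(active))
                Kca = K[np.ix_(candidates, active)]
                # Marginal posterior variance at each remaining candidate.
                var = K[candidates, candidates] - np.einsum(
                    "ij,ij->i", Kca, np.linalg.solve(Kaa, Kca.T).T)
            else:
                var = K[candidates, candidates]
            # Information gain of adding candidate i: 0.5 * log(1 + var_i / noise_var).
            scores = 0.5 * np.log1p(var / noise_var)
            best = candidates[int(np.argmax(scores))]
            active.append(best)
            candidates.remove(best)
        return active

Recomputing the solve from scratch each step is O(n d²) per selection; the point of the method above is that cheap incremental updates avoid exactly this kind of recomputation.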
The Kernel Recursive Least Squares Algorithm
- IEEE Transactions on Signal Processing, 2003
"... We present a non-linear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared -error regressor. Spars ..."
Abstract
-
Cited by 141 (2 self)
- Add to MetaCart
(Show Context)
We present a non-linear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean-squared-error regressor. Sparsity of the solution is achieved by a sequential sparsification process that admits into the kernel representation a new input sample only if its feature-space image cannot be sufficiently well approximated by combining the images of previously admitted samples. This sparsification procedure is crucial to the operation of KRLS, as it both allows it to operate on-line and effectively regularizes its solutions. A theoretical analysis of the sparsification method reveals its close affinity to kernel PCA, and a data-dependent loss bound is presented, quantifying the generalization performance of the KRLS algorithm. We demonstrate the performance and scaling properties of KRLS and compare it to a state-of-the-art Support Vector Regression algorithm, using both synthetic and real data. We additionally test KRLS on two signal processing problems in which the use of traditional least-squares methods is commonplace: time series prediction and channel equalization.
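The sparsification step the abstract refers to is commonly written as an approximate-linear-dependence (ALD) test. The sketch below, with a placeholder kernel callable kern and a threshold nu of my choosing, checks whether a new sample's feature-space image is already well approximated by the current dictionary.

    import numpy as np

    def ald_admit(kern, dictionary, x_new, nu=1e-3):
        # Returns True if x_new should be added to the dictionary, i.e. if its
        # feature-space image cannot be approximated by the stored samples to
        # within tolerance nu (approximate-linear-dependence test).
        if not dictionary:
            return True
        D = np.asarray(dictionary)
        Kdd = np.array([[kern(a, b) for b in D] for a in D])
        kdx = np.array([kern(a, x_new) for a in D])
        coeffs = np.linalg.solve(Kdd + 1e-10 * np.eye(len(D)), kdx)
        residual = kern(x_new, x_new) - kdx @ coeffs   # squared distance to span of dictionary images
        return residual > nu

In an on-line algorithm this test would be folded into rank-one updates of the inverse dictionary kernel matrix so each step stays cheap; the full solve above is only for clarity.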
Fast Forward Selection to Speed Up Sparse Gaussian Process Regression
- In Workshop on AI and Statistics 9, 2003
"... We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random ..."
Abstract
-
Cited by 110 (7 self)
- Add to MetaCart
We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically.
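For concreteness, one common form of the approximate log marginal likelihood that sparse greedy GP methods of this kind optimise, written here under a subset-of-regressors / projected-process style assumption rather than as this paper's exact criterion, is, for an active set I of size d:

    \log q(\mathbf{y}) = \log \mathcal{N}\!\bigl(\mathbf{y} \,\big|\, \mathbf{0},\; \sigma^2 I + K_{nI} K_{II}^{-1} K_{In}\bigr),

which can be evaluated in O(n d²) via the matrix inversion lemma and differentiated with respect to the kernel hyperparameters, which is what makes joint hyperparameter adjustment practical.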
Building Support Vector Machines with Reduced Classifier Complexity
- Journal of Machine Learning Research, 2006
"... Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions ..."
Abstract
-
Cited by 95 (2 self)
- Add to MetaCart
Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the large number of support vectors. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size (d_max) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as O(n · d_max) where n is the number of training examples; and (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.
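As a rough sketch of point (2), the classifier is restricted to a basis set J (with |J| ≤ d_max) and fitted by minimising a primal SVM-style objective; written with a squared hinge loss, which is my assumption for concreteness, it reads:

    \min_{\beta \in \mathbb{R}^{|J|}} \;\; \frac{\lambda}{2}\, \beta^{\top} K_{JJ}\, \beta
      \;+\; \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i\, K_{iJ}\, \beta\bigr)^{2},

with basis functions added to J greedily until the objective stops improving or d_max is reached.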
Kernel matching pursuit
- Machine Learning, 2002
"... Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the leastsquares sense. We show how matching pursuit can be extended to use non-squared error loss functions, a ..."
Abstract
-
Cited by 84 (0 self)
- Add to MetaCart
Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the least-squares sense. We show how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine-learning problems, while keeping control of the sparsity of the solution. We also derive MDL-motivated generalization bounds for this type of algorithm, and compare them to related SVM (Support Vector Machine) bounds. Finally, we give links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification, showing comparable results with typically sparser models.
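A minimal sketch of the squared-error variant described above: at each step the basis function most correlated with the current residual is appended and the weights are refitted by least squares. Using the columns of a precomputed kernel matrix as the candidate dictionary, and refitting all weights every step ("back-fitting"), are simplifications of mine.

    import numpy as np

    def kernel_matching_pursuit(K, y, n_basis):
        # K: n x n kernel matrix whose columns are the candidate basis functions
        # evaluated on the training inputs; y: regression targets; n_basis >= 1.
        residual, basis = y.astype(float).copy(), []
        for _ in range(n_basis):
            # Normalised correlation of every candidate column with the residual.
            corr = np.abs(K.T @ residual) / (np.linalg.norm(K, axis=0) + 1e-12)
            corr[basis] = -np.inf                      # never pick a column twice
            basis.append(int(np.argmax(corr)))
            # Refit all weights on the selected columns, then update the residual.
            w, *_ = np.linalg.lstsq(K[:, basis], y, rcond=None)
            residual = y - K[:, basis] @ w
        return basis, w

Stopping at a fixed n_basis is what gives direct control over the sparsity of the resulting kernel expansion.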
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning
- Proc. of the 20th International Conference on Machine Learning, 2003
"... We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We de ne a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model. ..."
Abstract
-
Cited by 76 (8 self)
- Add to MetaCart
We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We define a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model.
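Concretely, the generative model relates observed rewards to the latent value function through the temporal-difference structure; in the usual GPTD formulation, which I paraphrase here with discount factor γ, it reads:

    R(x_t) = V(x_t) - \gamma\, V(x_{t+1}) + N_t,
    \qquad V \sim \mathcal{GP}\bigl(0,\, k(\cdot,\cdot)\bigr),
    \quad N_t \sim \mathcal{N}(0, \sigma^2),

so that conditioning on a trajectory of observed rewards yields a Gaussian posterior over the value function, complete with uncertainty estimates.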
Variational learning of inducing variables in sparse Gaussian processes
- In Artificial Intelligence and Statistics 12, 2009
"... Sparse Gaussian process methods that use inducing variables require the selection of the inducing inputs and the kernel hyperparameters. We introduce a variational formulation for sparse approximations that jointly infers the inducing inputs and the kernel hyperparameters by maximizing a lower bound ..."
Abstract
-
Cited by 57 (6 self)
- Add to MetaCart
Sparse Gaussian process methods that use inducing variables require the selection of the inducing inputs and the kernel hyperparameters. We introduce a variational formulation for sparse approximations that jointly infers the inducing inputs and the kernel hyperparameters by maximizing a lower bound of the true log marginal likelihood. The key property of this formulation is that the inducing inputs are defined to be variational parameters which are selected by minimizing the Kullback-Leibler divergence between the variational distribution and the exact posterior distribution over the latent function values. We apply this technique to regression and we compare it with other approaches in the literature.
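For reference, the variational lower bound that is maximised in the regression case, with Q_nn = K_nm K_mm^{-1} K_mn, takes the following form (my transcription, so treat it as a sketch):

    F_V = \log \mathcal{N}\!\bigl(\mathbf{y} \,\big|\, \mathbf{0},\; \sigma^2 I + Q_{nn}\bigr)
          \;-\; \frac{1}{2\sigma^2}\, \mathrm{tr}\!\bigl(K_{nn} - Q_{nn}\bigr),

where the trace term penalises inducing inputs that represent the latent function values poorly; this is what ties their selection to the KL divergence mentioned in the abstract and distinguishes the bound from a plain approximate marginal likelihood.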