Results 1  10
of
93
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
 Advances in Neural Information Processing Systems 15
, 2003
"... We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn dsparse predictors (which can be evaluated in O(d) rather than O(n), d ..."
Abstract

Cited by 123 (27 self)
 Add to MetaCart
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn dsparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n ), and in large realworld classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (`error bars'), allows for Bayesian model selection and is less complex in implementation.
Sparse Gaussian processes using pseudoinputs
 Advances in Neural Information Processing Systems 18
, 2006
"... We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudoinput points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O( ..."
Abstract

Cited by 123 (8 self)
 Add to MetaCart
We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudoinput points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M 2 N) training cost and O(M 2) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime. 1
Fast Forward Selection to Speed Up Sparse Gaussian Process Regression
 in Workshop on AI and Statistics 9
, 2003
"... We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random selection ..."
Abstract

Cited by 75 (4 self)
 Add to MetaCart
We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a suciently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically.
Kernel matching pursuit
 Machine Learning
, 2002
"... Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the leastsquares sense. We show how matching pursuit can be extended to use nonsquared error loss functions, a ..."
Abstract

Cited by 62 (0 self)
 Add to MetaCart
Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the leastsquares sense. We show how matching pursuit can be extended to use nonsquared error loss functions, and how it can be used to build kernelbased solutions to machinelearning problems, while keeping control of the sparsity of the solution. We also derive MDL motivated generalization bounds for this type of algorithm, and compare them to related SVM (Support Vector Machine) bounds. Finally, links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification are given, showing comparable results with typically sparser models. 1
The Kernel Recursive Least Squares Algorithm
 IEEE Transactions on Signal Processing
, 2003
"... We present a nonlinear kernelbased version of the Recursive Least Squares (RLS) algorithm. Our KernelRLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared error regressor. Spars ..."
Abstract

Cited by 62 (2 self)
 Add to MetaCart
We present a nonlinear kernelbased version of the Recursive Least Squares (RLS) algorithm. Our KernelRLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer kernel, and can therefore be used to recursively construct the minimum mean squared error regressor. Sparsity of the solution is achieved by a sequential sparsification process that admits into the kernel representation a new input sample only if its feature space image cannot be suffciently well approximated by combining the images of previously admitted samples. This sparsification procedure is crucial to the operation of KRLS, as it allows it to operate online, and by effectively regularizing its solutions. A theoretical analysis of the sparsification method reveals its close affinity to kernel PCA, and a datadependent loss bound is presented, quantifying the generalization performance of the KRLS algorithm. We demonstrate the performance and scaling properties of KRLS and compare it to a stateof theart Support Vector Regression algorithm, using both synthetic and real data. We additionally test KRLS on two signal processing problems in which the use of traditional leastsquares methods is commonplace: Time series prediction and channel equalization.
Building Support Vector Machines with Reduced Classifier Complexity
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions ..."
Abstract

Cited by 58 (1 self)
 Add to MetaCart
Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size (d max ) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as O(nd max ) where n is the number of training examples; and, (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning
 Proc. of the 20th International Conference on Machine Learning
, 2003
"... We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We de ne a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model. ..."
Abstract

Cited by 55 (8 self)
 Add to MetaCart
We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We de ne a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model.
Efficient Kernel Machines Using the Improved Fast Gauss Transform
 Advances in Neural Information Processing Systems 17
, 2004
"... The computation required for kernel machines with N training samples is O(N ). Such computational complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the com ..."
Abstract

Cited by 41 (6 self)
 Add to MetaCart
The computation required for kernel machines with N training samples is O(N ). Such computational complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N). We also give an error bound for the approximation, and provide experimental results on the UCI datasets.
Algorithms and Representations for Reinforcement Learning
, 2005
"... “If we knew what it was we were doing, it would not be called research, would it?” ..."
Abstract

Cited by 37 (7 self)
 Add to MetaCart
“If we knew what it was we were doing, it would not be called research, would it?”
Gaussian Process Classification for Segmenting and Annotating Sequences
 In Proceedings of the International Conference on Machine Learning (ICML
, 2004
"... Many realworld classification tasks involve the prediction of multiple, interdependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observ ..."
Abstract

Cited by 35 (5 self)
 Add to MetaCart
Many realworld classification tasks involve the prediction of multiple, interdependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observation sequences.