Results 1  10
of
16
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
"... This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vec ..."
Abstract

Cited by 584 (5 self)
 Add to MetaCart
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vector machine' (RVM), a model of identical functional form to the popular and stateoftheart `support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while oering a number of additional advantages. These include the benets of probabilistic predictions, automatic estimation of `nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non`Mercer' kernels).
Using the Nyström Method to Speed Up Kernel Machines
 Advances in Neural Information Processing Systems 13
, 2001
"... A major problem for kernelbased predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n ), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix ..."
Abstract

Cited by 293 (6 self)
 Add to MetaCart
A major problem for kernelbased predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n ), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems). This is achieved by carrying out an eigendecomposition on a smaller system of size m < n, and then expanding the results back up to n dimensions. The computational complexity of a predictor using this approximation is O(m n). We report experiments on the USPS and abalone data sets and show that we can set m n without any significant decrease in the accuracy of the solution.
The Relevance Vector Machine
, 2000
"... The support vector machine (SVM) is a stateoftheart technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement ..."
Abstract

Cited by 219 (6 self)
 Add to MetaCart
The support vector machine (SVM) is a stateoftheart technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a tradeoff parameter and the need to utilise `Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.
Gaussian processes for machine learning
 International Journal of Neural Systems
, 2004
"... Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. ..."
Abstract

Cited by 68 (15 self)
 Add to MetaCart
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible nonparametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep
On a Connection between Kernel PCA and Metric Multidimensional Scaling
 Advances in Neural Information Processing Systems 13
, 2001
"... In this paper we show that the kernel PCA algorithm of Schölkopf et al. (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x; y) is isotropic, i.e. it depends only on jjx yjj. This leads to a metric MDS algorithm where the desired configuration of ..."
Abstract

Cited by 57 (0 self)
 Add to MetaCart
In this paper we show that the kernel PCA algorithm of Schölkopf et al. (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x; y) is isotropic, i.e. it depends only on jjx yjj. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed.
Bayesian model selection for Support Vector machines, Gaussian processes and other kernel classifiers
"... We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training ca ..."
Abstract

Cited by 48 (6 self)
 Add to MetaCart
We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility to use sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models.
A Fast Dual Algorithm for Kernel Logistic Regression
, 2002
"... This paper gives a new iterative algorithm for kernel logistic regression. It is based ..."
Abstract

Cited by 35 (0 self)
 Add to MetaCart
This paper gives a new iterative algorithm for kernel logistic regression. It is based
Covariance Kernels from Bayesian Generative Models
 Advances in Neural Information Processing Systems 14
, 2000
"... We propose the framework of mutual information kernels for learning covariance kernels, as used in Support Vector machines and Gaussian process classifiers, from unlabeled task data using Bayesian techniques. We describe an implementation of this framework which uses variational Bayesian mixtures of ..."
Abstract

Cited by 34 (3 self)
 Add to MetaCart
We propose the framework of mutual information kernels for learning covariance kernels, as used in Support Vector machines and Gaussian process classifiers, from unlabeled task data using Bayesian techniques. We describe an implementation of this framework which uses variational Bayesian mixtures of factor analyzers in order to attack classification problems in highdimensional spaces where labeled data is sparse, but unlabeled data is abundant.
SemiSupervised Learning: From Gaussian Fields to Gaussian Processes
 School of CS, CMU
, 2003
"... We show that the Gaussian random fields and harmonic energy minimizing function framework for semisupervised learning can be viewed in terms of Gaussian processes, with covariance matrices derived from the graph Laplacian. We derive hyperparameter learning with evidence maximization, and give an em ..."
Abstract

Cited by 30 (1 self)
 Add to MetaCart
We show that the Gaussian random fields and harmonic energy minimizing function framework for semisupervised learning can be viewed in terms of Gaussian processes, with covariance matrices derived from the graph Laplacian. We derive hyperparameter learning with evidence maximization, and give an empirical study of various ways to parameterize the graph weights.
Observations on the Nyström Method for Gaussian Processes
, 2002
"... A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nyström method of Williams and Seeger (2001). In this paper we focus on two issues (1) the relationship of the Nyström method to the Subset of Regressors method (Poggio and Girosi, 1990; Luo and Wa ..."
Abstract

Cited by 18 (1 self)
 Add to MetaCart
A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nyström method of Williams and Seeger (2001). In this paper we focus on two issues (1) the relationship of the Nyström method to the Subset of Regressors method (Poggio and Girosi, 1990; Luo and Wahba, 1997) and (2) understanding in what circumstances the Nyström approximation would be expected to provide a good approximation to exact GP regression.