Gaussian process latent variable models for visualisation of high dimensional data
Adv. in Neural Inf. Proc. Sys, 2004
Cited by 223 (13 self)
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.
Sparse Gaussian processes using pseudo-inputs
Advances in Neural Information Processing Systems 18, 2006
Cited by 218 (13 self)
We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by a gradient-based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M^2 N) training cost and O(M^2) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input-dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
A unifying view of sparse approximate Gaussian process regression
Journal of Machine Learning Research, 2005
Cited by 155 (6 self)
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.
Building Support Vector Machines with Reduced Classifier Complexity
Journal of Machine Learning Research, 2006
Cited by 92 (2 self)
Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size (d_max) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as O(n d_max), where n is the number of training examples; and (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.
Spatial modelling using a new class of nonstationary covariance functions
Environmetrics, 2006
Cited by 64 (0 self)
We introduce a new class of nonstationary covariance functions for spatial modelling. Nonstationary covariance functions allow the model to adapt to spatial surfaces whose variability changes with location. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the spatial surface is controlled by a parameter, freeing one from fixing the differentiability in advance. The class allows one to knit together local covariance parameters into a valid global nonstationary covariance, regardless of how the local covariance structure is estimated. We employ this new nonstationary covariance in a fully Bayesian model in which the unknown spatial process has a Gaussian process (GP) distribution with a nonstationary covariance function from the class. We model the nonstationary structure in a computationally efficient way that creates nearly stationary local behavior and for which stationarity is a special case. We also suggest non-Bayesian approaches to nonstationary kriging. To assess the method, we compare the Bayesian nonstationary GP model with a Bayesian stationary GP model, various standard spatial smoothing approaches, and nonstationary models that can adapt to function heterogeneity. In simulations, the nonstationary GP model adapts to function heterogeneity, unlike the stationary models, and also outperforms the other nonstationary models. On a real dataset, GP models outperform the competitors, but while the nonstationary GP gives qualitatively more sensible results, it fails to outperform the stationary GP on held-out data, illustrating the difficulty in fitting complex spatial functions with relatively few observations. The nonstationary covariance model could also be used for non-Gaussian data and embedded in additive models as well as in more complicated, hierarchical spatial or spatiotemporal models. More complicated models may require simpler parameterizations for computational efficiency.
Learning Stable Nonlinear Dynamical Systems With Gaussian Mixture Models
Cited by 57 (14 self)
This paper presents a method to learn discrete robot motions from a set of demonstrations. We model a motion as a nonlinear autonomous (i.e., time-invariant) dynamical system (DS) and define sufficient conditions to ensure global asymptotic stability at the target. We propose a learning method, which is called Stable Estimator of Dynamical Systems (SEDS), to learn the parameters of the DS to ensure that all motions closely follow the demonstrations while ultimately reaching and stopping at the target. Time-invariance and global asymptotic stability at the target ensure that the system can respond immediately and appropriately to perturbations that are encountered during the motion. The method is evaluated through a set of robot experiments and on a library of human handwriting motions. Index Terms: Dynamical systems (DS), Gaussian mixture model, imitation learning, point-to-point motions, stability analysis.
Variational Bayesian multinomial probit regression with Gaussian process priors
Neural Computation, 2005
Cited by 57 (17 self)
It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian Process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian Process classification in the multiclass setting. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure from which sparse approximations, such as multiclass Informative Vector Machines (IVM), emerge in a very natural and straightforward manner. This is the first time that a fully Variational Bayesian treatment for multiclass GP classification has been developed without having to resort to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and Laplace approximations illustrate the utility of the variational approximation as a computationally economic alternative to full MCMC and it is shown to be more accurate than the Laplace approximation.
Nonstationary Covariance Functions for Gaussian Process Regression
In Proc. of the Conf. on Neural Information Processing Systems (NIPS), 2004
Cited by 57 (2 self)
We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs.
Accelerating Evolutionary Algorithms with Gaussian Process Fitness Function Models
IEEE Transactions on Systems, Man and Cybernetics, 2004
Cited by 52 (2 self)
We present an overview of evolutionary algorithms that use empirical models of the fitness function to accelerate convergence, distinguishing between Evolution Control and the Surrogate Approach. We describe the Gaussian process model and propose using it as an inexpensive fitness function surrogate. Implementation issues such as efficient and numerically stable computation, exploration vs. exploitation, local modeling, multiple objectives and constraints, and failed evaluations are addressed. Our resulting Gaussian Process Optimization Procedure (GPOP) clearly outperforms other evolutionary strategies on standard test functions as well as on a real-world problem: the optimization of stationary gas turbine compressor profiles.