Gaussian process latent variable models for visualisation of high dimensional data
 Advances in Neural Information Processing Systems
, 2004
Abstract

Cited by 132 (5 self)
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real-world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.
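As context for the variational construction described above, the classical (MAP) GP-LVM objective that this work extends fits in a few lines of numpy. This is a hedged sketch: the squared-exponential kernel, its hyperparameters, and the noise level are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel on the latent points X (N x Q).
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gplvm_log_likelihood(X, Y, noise=0.1):
    # Log marginal likelihood of observations Y (N x D) under a GP prior
    # on the mapping from latent X to Y.  Maximising this over X gives
    # the classical GP-LVM; the paper instead lower-bounds the marginal
    # likelihood with X variationally integrated out.
    N, D = Y.shape
    K = rbf_kernel(X) + noise * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (D * (N * np.log(2.0 * np.pi) + logdet)
                   + np.sum(Y * np.linalg.solve(K, Y)))
```

Optimising this objective over X directly risks the overfitting that the variational lower bound is designed to avoid.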
Sparse Gaussian processes using pseudo-inputs
 Advances in Neural Information Processing Systems 18
, 2006
Abstract

Cited by 123 (8 self)
We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by gradient-based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M²N) training cost and O(M²) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input-dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
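The pseudo-input construction can be sketched as follows: every expensive solve is against an M x M matrix, which is where the quoted O(M²N) training and O(M²) per-test costs come from. This is a minimal illustrative version (unit signal variance, predictive mean only), not the authors' implementation.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # Unit-variance squared-exponential kernel, so k(x, x) = 1.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d / ell**2)

def fitc_predict(X, y, Xu, Xs, noise=0.1):
    # Sparse (FITC/SPGP) predictive mean with pseudo-inputs Xu (M x d):
    # the full covariance is replaced by Q = Knm Kmm^{-1} Kmn plus a
    # diagonal correction, so every linear solve is M x M.
    M = len(Xu)
    Kmm = rbf(Xu, Xu) + 1e-8 * np.eye(M)
    Knm = rbf(X, Xu)
    q_diag = np.sum(np.linalg.solve(Kmm, Knm.T).T * Knm, axis=1)
    lam = 1.0 - q_diag + noise**2            # diag(Knn - Q) + sigma^2
    A = Kmm + Knm.T @ (Knm / lam[:, None])   # M x M system
    return rbf(Xs, Xu) @ np.linalg.solve(A, Knm.T @ (y / lam))
```

A convenient sanity check: with the pseudo-inputs set equal to the full training inputs, the approximation collapses back to the exact GP predictive mean.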
A unifying view of sparse approximate Gaussian process regression
 Journal of Machine Learning Research
, 2005
Abstract

Cited by 80 (3 self)
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear, theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally, we point directly to designs of new, better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.
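The paper's organising object, the effective prior induced by a sparse approximation, is easy to write down. The sketch below computes the shared low-rank part Q = Knm Kmm^{-1} Kmn for an illustrative squared-exponential kernel; the individual methods in the taxonomy then differ in how they correct Q back towards the exact prior covariance.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # Unit-variance squared-exponential kernel.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d / ell**2)

def effective_prior(X, Xu, jitter=1e-10):
    # Low-rank effective prior covariance Q = Knm Kmm^{-1} Kmn induced
    # by inducing inputs Xu.  The sparse approximations surveyed in the
    # paper share this object and differ in their correction terms.
    Kmm = rbf(Xu, Xu) + jitter * np.eye(len(Xu))
    Knm = rbf(X, Xu)
    return Knm @ np.linalg.solve(Kmm, Knm.T)
```

Because K - Q is positive semi-definite, Q never overstates the prior variance; with Xu equal to X it recovers the full covariance exactly.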
Building Support Vector Machines with Reduced Classifier Complexity
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 58 (1 self)
Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size (d_max) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as O(n · d_max), where n is the number of training examples; and (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.
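A greedy loop of this flavour can be sketched in a few lines. Note the hedges: this is an illustrative regularized least-squares variant (kernel matching pursuit style), not the paper's primal-SVM method, and the naive inner loop refits from scratch rather than using the incremental updates that give the paper its O(n · d_max) scaling.

```python
import numpy as np

def greedy_basis_selection(K, y, d_max, lam=1e-3):
    # Greedily grow a set of kernel basis functions (columns of the
    # kernel matrix K): at each step, add the column whose inclusion
    # most reduces the regularized least-squares fit to y.
    n = len(y)
    chosen = []
    for _ in range(d_max):
        best_j, best_err = None, np.inf
        for j in range(n):
            if j in chosen:
                continue
            Phi = K[:, chosen + [j]]
            w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]),
                                Phi.T @ y)
            err = np.sum((y - Phi @ w) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    return chosen
```

The key structural point survives the simplification: the basis set is capped at d_max regardless of how many training points would become support vectors in a standard SVM.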
Nonstationary Covariance Functions for Gaussian Process Regression
 Advances in Neural Information Processing Systems (NIPS)
, 2004
Abstract

Cited by 36 (2 self)
We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs.
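One concrete, well-known example of an input-adaptive covariance is Gibbs' construction with an input-dependent lengthscale; the paper's class arises from a related kernel-convolution argument, so the 1-D sketch below is an illustration of the idea rather than the paper's own family. The lengthscale function `ell` here is an invented example.

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale_fn):
    # Nonstationary covariance with input-dependent lengthscale l(x):
    #   k(x, x') = sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
    #              * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))
    # This is a valid (positive semi-definite) kernel for any positive l.
    l1 = lengthscale_fn(x1)[:, None]
    l2 = lengthscale_fn(x2)[None, :]
    s = l1**2 + l2**2
    return np.sqrt(2.0 * l1 * l2 / s) * np.exp(-(x1[:, None] - x2[None, :])**2 / s)

# Illustrative lengthscale: sampled functions get rougher near the origin.
ell = lambda x: 0.2 + np.abs(x)
```

With a constant lengthscale function this reduces to the ordinary stationary squared-exponential kernel, which is the sense in which such classes "adapt" smoothly away from stationarity.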
Accelerating Evolutionary Algorithms with Gaussian Process Fitness Function Models
 IEEE Transactions on Systems, Man and Cybernetics
, 2004
Abstract

Cited by 35 (1 self)
We present an overview of evolutionary algorithms that use empirical models of the fitness function to accelerate convergence, distinguishing between Evolution Control and the Surrogate Approach. We describe the Gaussian process model and propose using it as an inexpensive fitness function surrogate. Implementation issues such as efficient and numerically stable computation, exploration vs. exploitation, local modeling, multiple objectives and constraints, and failed evaluations are addressed. Our resulting Gaussian Process Optimization Procedure (GPOP) clearly outperforms other evolutionary strategies on standard test functions as well as on a real-world problem: the optimization of stationary gas turbine compressor profiles.
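In its simplest form, the surrogate idea reduces to pre-screening offspring with the GP posterior mean and spending true evaluations only on the most promising candidates. The toy sketch below (all parameter names and settings are invented for illustration) shows that loop; GPOP itself additionally exploits model uncertainty, local models, and constraint handling.

```python
import numpy as np

def gp_mean(Xtr, ytr, Xq, ell=1.0, noise=1e-6):
    # GP posterior mean, used here as a cheap fitness surrogate.
    def k(A, B):
        d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-0.5 * d / ell**2)
    return k(Xq, Xtr) @ np.linalg.solve(k(Xtr, Xtr) + noise * np.eye(len(Xtr)), ytr)

def surrogate_es(f, dim=2, gens=20, lam=20, keep=3, sigma=0.3, seed=0):
    # Toy evolution strategy with surrogate pre-screening: each
    # generation proposes lam offspring, ranks them by the GP mean,
    # and truly evaluates only the `keep` most promising.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim) + 2.0           # start away from the optimum
    X, y = [x], [f(x)]
    for _ in range(gens):
        offspring = x + sigma * rng.standard_normal((lam, dim))
        pred = gp_mean(np.array(X), np.array(y), offspring)
        for i in np.argsort(pred)[:keep]:        # cheapest-first screening
            X.append(offspring[i])
            y.append(f(offspring[i]))
        x = X[int(np.argmin(y))]                 # move to best evaluated point
    return X[int(np.argmin(y))], min(y)
```

The economics are the point: per generation, lam candidates cost only surrogate predictions while just `keep` of them incur the expensive fitness evaluation.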
Variational Bayesian multinomial probit regression with Gaussian process priors
 Neural Computation
, 2005
Abstract

Cited by 32 (10 self)
It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian process classification in the multiclass setting. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure from which sparse approximations, such as multiclass Informative Vector Machines (IVMs), emerge in a very natural and straightforward manner. This is the first time that a fully variational Bayesian treatment for multiclass GP classification has been developed without having to resort to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and Laplace approximations illustrate the utility of the variational approximation as a computationally economic alternative to full MCMC, and it is shown to be more accurate than the Laplace approximation.
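The data-augmentation view is easy to illustrate generatively: under the multinomial probit model, class k is observed when the k-th latent function value plus independent standard normal noise is the largest. The Monte Carlo sketch below (an illustration of the likelihood, not the paper's variational scheme) makes that auxiliary-variable definition concrete.

```python
import numpy as np

def multinomial_probit_probs(f, n_samples=200000, seed=0):
    # Monte Carlo estimate of the multinomial probit likelihood
    #   p(y = k | f) = P(f_k + eps_k > f_j + eps_j for all j != k),
    # with eps ~ N(0, 1) i.i.d.  These are the auxiliary Gaussian
    # variables that enable Gibbs sampling and, in the paper,
    # a variational treatment of the non-Gaussian likelihood.
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, len(f)))
    winners = np.argmax(f + eps, axis=1)
    return np.bincount(winners, minlength=len(f)) / n_samples
```

Because the noise variables are independent across classes, the a priori independent per-class GP structure mentioned in the abstract is preserved; the coupling arises only a posteriori through the argmax.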
Robust submodular observation selection
, 2008
Abstract

Cited by 29 (3 self)
In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing-returns property. Moreover, we prove that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating non-unit observation cost or communication and path costs). We also show how the algorithm can be used to near-optimally trade off expected-case (e.g., the mean square prediction error in Gaussian process regression) and worst-case (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several real-world problems. For Gaussian process regression, our algorithm compares favorably with state-of-the-art heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDP-based algorithms.
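The diminishing-returns structure this abstract relies on is the same one behind the classic greedy algorithm for monotone submodular maximisation; the paper's Submodular Saturation algorithm wraps such a search to optimise the worst case over several objectives. A minimal sketch, with an invented toy coverage objective:

```python
def greedy_submodular(ground, f, k):
    # Classic greedy maximisation of a monotone submodular set
    # function f, which carries the (1 - 1/e) approximation guarantee.
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in ground - S}
        S.add(max(gains, key=gains.get))
    return S

# Toy coverage objective: each candidate sensor covers some regions,
# and f(S) counts the distinct regions covered -- adding a sensor to a
# larger set never helps more than adding it to a smaller one.
regions = {0: {'a', 'b'}, 1: {'b', 'c'}, 2: {'d'}, 3: {'a', 'c', 'd'}}
coverage = lambda S: len(set().union(*(regions[e] for e in S)))
```

Robustness is what makes the paper's problem harder: maximising the minimum of several submodular functions is not itself submodular, which is why a saturation construction is needed on top of the greedy core.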
Spatial modelling using a new class of nonstationary covariance functions
 Environmetrics
, 2006
Abstract

Cited by 27 (0 self)
We introduce a new class of nonstationary covariance functions for spatial modelling. Nonstationary covariance functions allow the model to adapt to spatial surfaces whose variability changes with location. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the spatial surface is controlled by a parameter, freeing one from fixing the differentiability in advance. The class allows one to knit together local covariance parameters into a valid global nonstationary covariance, regardless of how the local covariance structure is estimated. We employ this new nonstationary covariance in a fully Bayesian model in which the unknown spatial process has a Gaussian process (GP) distribution with a nonstationary covariance function from the class. We model the nonstationary structure in a computationally efficient way that creates nearly stationary local behavior and for which stationarity is a special case. We also suggest non-Bayesian approaches to nonstationary kriging. To assess the method, we compare the Bayesian nonstationary GP model with a Bayesian stationary GP model, various standard spatial smoothing approaches, and nonstationary models that can adapt to function heterogeneity. In simulations, the nonstationary GP model adapts to function heterogeneity, unlike the stationary models, and also outperforms the other nonstationary models. On a real dataset, GP models outperform the competitors, but while the nonstationary GP gives qualitatively more sensible results, it fails to outperform the stationary GP on held-out data, illustrating the difficulty in fitting complex spatial functions with relatively few observations. The nonstationary covariance model could also be used for non-Gaussian data and embedded in additive models as well as in more complicated, hierarchical spatial or spatiotemporal models. More complicated models may require simpler parameterizations for computational efficiency.