Results 1–10 of 17
Scalable and robust Bayesian inference via the median posterior
In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014
Abstract

Cited by 4 (1 self)
Many Bayesian learning methods for massive data benefit from working with small subsets of observations. In particular, significant progress has been made in scalable Bayesian learning via stochastic approximation. However, Bayesian learning methods in distributed computing environments are often problem- or distribution-specific and use ad hoc techniques. We propose a novel general approach to Bayesian inference that is scalable and robust to corruption in the data. Our technique is based on the idea of splitting the data into several non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the results. Our main contribution is the proposed aggregation step, which is based on finding the geometric median of the subset posterior distributions. The presented theoretical and numerical results confirm the advantages of our approach.
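The aggregation step described above rests on the geometric median, which in a Euclidean setting can be computed with the Weiszfeld iteration. The sketch below is illustrative only: the paper takes the geometric median of subset posterior *distributions* in an RKHS, whereas this toy example medians subset point estimates to show the robustness to a corrupted split.

```python
import numpy as np

def geometric_median(points, n_iter=100, tol=1e-9):
    """Weiszfeld iteration for the geometric median of the rows of `points`."""
    y = points.mean(axis=0)                      # start from the coordinate-wise mean
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - y, axis=1), tol)  # avoid divide-by-zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y

# Subset posterior means from five data splits; the last split is corrupted.
subset_means = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, 2.1],
                         [1.0, 2.0], [50.0, -40.0]])
med = geometric_median(subset_means)             # stays near (1, 2) despite the outlier
```

The coordinate-wise mean of these five splits lands near (10.8, -6.4), while the geometric median remains essentially at the uncorrupted cluster, which is the robustness property the paper exploits.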
Hilbert space methods for reduced-rank Gaussian process regression
arXiv preprint arXiv:1401.5508, 2014
Abstract

Cited by 3 (0 self)
This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of R^d. On this approximate eigenbasis the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as O(nm^2) (initial) and O(m^3) (hyperparameter learning) with m basis functions and n data points. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.
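A minimal 1-D instance of this construction can be sketched as follows, using the Dirichlet eigenfunctions of the Laplacian on [-L, L] and the spectral density of the squared-exponential kernel; the hyperparameter and domain values are illustrative, not from the paper.

```python
import numpy as np

ell, sigma2 = 0.5, 1.0           # SE kernel hyperparameters (illustrative values)

def se_spectral_density(omega):
    # 1-D spectral density of the squared-exponential covariance function
    return sigma2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)

def reduced_rank_kernel(x, xp, m=64, L=3.0):
    """k(x, x') ~= sum_j S(sqrt(lambda_j)) phi_j(x) phi_j(x') on [-L, L]."""
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2 * L)                        # sqrt of Laplacian eigenvalues
    phi = lambda t: np.sin(np.outer(t + L, sqrt_lam)) / np.sqrt(L)  # Dirichlet eigenfunctions
    return phi(x) @ (se_spectral_density(sqrt_lam)[:, None] * phi(xp).T)

x = np.linspace(-1.0, 1.0, 20)                            # data well inside the domain
K_exact = sigma2 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
err = np.abs(reduced_rank_kernel(x, x) - K_exact).max()   # small away from the boundary
```

Because the basis is fixed, the n x m feature matrix phi(x) is built once, after which inference reduces to linear algebra in m dimensions, giving the O(nm^2) and O(m^3) costs quoted in the abstract.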
Incremental Local Gaussian Regression
Abstract

Cited by 2 (1 self)
Locally weighted regression (LWR) was created as a nonparametric method that can approximate a wide range of functions, is computationally efficient, and can learn continually from very large amounts of incrementally collected data. As an interesting feature, LWR can regress on non-stationary functions, a beneficial property, for instance, in control problems. However, it does not provide a proper generative model for function values, and existing algorithms have a variety of manual tuning parameters that strongly influence bias, variance and learning speed of the results. Gaussian (process) regression, on the other hand, does provide a generative model with rather black-box automatic parameter tuning, but it has higher computational cost, especially for big data sets and if a non-stationary model is required. In this paper, we suggest a path from Gaussian (process) regression to locally weighted regression, where we retain the best of both approaches. Using a localizing function basis and approximate inference techniques, we build a Gaussian (process) regression algorithm of increasingly local nature and similar computational complexity to LWR. Empirical evaluations are performed on several synthetic and real robot datasets of increasing complexity and (big) data scale, and demonstrate that we consistently achieve on-par or superior performance compared to current state-of-the-art methods while retaining a principled approach to fast incremental regression with minimal manual tuning parameters.
Tree-structured Gaussian process approximations
In Advances in Neural Information Processing Systems 27, 2014
Abstract

Cited by 2 (2 self)
Gaussian process regression can be accelerated by constructing a small pseudo-dataset to summarize the observed data. This idea sits at the heart of many approximation schemes, but such an approach requires the number of pseudo-datapoints to be scaled with the range of the input space if the accuracy of the approximation is to be maintained. This presents problems in time-series settings or in spatial datasets where large numbers of pseudo-datapoints are required since computation typically scales quadratically with the pseudo-dataset size. In this paper we devise an approximation whose complexity grows linearly with the number of pseudo-datapoints. This is achieved by imposing a tree or chain structure on the pseudo-datapoints and calibrating the approximation using a Kullback-Leibler (KL) minimization. Inference and learning can then be performed efficiently using the Gaussian belief propagation algorithm. We demonstrate the validity of our approach on a set of challenging regression tasks including missing data imputation for audio and spatial datasets. We trace out the speed-accuracy trade-off for the new method and show that the frontier dominates those obtained from a large number of existing approximation techniques.
Fast direct methods for Gaussian processes and analysis of NASA Kepler mission data
2015
Abstract

Cited by 2 (0 self)
A number of problems in probability and statistics can be addressed using the multivariate normal (or multivariate Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the n-dimensional setting, however, it requires the inversion of an n × n covariance matrix, C, as well as the evaluation of its determinant, det(C). In many cases, the covariance matrix is of the form C = σ^2 I + K, where K is computed using a specified kernel, which depends on the data and additional parameters (called hyperparameters in Gaussian process computations). The matrix C is typically dense, causing standard direct methods for inversion and determinant evaluation to require O(n^3) work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix C can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an O(n log^2 n) algorithm for inversion, as discussed in Ambikasaran and Darve, 2013. More importantly, we show that this factorization enables the evaluation of the determinant det(C), permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions about the kernel defining K. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels, and apply it to a real data set obtained from the Kepler Mission.
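For reference, the O(n^3) baseline that the hierarchical factorization accelerates is the dense Cholesky route to the Gaussian log-density with C = σ^2 I + K: both the quadratic form C^{-1} y and log det(C) fall out of one factorization. A sketch (squared-exponential kernel and hyperparameter values are illustrative, not from the paper):

```python
import numpy as np

def gaussian_loglik(y, x, sigma2=0.1, ell=1.0):
    """log N(y | 0, C) with C = sigma2*I + K via a dense Cholesky factorization:
    the O(n^3) baseline that hierarchical factoring reduces to O(n log^2 n)."""
    n = len(y)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
    C = sigma2 * np.eye(n) + K
    L = np.linalg.cholesky(C)                           # O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y via two triangular solves
    logdet = 2.0 * np.log(np.diag(L)).sum()             # log det(C) from the factor
    return -0.5 * (y @ alpha + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2.0, 2.0, 40))
y = np.sin(x) + 0.1 * rng.standard_normal(40)
ll = gaussian_loglik(y, x)
```

Every hyperparameter evaluation in GP marginal-likelihood optimization repeats this factorization, which is why replacing it with an O(n log^2 n) method matters at scale.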
Fast allocation of Gaussian process experts
In International Conference on Machine Learning, 2014
Abstract

Cited by 2 (0 self)
We propose a scalable nonparametric Bayesian regression model based on a mixture of Gaussian process (GP) experts and the inducing points formalism underpinning sparse GP approximations. Each expert is augmented with a set of inducing points, and the allocation of data points to experts is defined probabilistically based on their proximity to the experts. This allocation mechanism enables a fast variational inference procedure for learning of the inducing inputs and hyperparameters of the experts. When using K experts, our method can run K^2 times faster and use K^2 times less memory than popular sparse methods such as the FITC approximation. Furthermore, it is easy to parallelize and handles non-stationarity straightforwardly. Our experiments show that on medium-sized datasets (of around 10^4 training points) it trains up to 5 times faster than FITC while achieving comparable accuracy. On a large dataset of 10^5 training points, our method significantly outperforms six competitive baselines while requiring only a few hours of training.
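The proximity-based allocation can be illustrated with a normalized-responsibility computation over expert centers. This is a simplified stand-in for the paper's probabilistic mechanism; the function name, the Gaussian proximity weighting, and the `tau` parameter are all illustrative assumptions.

```python
import numpy as np

def allocate(X, centers, tau=1.0):
    """Soft allocation of data points to experts by proximity to each
    expert's center; each row of the result is a distribution over experts."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    R = np.exp(-0.5 * d2 / tau ** 2)           # unnormalized responsibilities
    return R / R.sum(axis=1, keepdims=True)    # rows sum to one

X = np.array([[0.1, 0.0], [2.9, 3.1]])         # two data points
centers = np.array([[0.0, 0.0], [3.0, 3.0]])   # two experts
R = allocate(X, centers)                       # each point favors its nearest expert
```

Because each point's responsibility mass concentrates on nearby experts, each expert only needs to process a fraction of the data, which is the source of the K^2 speed and memory savings claimed above.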
Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)
Abstract

Cited by 1 (1 self)
We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, interpolation strategy, and GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI, with local cubic kernel interpolation, we introduce KISS-GP, which is 1) more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.
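The core SKI approximation K ≈ W K_uu W^T can be sketched in 1-D with sparse local interpolation weights. KISS-GP uses local cubic interpolation and exploits the Toeplitz structure of K_uu on the grid; the sketch below uses linear interpolation and dense matrices for brevity, with illustrative grid size and lengthscale.

```python
import numpy as np

def interp_weights(x, grid):
    """Sparse local interpolation weights W so that K ~= W K_uu W^T (the SKI
    idea; linear interpolation shown, KISS-GP uses local cubic)."""
    W = np.zeros((len(x), len(grid)))
    h = grid[1] - grid[0]
    for i, xi in enumerate(x):
        j = min(int((xi - grid[0]) / h), len(grid) - 2)  # left grid neighbour
        t = (xi - grid[j]) / h
        W[i, j], W[i, j + 1] = 1.0 - t, t                # two nonzeros per row
    return W

k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / 0.2) ** 2)
grid = np.linspace(0.0, 1.0, 50)                         # regular inducing grid
x = np.random.default_rng(0).uniform(0.0, 1.0, 30)
W = interp_weights(x, grid)
err = np.abs(W @ k(grid, grid) @ W.T - k(x, x)).max()    # small interpolation error
```

With only a constant number of nonzeros per row of W, matrix-vector products with the approximate K cost O(n) plus the cost of a product with the structured K_uu, which is where the Kronecker/Toeplitz gains enter.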
Online sparse Gaussian process regression using FITC and PITC approximations
Abstract

Cited by 1 (1 self)
Keywords: Gaussian processes; nonparametric regression; system identification.
Abstract: We provide a method which allows for online updating of sparse Gaussian process (GP) regression algorithms for any set of inducing inputs. This method is derived both for the Fully Independent Training Conditional (FITC) and the Partially Independent Training Conditional (PITC) approximation, and it allows the inclusion of a new measurement point x_{n+1} in O(m^2) time, with m denoting the size of the set of inducing inputs. Due to the online nature of the algorithms, it is possible to forget earlier measurement data, which means that the memory space required is also O(m^2), both for FITC and PITC. We show that this method is able to efficiently apply GP regression to a large data set with accurate results.
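A stripped-down sketch of the O(m^2)-per-point online idea: maintain the m × m information matrix of the inducing outputs and fold each new measurement in as a rank-1 update. This follows the simpler deterministic-training-conditional (DTC) form rather than full FITC/PITC, which also correct the conditional covariance; the class and parameter names are illustrative.

```python
import numpy as np

class OnlineSparseGP:
    """Online inducing-point regression sketch: maintains
    A = Kuu + sigma^-2 * Kuf Kfu and b = sigma^-2 * Kuf y incrementally,
    so adding a measurement x_{n+1} costs O(m^2) time and O(m^2) memory."""

    def __init__(self, Z, ell=0.2, sigma2=0.01):
        self.Z, self.ell, self.sigma2 = Z, ell, sigma2
        self.A = self._k(Z, Z)                  # starts at Kuu
        self.b = np.zeros(len(Z))

    def _k(self, a, b):
        return np.exp(-0.5 * (np.subtract.outer(a, b) / self.ell) ** 2)

    def update(self, x, y):                     # O(m^2) rank-1 update per point
        ku = self._k(self.Z, x)
        self.A += np.outer(ku, ku) / self.sigma2
        self.b += ku * y / self.sigma2

    def predict(self, xs):                      # posterior mean k*u A^{-1} b
        return self._k(xs, self.Z) @ np.linalg.solve(self.A, self.b)

gp = OnlineSparseGP(np.linspace(0.0, 1.0, 15))  # m = 15 inducing inputs
rng = np.random.default_rng(1)
for xi in rng.uniform(0.0, 1.0, 100):           # stream measurements one at a time
    gp.update(xi, np.sin(2 * np.pi * xi))
pred = gp.predict(np.linspace(0.05, 0.95, 20))
```

Because only A and b are retained, the raw measurements can be discarded after each update, which is the "forgetting" property the abstract mentions.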
Fast Gaussian Process Posteriors with Product Trees
Abstract

Cited by 1 (0 self)
Gaussian processes (GP) are a powerful tool for nonparametric regression; unfortunately, calculating the posterior variance in a standard GP model requires time O(n^2) in the size of the training set. Previous work by Shen et al. (2006) used a k-d tree structure to approximate the posterior mean in certain GP models. We extend this approach to achieve efficient approximation of the posterior covariance using a tree clustering on pairs of training points, and demonstrate significant improvements in performance with negligible loss of accuracy.
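The tree idea for the posterior-mean sum can be illustrated by collapsing clusters of training points to their centroids. This is a one-level, 1-D stand-in for the k-d/product-tree recursion, which in the actual method descends the tree adaptively and also handles the covariance; names and parameter values are illustrative.

```python
import numpy as np

def clustered_mean_sum(xs, x, alpha, n_clusters=10, ell=0.5):
    """Approximate the GP posterior-mean sum  sum_i k(x*, x_i) alpha_i
    by pooling each cluster's coefficients at its centroid."""
    order = np.argsort(x)
    x, alpha = x[order], alpha[order]
    groups = np.array_split(np.arange(len(x)), n_clusters)
    centers = np.array([x[g].mean() for g in groups])     # cluster centroids
    weights = np.array([alpha[g].sum() for g in groups])  # pooled coefficients
    k = np.exp(-0.5 * ((xs[:, None] - centers[None, :]) / ell) ** 2)
    return k @ weights                                    # n_clusters terms per query

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
alpha = np.ones(200)                        # illustrative weight vector
xs = np.linspace(0.0, 1.0, 25)
approx = clustered_mean_sum(xs, x, alpha)   # 10 kernel evaluations per query, not 200
```

When the kernel varies slowly across a cluster (cluster width small relative to the lengthscale), replacing the cluster by its centroid introduces only a second-order error, which is what lets the tree prune whole subtrees per query.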
Bayesian Filtering with Online Gaussian Process Latent Variable Models
Abstract

Cited by 1 (1 self)
In this paper we present a novel nonparametric approach to Bayesian filtering, where the prediction and observation models are learned in an online fashion. Our approach is able to handle multi-modal distributions over both models by employing a mixture model representation with Gaussian Process (GP) based components. To cope with the increasing complexity of the estimation process, we explore two computationally efficient GP variants, sparse online GP and local GP, which help to manage computation requirements for each mixture component. Our experiments demonstrate that our approach can track human motion much more accurately than existing approaches that learn the prediction and observation models offline and do not update these models with the incoming data stream.