Results 1 - 10 of 17
Scalable and robust Bayesian inference via the median posterior
In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014
"... Many Bayesian learning methods for massive data benefit from working with small subsets of observations. In particular, significant progress has been made in scalable Bayesian learning via stochastic approximation. However, Bayesian learning methods in distributed computing en-vironments are often p ..."
Abstract - Cited by 4 (1 self)
Many Bayesian learning methods for massive data benefit from working with small subsets of observations. In particular, significant progress has been made in scalable Bayesian learning via stochastic approximation. However, Bayesian learning methods in distributed computing environments are often problem- or distribution-specific and use ad hoc techniques. We propose a novel general approach to Bayesian inference that is scalable and robust to corruption in the data. Our technique is based on the idea of splitting the data into several non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the results. Our main contribution is the proposed aggregation step, which is based on finding the geometric median of subset posterior distributions. Presented theoretical and numerical results confirm the advantages of our approach.
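A minimal numerical sketch of the aggregation step described above, assuming each subset posterior is summarized by its mean vector (the paper itself takes the geometric median of the full subset posterior distributions); the Weiszfeld iteration and the toy data are illustrative only:

    import numpy as np

    def geometric_median(points, n_iter=100, tol=1e-9):
        """Weiszfeld iteration for the geometric median of a set of vectors.

        `points` has shape (k, d): one row per subset posterior summary.
        """
        median = points.mean(axis=0)                     # start from the plain average
        for _ in range(n_iter):
            dist = np.linalg.norm(points - median, axis=1)
            dist = np.maximum(dist, 1e-12)               # avoid division by zero
            weights = 1.0 / dist
            new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
            if np.linalg.norm(new_median - median) < tol:
                break
            median = new_median
        return median

    # Hypothetical example: 10 subset posterior means, one of them corrupted.
    rng = np.random.default_rng(0)
    subset_means = rng.normal(loc=1.0, scale=0.1, size=(10, 3))
    subset_means[0] += 50.0                              # a grossly corrupted subset
    print("average:         ", subset_means.mean(axis=0))
    print("geometric median:", geometric_median(subset_means))

The corrupted subset drags the plain average far from the truth, while the geometric median stays close to it, which is the robustness property the abstract refers to.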
Hilbert space methods for reduced-rank Gaussian process regression
arXiv preprint arXiv:1401.5508, 2014
"... This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of Rd. On this ap-proximate eigenbasis the eigenvalues ..."
Abstract - Cited by 3 (0 self)
This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of R^d. On this approximate eigenbasis the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as O(nm^2) (initial) and O(m^3) (hyperparameter learning) with m basis functions and n data points. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.
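A hedged 1-D sketch of the kind of basis-function approximation the abstract describes, assuming a squared-exponential covariance on the domain [-L, L]; all settings (m, L, lengthscale, noise level) and the toy data are illustrative assumptions, not values from the paper:

    import numpy as np

    def eigenfunctions(x, m, L):
        # Laplace eigenfunctions on [-L, L] with Dirichlet boundaries.
        j = np.arange(1, m + 1)
        return np.sin(np.pi * j * (x[:, None] + L) / (2 * L)) / np.sqrt(L)

    def se_spectral_density(omega, sigma_f, ell):
        # Spectral density of the 1-D squared-exponential covariance function.
        return sigma_f**2 * ell * np.sqrt(2 * np.pi) * np.exp(-0.5 * (ell * omega)**2)

    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, size=200)
    y = np.sin(x) + 0.1 * rng.normal(size=x.size)
    m, L, sigma_f, ell, sigma_n = 20, 5.0, 1.0, 1.0, 0.1

    j = np.arange(1, m + 1)
    sqrt_lambda = np.pi * j / (2 * L)                      # square roots of Laplacian eigenvalues
    Lam = se_spectral_density(sqrt_lambda, sigma_f, ell)   # approximate kernel eigenvalues

    Phi = eigenfunctions(x, m, L)                          # n x m basis-function matrix
    A = Phi.T @ Phi + sigma_n**2 * np.diag(1.0 / Lam)      # m x m system: O(n m^2) to form
    w = np.linalg.solve(A, Phi.T @ y)                      # O(m^3) to solve

    x_test = np.linspace(-3, 3, 5)
    print(eigenfunctions(x_test, m, L) @ w)                # approximate posterior mean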
Incremental Local Gaussian Regression
"... Locally weighted regression (LWR) was created as a nonparametric method that can approximate a wide range of functions, is computationally efficient, and can learn continually from very large amounts of incrementally collected data. As an interesting feature, LWR can regress on non-stationary functi ..."
Abstract - Cited by 2 (1 self)
Locally weighted regression (LWR) was created as a nonparametric method that can approximate a wide range of functions, is computationally efficient, and can learn continually from very large amounts of incrementally collected data. As an interesting feature, LWR can regress on non-stationary functions, a beneficial property, for instance, in control problems. However, it does not provide a proper generative model for function values, and existing algorithms have a variety of manual tuning parameters that strongly influence bias, variance and learning speed of the results. Gaussian (process) regression, on the other hand, does provide a generative model with rather black-box automatic parameter tuning, but it has higher computational cost, especially for big data sets and if a non-stationary model is required. In this paper, we suggest a path from Gaussian (process) regression to locally weighted regression, where we retain the best of both approaches. Using a localizing function basis and approximate inference techniques, we build a Gaussian (process) regression algorithm of increasingly local nature and similar computational complexity to LWR. Empirical evaluations are performed on several synthetic and real robot datasets of increasing complexity and (big) data scale, and demonstrate that we consistently achieve on par or superior performance compared to current state-of-the-art methods while retaining a principled approach to fast incremental regression with minimal manual tuning parameters.
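For reference, a minimal sketch of plain locally weighted regression at a single query point (the baseline the abstract starts from, not the incremental algorithm proposed in the paper); the Gaussian bandwidth and toy data are assumptions:

    import numpy as np

    def lwr_predict(x_query, X, y, bandwidth=0.5):
        """Locally weighted linear regression at a single query point (1-D inputs)."""
        w = np.exp(-0.5 * ((X - x_query) / bandwidth)**2)   # Gaussian locality weights
        A = np.column_stack([np.ones_like(X), X])           # local linear model [1, x]
        WA = w[:, None] * A
        theta = np.linalg.solve(A.T @ WA + 1e-8 * np.eye(2), WA.T @ y)
        return theta[0] + theta[1] * x_query

    rng = np.random.default_rng(2)
    X = rng.uniform(0, 10, size=300)
    y = np.sin(X) + 0.1 * rng.normal(size=X.size)
    print([round(lwr_predict(q, X, y), 3) for q in (1.0, 5.0, 9.0)])

The bandwidth is exactly the kind of manually tuned parameter the abstract criticizes; the paper's point is to recover this local behaviour from a GP model with automatic parameter tuning.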
Tree-structured Gaussian process approximations
In Advances in Neural Information Processing Systems 27, 2014
"... Gaussian process regression can be accelerated by constructing a small pseudo-dataset to summarize the observed data. This idea sits at the heart of many approx-imation schemes, but such an approach requires the number of pseudo-datapoints to be scaled with the range of the input space if the accura ..."
Abstract - Cited by 2 (2 self)
Gaussian process regression can be accelerated by constructing a small pseudo-dataset to summarize the observed data. This idea sits at the heart of many approximation schemes, but such an approach requires the number of pseudo-datapoints to be scaled with the range of the input space if the accuracy of the approximation is to be maintained. This presents problems in time-series settings or in spatial datasets where large numbers of pseudo-datapoints are required since computation typically scales quadratically with the pseudo-dataset size. In this paper we devise an approximation whose complexity grows linearly with the number of pseudo-datapoints. This is achieved by imposing a tree or chain structure on the pseudo-datapoints and calibrating the approximation using a Kullback-Leibler (KL) minimization. Inference and learning can then be performed efficiently using the Gaussian belief propagation algorithm. We demonstrate the validity of our approach on a set of challenging regression tasks including missing data imputation for audio and spatial datasets. We trace out the speed-accuracy trade-off for the new method and show that the frontier dominates those obtained from a large number of existing approximation techniques.
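The tree-structured approximation itself is beyond a short snippet, but the Gaussian belief propagation it relies on can be illustrated on the simplest structure, a scalar Gauss-Markov chain; the forward message pass below is the chain special case (equivalent to a Kalman filter; a matching backward pass would give the smoothed marginals), and all model constants are assumptions:

    import numpy as np

    # A scalar Gauss-Markov chain: x[t+1] = a*x[t] + process noise, y[t] = x[t] + obs noise.
    rng = np.random.default_rng(3)
    a, q, r, T = 0.95, 0.1, 0.5, 100
    x_true = np.zeros(T)
    for t in range(1, T):
        x_true[t] = a * x_true[t - 1] + np.sqrt(q) * rng.normal()
    y = x_true + np.sqrt(r) * rng.normal(size=T)

    m, P = 0.0, 1.0                      # prior message for x[0]
    filtered = []
    for t in range(T):
        k = P / (P + r)                  # combine incoming message with local evidence y[t]
        m, P = m + k * (y[t] - m), (1 - k) * P
        filtered.append(m)
        m, P = a * m, a * a * P + q      # message forwarded along the chain edge
    print(np.round(filtered[-5:], 3))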
Fast direct methods for Gaussian processes and analysis of NASA Kepler mission data, 2015
"... Abstract—A number of problems in probability and statistics can be addressed using the multivariate normal (or multivariate Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian densit ..."
Abstract - Cited by 2 (0 self)
A number of problems in probability and statistics can be addressed using the multivariate normal (or multivariate Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the n-dimensional setting, however, it requires the inversion of an n × n covariance matrix, C, as well as the evaluation of its determinant, det(C). In many cases, the covariance matrix is of the form C = σ^2 I + K, where K is computed using a specified kernel, which depends on the data and additional parameters (called hyperparameters in Gaussian process computations). The matrix C is typically dense, causing standard direct methods for inversion and determinant evaluation to require O(n^3) work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix C can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an O(n log^2 n) algorithm for inversion, as discussed in Ambikasaran and Darve, 2013. More importantly, we show that this factorization enables the evaluation of the determinant det(C), permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions about the kernel defining K. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels, and apply it to a real data set obtained from the Kepler Mission.
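As a baseline for what the hierarchical factorization replaces, a short sketch of the dense O(n^3) computation: form C = σ^2 I + K, then obtain C^{-1} y and log det(C) from a Cholesky factor for the GP log marginal likelihood; the squared-exponential kernel and toy data are assumptions:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def se_kernel(x1, x2, sigma_f=1.0, ell=1.0):
        # Squared-exponential covariance matrix between two 1-D point sets.
        return sigma_f**2 * np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / ell)**2)

    rng = np.random.default_rng(4)
    n, sigma = 500, 0.1
    x = np.sort(rng.uniform(0, 10, size=n))
    y = np.sin(x) + sigma * rng.normal(size=n)

    C = sigma**2 * np.eye(n) + se_kernel(x, x)        # C = sigma^2 I + K
    L, lower = cho_factor(C, lower=True)              # dense Cholesky: O(n^3) work
    alpha = cho_solve((L, lower), y)                  # C^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))         # log det(C) from the factor
    loglik = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)
    print(round(loglik, 2))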
Fast allocation of Gaussian process experts
In International Conference on Machine Learning, 2014
"... We propose a scalable nonparametric Bayesian regression model based on a mixture of Gaussian process (GP) experts and the inducing points for-malism underpinning sparse GP approximations. Each expert is augmented with a set of inducing points, and the allocation of data points to experts is defined ..."
Abstract - Cited by 2 (0 self)
We propose a scalable nonparametric Bayesian regression model based on a mixture of Gaussian process (GP) experts and the inducing points formalism underpinning sparse GP approximations. Each expert is augmented with a set of inducing points, and the allocation of data points to experts is defined probabilistically based on their proximity to the experts. This allocation mechanism enables a fast variational inference procedure for learning of the inducing inputs and hyperparameters of the experts. When using K experts, our method can run K^2 times faster and use K^2 times less memory than popular sparse methods such as the FITC approximation. Furthermore, it is easy to parallelize and handles non-stationarity straightforwardly. Our experiments show that on medium-sized datasets (of around 10^4 training points) it trains up to 5 times faster than FITC while achieving comparable accuracy. On a large dataset of 10^5 training points, our method significantly outperforms six competitive baselines while requiring only a few hours of training.
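A simplified sketch of the allocation idea, assuming a hard assignment of points to the nearest expert centre and an off-the-shelf GP per expert (the paper instead learns the allocation, inducing inputs and hyperparameters with variational inference); the expert count and data are illustrative:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(5)
    X = rng.uniform(0, 10, size=(2000, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=X.shape[0])

    K = 8                                              # number of experts
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    experts = []
    for k in range(K):                                 # one small, independent GP per expert
        idx = km.labels_ == k
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        experts.append(gp.fit(X[idx], y[idx]))

    X_test = np.array([[2.5], [7.5]])
    nearest = km.predict(X_test)                       # allocate queries by proximity
    preds = [experts[k].predict(x[None, :])[0] for k, x in zip(nearest, X_test)]
    print(np.round(preds, 3))

Fitting K small GPs instead of one big one is where the memory and runtime savings come from; the local experts also give the non-stationary flexibility the abstract mentions.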
Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)
"... We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of ..."
Abstract - Cited by 1 (1 self)
We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, interpolation strategy, and GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI, with local cubic kernel interpolation, we introduce KISS-GP, which is 1) more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.
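A 1-D sketch of structured kernel interpolation, using local linear rather than cubic interpolation weights for brevity: the kernel matrix is approximated as W K_UU W^T, where W has only a couple of non-zeros per row. The dense comparison at the end is there only to check the approximation error; in KISS-GP, W stays sparse and K_UU keeps its grid (Toeplitz/Kronecker) structure. Grid size, lengthscale and data are assumptions:

    import numpy as np

    def se_kernel(a, b, ell=1.0):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell)**2)

    def interp_weights(x, grid):
        """Linear interpolation weights of points x onto a regular 1-D grid."""
        h = grid[1] - grid[0]
        W = np.zeros((x.size, grid.size))
        left = np.clip(np.floor((x - grid[0]) / h).astype(int), 0, grid.size - 2)
        frac = (x - grid[left]) / h
        W[np.arange(x.size), left] = 1.0 - frac            # two non-zeros per row
        W[np.arange(x.size), left + 1] = frac
        return W

    rng = np.random.default_rng(6)
    x = np.sort(rng.uniform(0, 10, size=50))
    grid = np.linspace(0, 10, 200)                         # inducing (interpolation) points

    K_exact = se_kernel(x, x)
    K_ski = interp_weights(x, grid) @ se_kernel(grid, grid) @ interp_weights(x, grid).T
    print("max abs error:", float(np.abs(K_ski - K_exact).max()))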
Online sparse Gaussian process regression using FITC and PITC approximations
"... Gaussian processes; Non-parametric regression; System identification. Abstract: We provide a method which allows for online updating of sparse Gaussian Process (GP) regression algorithms for any set of inducing inputs. This method is derived both for the Fully Independent Training Conditional (FITC) ..."
Abstract - Cited by 1 (1 self)
Keywords: Gaussian processes; Non-parametric regression; System identification. We provide a method which allows for online updating of sparse Gaussian Process (GP) regression algorithms for any set of inducing inputs. This method is derived both for the Fully Independent Training Conditional (FITC) and the Partially Independent Training Conditional (PITC) approximation, and it allows the inclusion of a new measurement point x_{n+1} in O(m^2) time, with m denoting the size of the set of inducing inputs. Due to the online nature of the algorithms, it is possible to forget earlier measurement data, which means that the memory space required is also O(m^2), both for FITC and PITC. We show that this method is able to efficiently apply GP regression to a large data set with accurate results.
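A hedged sketch of the O(m^2)-per-point bookkeeping such online updates rely on, using the simpler subset-of-regressors/DTC form rather than the paper's exact FITC and PITC recursions: keep the running statistics sum_i k_u(x_i) k_u(x_i)^T and sum_i k_u(x_i) y_i over a fixed set of inducing inputs; inducing locations, kernel and data are assumptions:

    import numpy as np

    def se_kernel(a, b, ell=1.0):
        a, b = np.atleast_1d(a), np.atleast_1d(b)
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell)**2)

    rng = np.random.default_rng(7)
    m, sigma = 20, 0.1
    Z = np.linspace(0, 10, m)                          # fixed inducing inputs
    K_uu = se_kernel(Z, Z)

    A = np.zeros((m, m))                               # running  sum_i k_u(x_i) k_u(x_i)^T
    b = np.zeros(m)                                    # running  sum_i k_u(x_i) y_i
    for _ in range(5000):                              # stream of measurements
        x_i = rng.uniform(0, 10)
        y_i = np.sin(x_i) + sigma * rng.normal()
        k_i = se_kernel(Z, x_i)[:, 0]
        A += np.outer(k_i, k_i)                        # O(m^2) per new point
        b += k_i * y_i

    x_star = np.array([2.5, 7.5])
    k_star = se_kernel(Z, x_star)                      # m x 2
    mean = k_star.T @ np.linalg.solve(sigma**2 * K_uu + A, b)   # DTC/SoR predictive mean
    print(np.round(mean, 3))

Only the m x m statistics are stored, so earlier measurements can be discarded, mirroring the O(m^2) memory claim in the abstract.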
Fast Gaussian Process Posteriors with Product Trees
"... Gaussian processes (GP) are a powerful tool for nonparametric regression; unfortunately, calcu-lating the posterior variance in a standard GP model requires time O(n2) in the size of the training set. Previous work by Shen et al. (2006) used a k-d tree structure to approximate the pos-terior mean in ..."
Abstract - Cited by 1 (0 self)
Gaussian processes (GP) are a powerful tool for nonparametric regression; unfortunately, calculating the posterior variance in a standard GP model requires time O(n^2) in the size of the training set. Previous work by Shen et al. (2006) used a k-d tree structure to approximate the posterior mean in certain GP models. We extend this approach to achieve efficient approximation of the posterior covariance using a tree clustering on pairs of training points, and demonstrate significant improvements in performance with negligible loss of accuracy.
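A compact 1-D illustration of the tree-based pruning this line of work builds on (the Shen et al.-style posterior-mean approximation, not the product-tree covariance method itself): the weighted sum sum_i alpha_i k(x*, x_i) is evaluated by a recursion that replaces a whole subtree with a single kernel evaluation whenever the kernel is nearly constant over that subtree. Tree layout, kernel and tolerance are assumptions:

    import numpy as np

    def se(r, ell=1.0):
        return np.exp(-0.5 * (r / ell)**2)

    def build_tree(xs, alphas, leaf_size=16):
        order = np.argsort(xs)
        xs, alphas = xs[order], alphas[order]
        node = {"lo": xs[0], "hi": xs[-1],
                "sum_a": alphas.sum(), "sum_abs": np.abs(alphas).sum()}
        if xs.size <= leaf_size:
            node["points"] = (xs, alphas)
        else:
            half = xs.size // 2
            node["children"] = [build_tree(xs[:half], alphas[:half]),
                                build_tree(xs[half:], alphas[half:])]
        return node

    def query(node, x, tol=1e-6):
        # Distance from x to the node's interval bounds the kernel values inside it.
        d_min = max(node["lo"] - x, x - node["hi"], 0.0)
        d_max = max(x - node["lo"], node["hi"] - x)
        k_hi, k_lo = se(d_min), se(d_max)
        if (k_hi - k_lo) * node["sum_abs"] < tol:      # kernel nearly constant: prune
            return 0.5 * (k_hi + k_lo) * node["sum_a"]
        if "points" in node:                           # leaf: exact evaluation
            xs, alphas = node["points"]
            return float(alphas @ se(x - xs))
        return sum(query(c, x, tol) for c in node["children"])

    rng = np.random.default_rng(8)
    X = np.sort(rng.uniform(0, 100, size=20000))
    alpha = rng.normal(size=X.size)                    # stands in for K^{-1} y
    tree = build_tree(X, alpha)
    x_star = 42.0
    print(query(tree, x_star), float(alpha @ se(x_star - X)))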
Bayesian Filtering with Online Gaussian Process Latent Variable Models
"... In this paper we present a novel non-parametric approach to Bayesian filtering, where the prediction and observation models are learned in an online fashion. Our approach is able to handle multimodal distributions over both models by employing a mixture model representation with Gaussian Processes ( ..."
Abstract - Cited by 1 (1 self)
In this paper we present a novel non-parametric approach to Bayesian filtering, where the prediction and observation models are learned in an online fashion. Our approach is able to handle multimodal distributions over both models by employing a mixture model representation with Gaussian Processes (GP) based components. To cope with the increasing complexity of the estimation process, we explore two computationally efficient GP variants, sparse online GP and local GP, which help to manage computation requirements for each mixture component. Our experiments demonstrate that our approach can track human motion much more accurately than existing approaches that learn the prediction and observation models offline and do not update these models with the incoming data stream.
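A minimal sketch of filtering with a learned prediction model, assuming a single GP transition model fitted offline to consecutive state pairs and a bootstrap particle filter (the paper instead learns a mixture of GP-based prediction and observation models online); the simulated system and all constants are assumptions:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(9)

    # Simulate a 1-D latent process and noisy observations of it.
    T, q, r = 200, 0.05, 0.2
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = 0.9 * x[t - 1] + 0.5 * np.sin(x[t - 1]) + np.sqrt(q) * rng.normal()
    y = x + np.sqrt(r) * rng.normal(size=T)

    # Transition model: a GP fitted to consecutive state pairs (taken from the
    # simulation here for brevity; the paper learns such models online from data).
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(x[:-1, None], x[1:])

    # Bootstrap particle filter with the GP as the prediction model.
    n_p = 500
    particles = rng.normal(size=n_p)
    estimates = []
    for t in range(T):
        w = np.exp(-0.5 * (y[t] - particles) ** 2 / r)            # Gaussian observation model
        w /= w.sum()
        estimates.append(w @ particles)
        particles = particles[rng.choice(n_p, size=n_p, p=w)]     # resample
        mu, std = gp.predict(particles[:, None], return_std=True)
        particles = mu + std * rng.normal(size=n_p)               # propagate through the GP
    print(np.round(estimates[-5:], 3))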