Results 1–10 of 26
Variational EM algorithms for non-Gaussian latent variable models
In Advances in Neural Information Processing Systems 18, 2006
Abstract

Cited by 64 (19 self)
We consider criteria for variational representations of non-Gaussian latent variables, and derive variational EM algorithms in general form. We establish a general equivalence among convex bounding methods, evidence-based methods, and ensemble learning/variational Bayes methods, which has previously been demonstrated only for particular cases.
Nonlinear PCA: a missing data approach
In Bioinformatics, 2005
Abstract

Cited by 33 (10 self)
Motivation: Visualising and analysing the potential nonlinear structure of a data set is becoming an important task in molecular biology. This is even more challenging when the data have missing values. Results: Here, we propose an inverse model that performs nonlinear principal component analysis (NLPCA) from incomplete data sets. Missing values are ignored while optimising the model, but can be estimated afterwards. Results are shown for both artificial and experimental data sets. In contrast to linear methods, nonlinear methods were able to give better missing value estimations for nonlinear structured data. Application: We applied this technique to a time course of metabolite data from a cold stress experiment on the model plant Arabidopsis thaliana, and could approximate the mapping function from any time point to the metabolite responses. Thus, the inverse NLPCA provides greatly improved information for better understanding the complex response to cold stress.
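The "missing values are ignored while optimising the model" idea from the abstract above amounts to computing the reconstruction error only over observed entries. A minimal sketch with a made-up data matrix and reconstruction (not the paper's NLPCA network; all numbers are arbitrary):

```python
import numpy as np

# Toy data matrix with missing entries marked as NaN (hypothetical values).
X = np.array([[1.0, 2.0, np.nan],
              [0.5, np.nan, 3.0]])
# A hypothetical model reconstruction of the same matrix.
X_hat = np.array([[0.9, 2.1, 0.0],
                  [0.6, 0.0, 2.8]])

mask = ~np.isnan(X)                      # True where a value was observed
resid = np.where(mask, X - X_hat, 0.0)   # missing entries contribute nothing
loss = np.sum(resid ** 2) / mask.sum()   # mean squared error over observed entries
```

Because the loss never touches the masked positions, the model can be fit first and the missing values read off from the reconstruction afterwards, as the abstract describes.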
Stochastic backpropagation and approximate inference in deep generative models, 2014
Abstract

Cited by 21 (2 self)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation – rules for gradient backpropagation through stochastic variables – and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
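One way to see "gradient backpropagation through stochastic variables" is the Gaussian pathwise estimator: writing z = mu + sigma * eps with eps ~ N(0, 1) makes the sample z differentiable in mu. A one-dimensional toy sketch of that idea, not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_mu_estimate(mu, sigma, n_samples=200_000):
    """Monte Carlo estimate of d/dmu E[z**2] for z ~ N(mu, sigma**2).

    Reparameterise z = mu + sigma * eps with eps ~ N(0, 1); then
    d(z**2)/dmu = 2 * z, and averaging these pathwise gradient samples
    estimates the gradient of the expectation."""
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    return np.mean(2.0 * z)

# True gradient: E[z^2] = mu^2 + sigma^2, so d/dmu = 2 * mu.
est = grad_mu_estimate(mu=1.5, sigma=0.8)
```

The same construction extends coordinate-wise to the deep generative models of the paper, where the expectation is the variational lower bound.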
The Variational Gaussian Approximation Revisited, 2009
Abstract

Cited by 21 (0 self)
The variational approximation of posterior distributions by multivariate Gaussians has been much less popular in the Machine Learning community compared to the corresponding approximation by factorising distributions. This is for a good reason: the Gaussian approximation is in general plagued by an O(N^2) number of variational parameters to be optimised, N being the number of random variables. In this work, we discuss the relationship between the Laplace and the variational approximation and we show that for models with Gaussian priors and factorising likelihoods, the number of variational parameters is actually O(N). The approach is applied to Gaussian process regression with non-Gaussian likelihoods.
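The O(N) result can be illustrated structurally: for a Gaussian prior N(0, K) and a factorising likelihood, the optimal variational Gaussian covariance takes the form (K^{-1} + diag(lambda))^{-1}, so the full N x N covariance is driven by only one parameter per likelihood site. A sketch with made-up numbers (the specific K and lambda values are arbitrary):

```python
import numpy as np

N = 4
# Hypothetical prior covariance K (any symmetric positive-definite matrix works).
K = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.5, 0.2],
              [0.2, 0.5, 1.0, 0.5],
              [0.1, 0.2, 0.5, 1.0]])
lam = np.array([0.3, 1.2, 0.7, 2.0])   # one variational parameter per site

# The full N x N posterior covariance is determined by the N-vector lam.
S = np.linalg.inv(np.linalg.inv(K) + np.diag(lam))
```

Together with N mean parameters this gives the O(N) parameterisation the abstract refers to, instead of optimising all N(N+1)/2 entries of a free covariance.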
Approximate Riemannian conjugate gradient learning for fixed-form variational Bayes
In Journal of Machine Learning Research
Abstract

Cited by 20 (3 self)
Variational Bayesian (VB) methods are typically only applied to models in the conjugate-exponential family using the variational Bayesian expectation maximisation (VB EM) algorithm or one of its variants. In this paper we present an efficient algorithm for applying VB to more general models. The method is based on specifying the functional form of the approximation, such as multivariate Gaussian. The parameters of the approximation are optimised using a conjugate gradient algorithm that utilises the Riemannian geometry of the space of the approximations. This leads to a very efficient algorithm for suitably structured approximations. It is shown empirically that the proposed method is comparable or superior in efficiency to the VB EM in a case where both are applicable. We also apply the algorithm to learning a nonlinear state-space model and a nonlinear factor analysis model for which the VB EM is not applicable. For these models, the proposed algorithm outperforms alternative gradient-based methods by a significant margin.
State-Space Inference and Learning with Gaussian Processes
Abstract

Cited by 20 (7 self)
State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by nonparametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model. Inference (filtering and smoothing) in linear dynamical systems (LDS) and nonlinear dynamical systems (NLDS) is frequently used in many areas, such as signal processing, state estimation, control, and finance/econometric models. Inference aims to estimate the state of a system from a stream of noisy measurements. Imagine tracking the location of a car based on odometer and GPS sensors, both of which are noisy. Sequential measurements from both sensors are combined to overcome the noise in the system and to obtain an accurate estimate of the system state. Even when the full state is only partially measured, it can still be inferred; in the car example the engine temperature is unobserved, but can be inferred via the nonlinear relationship from acceleration. To exploit this relationship appropriately, inference techniques in nonlinear models are required; they play an important role in many practical applications. LDS and NLDS belong to a class of models known as state-space models. A state-space model assumes that there exists a time sequence of latent states x_t that evolve over time according to a Markovian process specified by a transition function f. The latent states are observed indirectly in y_t through a measurement ...
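The generative structure described in the abstract (latent states x_t following a Markovian transition f, observed through noisy measurements y_t) can be sketched as a toy simulator; the transition function and noise levels below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """Hypothetical nonlinear transition function."""
    return 0.9 * x + 0.1 * np.sin(x)

def simulate(T=50, q=0.05, r=0.1):
    """Draw a latent trajectory x_1..x_T and noisy measurements y_1..y_T.

    q is the process-noise variance, r the measurement-noise variance."""
    xs, ys = [], []
    x = 0.0
    for _ in range(T):
        x = f(x) + np.sqrt(q) * rng.standard_normal()   # Markovian transition
        y = x + np.sqrt(r) * rng.standard_normal()      # noisy observation
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

xs, ys = simulate()
```

Filtering and smoothing then invert this process: given only ys, estimate the hidden xs; in the paper both f and the measurement map are themselves GP models rather than fixed functions.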
Natural Conjugate Gradient in Variational Inference
Abstract

Cited by 13 (6 self)
... in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant ...
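For a fixed-form Gaussian approximation q = N(mu, Sigma), the natural gradient with respect to the mean has a simple closed form: the Fisher information for mu is Sigma^{-1}, so the Riemannian correction amounts to premultiplying the ordinary gradient by Sigma. A minimal sketch (the Sigma and gradient values are arbitrary):

```python
import numpy as np

# Hypothetical approximating distribution q = N(mu, Sigma).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
grad = np.array([1.0, -1.0])   # ordinary gradient of the objective w.r.t. mu

# Fisher information of N(mu, Sigma) w.r.t. mu is inv(Sigma), so the natural
# gradient multiplies the ordinary gradient by the inverse of that, i.e. Sigma.
nat_grad = Sigma @ grad
```

Note the geometry here comes from the approximating distribution q, as the abstract proposes, rather than from the model's predictive distribution as in traditional natural gradient methods.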
Variational inference in nonconjugate models
In Journal of Machine Learning Research, 2013
Abstract

Cited by 13 (3 self)
Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest – like the correlated topic model and Bayesian logistic regression – are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.
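The Laplace component of "Laplace variational inference" builds on the classical Laplace approximation: locate the posterior mode and fit a Gaussian to the local curvature. A one-dimensional sketch on a made-up unnormalised log-posterior (not the paper's algorithm, which applies this idea inside coordinate ascent):

```python
import numpy as np

def log_post(x):
    """Hypothetical unnormalised log-posterior (concave, so Newton converges)."""
    return -0.5 * x**2 - np.log(1.0 + np.exp(-2.0 * x))

def d1(f, x, h=1e-5):
    """Central-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central-difference second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# Newton iterations to the mode of log_post.
x = 0.0
for _ in range(20):
    x -= d1(log_post, x) / d2(log_post, x)

mode = x
var = -1.0 / d2(log_post, mode)   # Laplace approximation: q = N(mode, var)
```

The resulting Gaussian N(mode, var) matches the posterior's location and curvature at the mode, which is exactly the closed-form-friendly surrogate the nonconjugate methods exploit.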
Building Blocks For Variational Bayesian Learning Of Latent Variable Models
In Journal of Machine Learning Research, 2006
Abstract

Cited by 11 (8 self)
We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including variance models and nonlinear modelling, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present ...
Variational Bayes for continuous-time nonlinear state-space models
In NIPS*2006 Workshop on Dynamical Systems, Stochastic Processes and Bayesian Inference, 2006
Abstract

Cited by 5 (4 self)
We present an extension of the variational Bayesian nonlinear state-space model introduced by Valpola and Karhunen in 2002 [1] for continuous-time models. The model is based on using multilayer perceptron (MLP) networks to model the nonlinearities. Moving to continuous-time requires solving a stochastic differential equation (SDE) to evaluate the predictive distribution of the states, but otherwise all computation happens as in the discrete-time case. The close connection between the methods allows utilising our new improved state inference method for both discrete-time and continuous-time modelling.
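The SDE solve that replaces the discrete-time transition can be sketched with the simplest integrator, the Euler-Maruyama scheme; the drift below is a stand-in for the MLP the paper would use, and all constants are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def drift(x):
    """Hypothetical drift function (the paper models this with an MLP network)."""
    return -0.5 * x

def euler_maruyama(x0, dt=0.01, n_steps=100, diffusion=0.2):
    """Integrate dx = drift(x) dt + diffusion dW with the Euler-Maruyama scheme."""
    x = x0
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal()   # Brownian increment
        x = x + drift(x) * dt + diffusion * dW
    return x

# One sample from the predictive distribution of the state after time 1.0.
x1 = euler_maruyama(x0=1.0)
```

Repeating the integration gives samples from the predictive distribution of the state, which is the quantity the continuous-time model needs where the discrete-time model would apply its transition mapping once.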