Results 1–10 of 17
Stochastic Variational Inference
 Journal of Machine Learning Research, 2013 (in press)
Abstract

Cited by 131 (27 self)
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
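For a concrete sense of the update the abstract describes, here is a minimal sketch of stochastic variational inference on a toy conjugate model (estimating a Gaussian mean), not the paper's topic-model code; the model, step-size schedule, and function name are illustrative assumptions:

```python
import numpy as np

def svi_gaussian_mean(data, prior_mean=0.0, prior_var=100.0,
                      noise_var=1.0, batch_size=10, n_steps=500, seed=0):
    """Stochastic variational inference on a toy conjugate model: infer
    q(mu) = N(m, v) for the mean of a Gaussian with known noise variance.
    The global variational posterior is stored as natural parameters
    eta = (m / v, -1 / (2 v)).

    Each step: subsample a minibatch, form noisy 'intermediate' natural
    parameters as if the minibatch were replicated N / |S| times, then
    move toward them with a decaying Robbins-Monro step size.
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    eta = np.array([prior_mean / prior_var, -0.5 / prior_var])  # start at prior
    for t in range(1, n_steps + 1):
        batch = rng.choice(data, size=batch_size, replace=False)
        eta_hat = np.array([
            prior_mean / prior_var + (N / batch_size) * batch.sum() / noise_var,
            -0.5 / prior_var - 0.5 * N / noise_var,
        ])
        rho = (t + 10.0) ** -0.7  # step sizes satisfying the usual conditions
        eta = (1.0 - rho) * eta + rho * eta_hat
    v = -0.5 / eta[1]             # convert back to mean/variance form
    return eta[0] * v, v
```

On, say, 5,000 draws from N(3, 1), the returned variational mean lands near 3 while each update only ever touches ten data points — the scalability argument of the paper in miniature.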
Variational inference in nonconjugate models
 Journal of Machine Learning Research, 2013
Abstract

Cited by 21 (4 self)
Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest, such as the correlated topic model and Bayesian logistic regression, are nonconjugate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.
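The building block behind Laplace variational inference is the classical Laplace approximation: locate the mode of a log-density by Newton's method and use the negative inverse curvature there as a Gaussian variance. A minimal one-dimensional sketch (the function name and the Gamma example are illustrative, not from the paper):

```python
def laplace_approx(grad, hess, x0, n_newton=50):
    """Gaussian (Laplace) approximation N(mode, -1/hess(mode)) to a 1-D
    density, given the gradient and Hessian (second derivative) of its
    log-density."""
    x = x0
    for _ in range(n_newton):
        x = x - grad(x) / hess(x)   # Newton step toward the mode
    return x, -1.0 / hess(x)        # mode and Laplace variance

# Example: unnormalized Gamma(shape=5, rate=1), log f(x) = 4 log x - x,
# whose exact mode is 4 and whose Laplace variance is mode^2 / 4 = 4.
mode, var = laplace_approx(grad=lambda x: 4.0 / x - 1.0,
                           hess=lambda x: -4.0 / x ** 2,
                           x0=1.0)
```

In the paper's setting this Gaussian stands in for an intractable conditional inside the coordinate-ascent loop, which is what removes the case-by-case derivations.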
Black box variational inference
 In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 2014
Abstract

Cited by 19 (6 self)
Variational inference has become a widely used method to approximate posteriors in complex latent variable models. However, deriving a variational inference algorithm generally requires significant model-specific analysis. These efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a “black box” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective, where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black-box sampling-based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
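The core of the black-box recipe is the score-function (REINFORCE) gradient estimator, ∇λ ELBO = E_q[∇λ log q(z; λ)(log p(x, z) − log q(z; λ))], estimated with Monte Carlo samples from q; the model only enters through evaluations of log p(x, z). A sketch with a Gaussian q and a mean-subtracted baseline as the variance-reduction device; the helper name and toy target are hypothetical, not the paper's code:

```python
import numpy as np

def bbvi(log_joint, n_iters=2000, n_samples=200, lr=0.05, seed=0):
    """Black-box VI with the score-function gradient estimator.
    q(z; lam) = N(mu, sigma^2) with lam = (mu, log sigma); the only
    model-specific input is log_joint(z) = log p(x, z), evaluated
    elementwise on an array of samples."""
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_iters):
        sigma = np.exp(log_sigma)
        z = rng.normal(mu, sigma, size=n_samples)
        log_q = (-0.5 * ((z - mu) / sigma) ** 2
                 - log_sigma - 0.5 * np.log(2 * np.pi))
        f = log_joint(z) - log_q
        f = f - f.mean()                          # baseline: cuts gradient variance
        score_mu = (z - mu) / sigma ** 2          # d log q / d mu
        score_ls = ((z - mu) / sigma) ** 2 - 1.0  # d log q / d log_sigma
        mu += lr * (score_mu * f).mean()          # noisy gradient ascent on the ELBO
        log_sigma += lr * (score_ls * f).mean()
    return mu, np.exp(log_sigma)

# Target with a known answer: an unnormalized N(2, 1) log-density,
# so q should converge to mu ~ 2, sigma ~ 1.
mu, sigma = bbvi(lambda z: -0.5 * (z - 2.0) ** 2)
```

No gradient of the model is ever taken, which is exactly what lets the same loop be reused across models.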
Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression
 Bayesian Analysis, 2013
Abstract

Cited by 15 (4 self)
We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
Fast Dual Variational Inference for Non-Conjugate Latent Gaussian Models
Abstract

Cited by 11 (2 self)
Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in nonconjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and nonconjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian …
Affine Independent Variational Inference
Abstract

Cited by 3 (0 self)
We consider inference in a broad class of nonconjugate probabilistic models based on minimising the Kullback-Leibler divergence between the given target density and an approximating ‘variational’ density. In particular, for generalised linear models we describe approximating densities formed from an affine transformation of independently distributed latent variables, a class that includes many well-known densities as special cases. We show how all relevant quantities can be efficiently computed using the fast Fourier transform. This extends the known class of tractable variational approximations and enables, for example, the fitting of skew variational densities to the target density.
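The FFT enters because the density of a sum of independent variables is the convolution of the component densities, which can be evaluated on a grid in O(n log n). A minimal numerical sketch for the sum of two independent standard normals (the grid, helper name, and direct use of `numpy.fft` are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def density_of_sum(x, p1, p2):
    """Density of z = z1 + z2 for independent z1 ~ p1, z2 ~ p2, with p1
    and p2 tabulated on the same uniform, zero-centered grid x. The
    linear convolution is computed with the FFT in O(n log n)."""
    n, dx = len(x), x[1] - x[0]
    m = 2 * n - 1                                  # full linear-convolution length
    full = np.fft.irfft(np.fft.rfft(p1, m) * np.fft.rfft(p2, m), m) * dx
    shift = int(round(-x[0] / dx))                 # re-center the result onto grid x
    return full[shift:shift + n]

# The sum of two standard normals should come out as N(0, 2).
x = np.linspace(-10.0, 10.0, 2001)
phi = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
p = density_of_sum(x, phi, phi)
```

An affine transformation then only rescales and shifts the grid, which is why whole families of non-Gaussian variational densities stay tractable.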
Conference summary
 In Marketing Science Institute Conference on Big Data, 2012
Variational Multinomial Logit Gaussian Process
Abstract

Cited by 2 (0 self)
A Gaussian process prior with an appropriate likelihood function is a flexible nonparametric model for a variety of learning tasks. One important and standard task is multiclass classification, which is the categorization of an item into one of several fixed classes. A usual likelihood function for this is the multinomial logistic likelihood function. However, exact inference with this model has proved to be difficult because high-dimensional integrations are required. In this paper, we propose a variational approximation to this model, and we describe the optimization of the variational parameters. Experiments have shown our approximation to be tight. In addition, we provide data-independent bounds on the marginal likelihood of the model, one of which is shown to be much tighter than the existing variational mean-field bound in the experiments. We also derive a proper lower bound on the predictive likelihood that involves the Kullback-Leibler divergence between the approximating and the true posterior. We combine our approach with a recently proposed sparse approximation to give a variational sparse approximation to the Gaussian process multiclass model. We also derive criteria which can be used to select the inducing set, and we show the effectiveness of these criteria over random selection in an experiment.
The language of discretion
 In C. Ricks and L. Michaels (eds), The State of Language, Faber and …, 1990
"... variational inference for large-scale ..."
Approximate inference via variational sampling
, 2011
Abstract

Cited by 1 (0 self)
We propose a new method to approximately integrate a function with respect to a given probability distribution when an exact computation is intractable. The method is called “variational sampling”, as it involves fitting a simplified distribution for which the integral has a closed-form expression, and using a set of randomly sampled control points to optimize the fit. The novelty lies in the chosen objective function, namely a Monte Carlo approximation to the generalized Kullback-Leibler divergence, which differs from classical methods that implement a similar idea, such as Bayesian Monte Carlo and importance sampling. We review several attractive mathematical properties of variational sampling, including well-posedness under a simple condition on the sample size, and a central limit theorem analogous to the case of importance sampling. We then report various simulations that essentially show that variational sampling has the potential to outperform existing methods within comparable computation time in estimating moments of order up to 2. We conclude with a brief discussion of desirable enhancements.
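A loose sketch of the idea: draw control points once from a proposal π, and drive the Monte Carlo estimate of the generalized KL divergence ∫ p log(p/q) − p + q between the unnormalized target p and an unnormalized fit q to a stationary point. For an unnormalized Gaussian fit c·N(m, v), that stationarity condition reduces to matching the importance-weighted mass and first two moments of the target at the control points; the code below implements only this simplified special case, and the target, proposal, and function name are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def variational_sampling_gauss(log_target, n_points=4000, seed=0):
    """Fit an unnormalized Gaussian c * N(m, v) to an unnormalized 1-D
    target. Zeroing the gradient of the Monte Carlo generalized-KL
    objective requires q to match the importance-weighted mass, mean,
    and second moment of the target at the sampled control points."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 2.0, size=n_points)          # control points ~ proposal N(0, 4)
    log_prop = -0.5 * (z / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
    w = np.exp(log_target(z) - log_prop)             # weights p(z) / pi(z)
    mass = w.mean()                                  # estimate of the target's mass c
    m = (w * z).mean() / mass                        # weighted mean
    v = (w * z ** 2).mean() / mass - m ** 2          # weighted variance
    return mass, m, v

# Target: 3 * N(1, 1), unnormalized; expect mass ~ 3, mean ~ 1, var ~ 1.
mass, m, v = variational_sampling_gauss(
    lambda z: np.log(3.0) - 0.5 * (z - 1.0) ** 2 - 0.5 * np.log(2 * np.pi))
```

The resemblance to importance sampling is exactly the point the abstract makes: the estimator looks similar, but it arises from a different objective with its own well-posedness and limit properties.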