Results 1-10 of 10
Hierarchical Models of Variance Sources
Signal Processing, 2003
Cited by 33 (12 self)
In many models, variances are assumed to be constant, although this assumption is often unrealistic in practice. Joint modelling of means and variances is difficult in many learning approaches because it can lead to infinite probability densities. We show that a Bayesian variational technique which is sensitive to probability mass instead of density is able to jointly model both variances and means. We consider a model structure where a Gaussian variable, called a variance node, controls the variance of another Gaussian variable. Variance nodes make it possible to build hierarchical models for both variances and means. We report experiments with artificial data which demonstrate the ability of the learning algorithm to find variance sources that explain and characterize well the variances in the multidimensional data. Experiments with biomedical MEG data show that variance sources are present in real-world signals.
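As a rough illustration of the variance-node structure described in the abstract above, the following sketch draws samples from a two-level Gaussian model. The parametrization (the observed variable has variance exp(-u), where u is the variance node) and the function name `sample_variance_node_model` are assumptions made for this example, not details confirmed by the listing.

```python
import numpy as np

def sample_variance_node_model(n_samples, u_mean=0.0, u_std=1.0, x_mean=0.0, seed=0):
    """Draw (u, x) pairs where the Gaussian variable u sets the variance of x.

    Assumed parametrization (illustrative only): Var[x | u] = exp(-u).
    """
    rng = np.random.default_rng(seed)
    # Variance node: an ordinary Gaussian variable.
    u = rng.normal(u_mean, u_std, size=n_samples)
    # Observed variable: Gaussian whose standard deviation is exp(-u / 2).
    x = rng.normal(x_mean, np.exp(-u / 2.0))
    return u, x
```

Samples drawn with a large value of u cluster tightly around the mean, while samples with a small value of u spread widely, which is the kind of changing variance a constant-variance model cannot represent.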
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning
IEEE Transactions on Neural Networks
Cited by 17 (7 self)
Bits-back coding, first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993, provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. Bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length, in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights into the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views on many parts of the problem, such as model comparison and pruning, and helps explain many phenomena occurring in learning. Index Terms: bits-back coding, ensemble learning, hierarchical latent variable models, minimum description length, variational Bayesian learning.
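The two readings of the cost function mentioned in the abstract above can be written out as follows. This is a sketch in assumed notation (data X, unknowns θ, model H, approximation q, cost C), none of which appears in the listing itself.

```latex
\begin{align}
\mathcal{C}(q)
  &= \mathrm{E}_{q}\!\left[\ln\frac{q(\theta)}{p(X,\theta\mid\mathcal{H})}\right]
   = D_{\mathrm{KL}}\bigl(q(\theta)\,\|\,p(\theta\mid X,\mathcal{H})\bigr)
     - \ln p(X\mid\mathcal{H}) \\
  &= \underbrace{\mathrm{E}_{q}\bigl[-\ln p(X\mid\theta,\mathcal{H})\bigr]}_{\text{coding the data}}
   + \underbrace{\mathrm{E}_{q}\bigl[-\ln p(\theta\mid\mathcal{H})\bigr]}_{\text{coding the parameters}}
   - \underbrace{\mathrm{E}_{q}\bigl[-\ln q(\theta)\bigr]}_{\text{bits returned}}
\end{align}
```

The first line is the Bayesian view: since the Kullback-Leibler term is non-negative, the negated cost is a lower bound on the log evidence. The second line is the code-length view, in nats, where the entropy of q is subtracted as the bits the receiver gets back.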
Practical Approaches to Principal Component Analysis in the Presence of Missing Values
"... Faculty of Information and Natural Sciences ..."
Building Blocks For Variational Bayesian Learning Of Latent Variable Models
Journal of Machine Learning Research, 2006
Cited by 11 (8 self)
We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including variance models and nonlinear modelling, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present
Bayes Blocks: An implementation of the variational Bayesian building blocks framework
In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, UAI 2005
Cited by 7 (5 self)
A software library for constructing and learning probabilistic models is presented. The library offers a set of building blocks from which a large variety of static and dynamic models can be built. These include hierarchical models for variances of other variables and many nonlinear models. The underlying variational Bayesian machinery, which provides fast and robust estimation but is mathematically rather involved, is almost completely hidden from the user, making the library very easy to use. The building blocks include Gaussian, rectified Gaussian and mixture-of-Gaussians variables and computational nodes which can be combined rather freely.
Approximate Riemannian conjugate gradient learning for fixed-form variational Bayes
Journal of Machine Learning Research
Cited by 4 (1 self)
Variational Bayesian (VB) methods are typically only applied to models in the conjugate-exponential family using the variational Bayesian expectation maximisation (VB EM) algorithm or one of its variants. In this paper we present an efficient algorithm for applying VB to more general models. The method is based on specifying the functional form of the approximation, such as multivariate Gaussian. The parameters of the approximation are optimised using a conjugate gradient algorithm that utilises the Riemannian geometry of the space of the approximations. This leads to a very efficient algorithm for suitably structured approximations. It is shown empirically that the proposed method is comparable or superior in efficiency to the VB EM in a case where both are applicable. We also apply the algorithm to learning a nonlinear state-space model and a nonlinear factor analysis model for which the VB EM is not applicable. For these models, the proposed algorithm outperforms alternative gradient-based methods by a significant margin.
A Gradient-Based Algorithm Competitive with Variational Bayesian EM for Mixture of Gaussians
Cited by 1 (1 self)
While variational Bayesian (VB) inference is typically done with the so-called VB EM algorithm, there are models where it cannot be applied because either the E-step or the M-step cannot be solved analytically. In 2007, Honkela et al. introduced a recipe for a gradient-based algorithm for VB inference that does not have such a restriction. In this paper, we derive the algorithm in the case of the mixture of Gaussians model. For the first time, the algorithm is experimentally compared to VB EM and its variant with both artificial and real data. We conclude that the algorithms are approximately equally fast, with the faster one depending on the problem.
Transformations for Variational Factor Analysis to Speed up Learning
Cited by 1 (1 self)
We propose a simple transformation of the hidden states in variational Bayesian (VB) factor analysis models to speed up the learning procedure. The transformation basically performs centering and whitening of the hidden states, taking into account the posterior uncertainties. The transformation is given a theoretical justification from optimisation of the VB cost function. We derive the transformation formulae for variational Bayesian principal component analysis and show experimentally that it can significantly improve the rate of convergence. Similar transformations can be applied to other variational Bayesian factor analysis models as well.
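The centering and whitening step described in the abstract above might look roughly like the sketch below. It is not the paper's derived formulae: the function name `center_and_whiten`, the array layout, and the simplification of treating the loading matrix as a point estimate (ignoring its posterior covariance) are all assumptions made for illustration.

```python
import numpy as np

def center_and_whiten(A, bias, S_mean, S_cov):
    """Center and whiten hidden states, compensating in the loadings and bias.

    A: (D, d) loading matrix; bias: (D,) offset;
    S_mean: (d, N) posterior means; S_cov: (N, d, d) posterior covariances.
    """
    n = S_mean.shape[1]
    # Centering: move the mean of the hidden states into the bias term.
    mu = S_mean.mean(axis=1, keepdims=True)
    S_centered = S_mean - mu
    # Second moment includes the posterior covariances, not only the means.
    second_moment = (S_centered @ S_centered.T + S_cov.sum(axis=0)) / n
    eigvals, eigvecs = np.linalg.eigh(second_moment)
    R = (eigvecs / np.sqrt(eigvals)).T   # whitening matrix
    R_inv = eigvecs * np.sqrt(eigvals)   # its inverse, absorbed by the loadings
    new_S_mean = R @ S_centered
    new_S_cov = R @ S_cov @ R.T
    new_A = A @ R_inv
    new_bias = bias + (A @ mu).ravel()
    return new_A, new_bias, new_S_mean, new_S_cov
```

Because the inverse rotation is absorbed into the loadings and the removed mean into the bias, the reconstruction `A @ S_mean + bias` is unchanged; only the parametrization of the hidden states moves.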
Natural Conjugate Gradient in Variational Inference
ISBN 9789512287017, ISSN 1796-2803, 2007
Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational EM algorithms are not easily available, and gradient-based methods are often used as alternatives. However, regular gradient methods ignore the Riemannian geometry of the manifold of probability distributions, thus leading to slow convergence. We propose using the Riemannian structure of the approximations and the natural gradient to speed up a conjugate gradient method for variational learning and inference. As the form of the approximating distribution is often very simple, the natural gradient can be used for both model parameters and latent variables without significant computational overhead. Experiments in variational Bayesian learning of nonlinear state-space models for real speech data show more than tenfold speedups over alternative learning algorithms.
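To make the natural gradient idea in the abstract above concrete, here is a minimal sketch for a univariate Gaussian approximation parametrized by its mean and variance, whose Fisher information matrix is diag(1/var, 1/(2 var^2)). The example uses plain natural gradient descent rather than the conjugate gradient method the abstract proposes, and the toy objective (the KL divergence to a fixed target Gaussian) and all function names are assumptions chosen for illustration.

```python
def kl_gradients(mu, var, target_mu, target_var):
    """Ordinary gradients of KL(N(mu, var) || N(target_mu, target_var))."""
    grad_mu = (mu - target_mu) / target_var
    grad_var = 0.5 * (1.0 / target_var - 1.0 / var)
    return grad_mu, grad_var

def natural_gradient(grad_mu, grad_var, var):
    """Rescale the gradient by the inverse Fisher matrix of N(mu, var)."""
    return var * grad_mu, 2.0 * var**2 * grad_var

def fit_gaussian(target_mu, target_var, mu=0.0, var=1.0, step=0.5, n_iter=100):
    """Plain natural gradient descent (not the conjugate gradient variant)."""
    for _ in range(n_iter):
        g_mu, g_var = kl_gradients(mu, var, target_mu, target_var)
        nat_mu, nat_var = natural_gradient(g_mu, g_var, var)
        mu -= step * nat_mu
        # Too large a step can push var below zero; this sketch has no safeguard.
        var -= step * nat_var
    return mu, var
```

The rescaling makes the step in the mean proportional to the current variance, so the update adapts to how uncertain the approximation currently is, which is the geometric effect that an ordinary gradient ignores.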
Approximate Riemannian Conjugate Gradient Learning for Fixed-Form Variational Bayes
Submitted 07/2009; Published x/x
Variational Bayesian (VB) methods are typically only applied to models in the conjugate-exponential family using the variational Bayesian expectation maximisation (VB EM) algorithm or one of its variants. In this paper we present an efficient algorithm for applying VB to more general models. The method is based on specifying the functional form of the approximation, such as multivariate Gaussian. The parameters of the approximation are optimised using a conjugate gradient algorithm that utilises the Riemannian geometry of the space of the approximations. This leads to a very efficient algorithm for suitably structured approximations. It is shown empirically that the proposed method is comparable or superior in efficiency to the VB EM in a case where both are applicable. We also apply the algorithm to learning a nonlinear state-space model and a nonlinear factor analysis model for which the VB EM is not applicable. For these models, the proposed algorithm outperforms alternative gradient-based methods by a significant margin.