Results 1-10 of 33
An Unsupervised Ensemble Learning Method for Nonlinear Dynamic State-Space Models
 Neural Computation
, 2001
Cited by 87 (32 self)
A Bayesian ensemble learning method is introduced for unsupervised extraction of dynamic processes from noisy data. The data are assumed to be generated by an unknown nonlinear mapping from unknown factors. The dynamics of the factors are modeled using a nonlinear state-space model. The nonlinear mappings in the model are represented using multilayer perceptron networks. The proposed method is computationally demanding, but it allows the use of higher dimensional nonlinear latent variable models than other existing approaches. Experiments with chaotic data show that the new method is able to blindly estimate the factors and the dynamic process which have generated the data. It clearly outperforms currently available nonlinear prediction techniques in this very difficult test problem.
Ensemble Learning for Hidden Markov Models
, 1997
Cited by 79 (0 self)
The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncertainty. Also, this maximum may be unrepresentative of the posterior probability distribution. In this paper we study a method in which we optimize an ensemble which approximates the entire posterior probability distribution. The ensemble learning algorithm requires the same resources as the traditional Baum-Welch algorithm. The traditional training algorithm for hidden Markov models is an expectation maximization (EM) algorithm (Dempster et al. 1977) known as the Baum-Welch algorithm. It is a maximum likelihood method, or, with a simple modification, a penalized maximum likelihood method, which can be viewed as maximizing a posterior probability density over the model parameters. Recently, ...
Comparison of Approximate Methods for Handling Hyperparameters
 NEURAL COMPUTATION
Cited by 67 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Ensemble Learning For Independent Component Analysis
, 1999
Cited by 45 (4 self)
In this paper, a recently developed Bayesian method called ensemble learning is applied to independent component analysis (ICA). Ensemble learning is a computationally efficient approximation for exact Bayesian analysis. In general, the posterior probability density function (pdf) is a complex high-dimensional function whose exact treatment is difficult. In ensemble learning, the posterior pdf is approximated by a simpler function, and Kullback-Leibler information is used as the criterion for minimising the misfit between the actual posterior pdf and its parametric approximation. In this paper, the posterior pdf is approximated by a diagonal Gaussian pdf. According to the ICA model used in this paper, the measurements are generated by a linear mapping from mutually independent source signals whose distributions are mixtures of Gaussians. The measurements are also assumed to have additive Gaussian noise with diagonal covariance. The model structure and all parameters of the distribution...
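As a rough illustration of the diagonal Gaussian approximation described in this abstract, the sketch below fits a diagonal Gaussian q to a correlated Gaussian stand-in for the posterior by minimising KL(q || p). This is not the paper's ICA model: the two-dimensional "posterior", the function names, and the numbers are all assumptions chosen so that the optimum has a closed form (mean equal to the posterior mean, variances equal to the inverse diagonal of the posterior precision matrix).

```python
import numpy as np

def kl_diag_gauss_to_full(mu_q, var_q, mu_p, cov_p):
    """KL(q || p) for q = N(mu_q, diag(var_q)) and p = N(mu_p, cov_p)."""
    prec_p = np.linalg.inv(cov_p)
    diff = mu_p - mu_q
    trace_term = np.sum(np.diag(prec_p) * var_q)   # tr(prec_p @ diag(var_q))
    quad_term = diff @ prec_p @ diff
    _, logdet_p = np.linalg.slogdet(cov_p)
    logdet_q = np.sum(np.log(var_q))
    d = mu_q.size
    return 0.5 * (trace_term + quad_term - d + logdet_p - logdet_q)

def best_diag_approx(mu_p, cov_p):
    """Closed-form minimiser of KL(q || p) over diagonal Gaussians q."""
    prec_p = np.linalg.inv(cov_p)
    return mu_p.copy(), 1.0 / np.diag(prec_p)

# Hypothetical toy posterior with strongly correlated parameters.
mu_p = np.array([1.0, -2.0])
cov_p = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

mu_q, var_q = best_diag_approx(mu_p, cov_p)
kl_opt = kl_diag_gauss_to_full(mu_q, var_q, mu_p, cov_p)

# For comparison: a diagonal Gaussian that copies the marginal variances.
kl_marg = kl_diag_gauss_to_full(mu_p, np.diag(cov_p), mu_p, cov_p)

print("optimal diagonal variances:", var_q)
print("KL at optimum:", kl_opt, " KL with marginal variances:", kl_marg)
```

In this toy case the fitted variances are smaller than the true marginal variances, which reflects the usual tendency of a KL(q || p) fit with a factorised q to understate posterior uncertainty when parameters are correlated.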
An Ensemble Learning Approach To Independent Component Analysis
 In Proc. of the IEEE Workshop on Neural Networks for Signal Processing
, 2000
Cited by 23 (8 self)
Independent Component Analysis (ICA) is an important tool for extracting structure from data. ICA is traditionally performed under a maximum likelihood scheme in a latent variable model and in the absence of noise. Although extensively utilised, maximum likelihood estimation has well known drawbacks such as overfitting and sensitivity to local maxima. In this paper, we propose a Bayesian learning scheme, Variational Bayes or Ensemble Learning, for both latent variables and parameters in the model. We extend current research in this area by utilising a wide variety of priors over model parameters, including noise, and learning the latent distribution as part of the ensemble learning procedure. We demonstrate the model by unmixing a linear mixture of musical signals. INTRODUCTION Independent Component Analysis (ICA) seeks to extract salient features and structure from a dataset where the dataset is assumed to be a linear mixture of independent underlying (hidden) features. The goal o...
Variational Mixture of Bayesian Independent Component Analysers
 Neural Computation
, 2002
Cited by 22 (5 self)
There has been growing interest in subspace data modelling over the past few years. Methods such as Principal Component Analysis, Factor Analysis and Independent Component Analysis have gained in popularity and have found many applications in image modelling, signal processing and data compression, to name just a few. As applications and computing power grow, more and more sophisticated analyses and meaningful representations are sought. Mixture modelling methods have been proposed for principal and factor analysers which exploit local Gaussian features in the subspace manifolds. Meaningful representations may be lost, however, if these local features are non-Gaussian and/or discontinuous. In this paper we propose extending the Gaussian analysers mixture model to an Independent Component Analysers mixture model. We employ recent developments in variational Bayesian inference and structure determination to construct a novel approach for modelling non-Gaussian, discontinuous manifolds. We automatically determine the local dimensionality of each manifold and use variational inference to calculate the optimum number of ICA components needed in our mixture model. We demonstrate our framework on complex synthetic data and illustrate its application to real data by decomposing functional Magnetic Resonance Images into meaningful, and medically useful, features.
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning
 IEEE Transactions on Neural Networks
Cited by 17 (7 self)
Abstract: The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights into the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views to many parts of the problem such as model comparison and pruning and helps explain many phenomena occurring in learning. Index Terms: Bits-back coding, ensemble learning, hierarchical latent variable models, minimum description length, variational Bayesian learning.
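The two readings of the cost function mentioned in this abstract can be written out in the standard variational notation. The symbols below are assumptions, not taken from the abstract: X denotes the data, \theta all unknown variables, and q(\theta) the approximating distribution.

```latex
\begin{align}
  C(q) &= \mathbb{E}_{q(\theta)}\!\left[ \ln \frac{q(\theta)}{p(X, \theta)} \right] \\
       &= \mathrm{KL}\!\left( q(\theta) \,\|\, p(\theta \mid X) \right) - \ln p(X)
       \;\ge\; -\ln p(X).
\end{align}
```

Since the Kullback-Leibler term is non-negative, minimising C(q) both reduces the misfit between q and the true posterior and tightens the bound -C(q) \le \ln p(X) on the log evidence. Measured in bits rather than nats, C(q) is the quantity that the bits-back argument interprets as an expected code length.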
Accelerating cyclic update algorithms for parameter estimation by pattern searches
 Neural Processing Letters
Cited by 16 (9 self)
Abstract. A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A well-known drawback of this strategy is slow convergence in low noise conditions. We propose using so-called pattern searches which consist of an exploratory phase followed by a line search. During the exploratory phase, a search direction is determined by combining the individual updates of all subproblems. The approach can be used to speed up several well-known learning methods such as variational Bayesian learning (ensemble learning) and the expectation-maximization algorithm with modest algorithmic modifications. Experimental results show that the proposed method is able to reduce the required convergence time by 60–85% in realistic variational Bayesian learning problems.
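The idea of an exploratory phase followed by a line search can be sketched on a toy problem. The example below is not the paper's algorithm or experiment: it assumes a small quadratic objective f(x) = 0.5 x^T A x - b^T x, uses coordinate-wise minimisation as the cyclic update, takes the net change over one full cycle as the search direction, and applies an exact line search every second cycle. The function names, the `period` parameter, and the test matrix are all hypothetical.

```python
import numpy as np

def cyclic_sweep(A, b, x):
    """One cycle: minimise the quadratic over each coordinate in turn."""
    x = x.copy()
    for i in range(len(b)):
        # Solve d f / d x_i = 0 with all other coordinates held fixed.
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

def line_search(A, b, x, d):
    """Exact minimiser of the quadratic along x + t * d."""
    curvature = d @ A @ d
    if curvature <= 0.0:
        return x
    grad = A @ x - b
    return x - (grad @ d) / curvature * d

def solve(A, b, x0, accelerate, tol=1e-10, max_sweeps=10000, period=2):
    """Cyclic updates, optionally with a pattern-search step every `period` cycles."""
    x = np.asarray(x0, dtype=float).copy()
    for sweep in range(1, max_sweeps + 1):
        x_prev = x.copy()
        x = cyclic_sweep(A, b, x)              # exploratory phase
        if np.linalg.norm(A @ x - b) < tol:
            return x, sweep
        if accelerate and sweep % period == 0:
            direction = x - x_prev             # combined update of all subproblems
            x = line_search(A, b, x, direction)
    return x, max_sweeps

# Strongly coupled coordinates make plain cyclic updates zigzag slowly.
A = np.array([[1.0, 0.99],
              [0.99, 1.0]])
b = np.array([1.0, 1.0])
x0 = np.zeros(2)

x_plain, sweeps_plain = solve(A, b, x0, accelerate=False)
x_accel, sweeps_accel = solve(A, b, x0, accelerate=True)
print("plain cyclic sweeps:", sweeps_plain)
print("with pattern search:", sweeps_accel)
```

The two-dimensional case is deliberately favourable: after one cycle the error lies along a single direction, so the line search can remove it almost entirely. Larger or non-quadratic problems would need an approximate line search and would show smaller gains.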
Building Blocks For Variational Bayesian Learning Of Latent Variable Models
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 11 (8 self)
We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including variance models and nonlinear modelling, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present