Results 1-10 of 56
Ensemble Learning for Hidden Markov Models
, 1997
Cited by 94 (0 self)
The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncertainty. Also, this maximum may be unrepresentative of the posterior probability distribution. In this paper we study a method in which we optimize an ensemble which approximates the entire posterior probability distribution. The ensemble learning algorithm requires the same resources as the traditional Baum-Welch algorithm. The traditional training algorithm for hidden Markov models is an expectation-maximization (EM) algorithm (Dempster et al. 1977) known as the Baum-Welch algorithm. It is a maximum likelihood method, or, with a simple modification, a penalized maximum likelihood method, which can be viewed as maximizing a posterior probability density over the model parameters. Recently, ...
An Unsupervised Ensemble Learning Method for Nonlinear Dynamic State-Space Models
 Neural Computation
, 2001
Cited by 91 (32 self)
A Bayesian ensemble learning method is introduced for unsupervised extraction of dynamic processes from noisy data. The data are assumed to be generated by an unknown nonlinear mapping from unknown factors. The dynamics of the factors are modeled using a nonlinear state-space model. The nonlinear mappings in the model are represented using multilayer perceptron networks. The proposed method is computationally demanding, but it allows the use of higher-dimensional nonlinear latent variable models than other existing approaches. Experiments with chaotic data show that the new method is able to blindly estimate the factors and the dynamic process which have generated the data. It clearly outperforms currently available nonlinear prediction techniques in this very difficult test problem.
Comparison of Approximate Methods for Handling Hyperparameters
 NEURAL COMPUTATION
Cited by 87 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Ensemble Learning For Independent Component Analysis
, 1999
Cited by 50 (4 self)
In this paper, a recently developed Bayesian method called ensemble learning is applied to independent component analysis (ICA). Ensemble learning is a computationally efficient approximation for exact Bayesian analysis. In general, the posterior probability density function (pdf) is a complex high-dimensional function whose exact treatment is difficult. In ensemble learning, the posterior pdf is approximated by a simpler function, and Kullback-Leibler information is used as the criterion for minimising the misfit between the actual posterior pdf and its parametric approximation. In this paper, the posterior pdf is approximated by a diagonal Gaussian pdf. According to the ICA model used in this paper, the measurements are generated by a linear mapping from mutually independent source signals whose distributions are mixtures of Gaussians. The measurements are also assumed to have additive Gaussian noise with diagonal covariance. The model structure and all parameters of the distribution...
Ensemble learning for multilayer networks
 in Advances in Neural Information Processing Systems
, 1998
Cited by 43 (2 self)
Bayesian treatments of learning in neural networks are typically based either on local Gaussian approximations to a mode of the posterior weight distribution, or on Markov chain Monte Carlo simulations. A third approach, called ensemble learning, was introduced by Hinton and van Camp (1993). It aims to approximate the posterior distribution by minimizing the Kullback-Leibler divergence between the true posterior and a parametric approximating distribution. However, the derivation of a deterministic algorithm relied on the use of a Gaussian approximating distribution with a diagonal covariance matrix and so was unable to capture the posterior correlations between parameters. In this paper, we show how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable. We also extend the framework to deal with hyperparameters, leading to a simple re-estimation procedure. Initial results from a standard benchmark problem are encouraging.
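The limitation of a diagonal covariance mentioned in this abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper. It assumes a zero-mean, two-dimensional Gaussian "posterior" p with correlation 0.9, and it assumes the divergence is taken in the direction KL(q || p), as is usual in variational Bayesian learning. The function name `kl_diag_q_full_p` and all numbers are hypothetical. For a Gaussian target, the diagonal q that minimizes KL(q || p) has variances equal to the reciprocals of the diagonal entries of the precision matrix of p, which are smaller than the marginal variances of p.

```python
import math

def kl_diag_q_full_p(var_q, cov_p):
    """KL(q || p) for zero-mean 2-D Gaussians: q = N(0, diag(var_q)), p = N(0, cov_p)."""
    (a, b), (c, d) = cov_p
    det_p = a * d - b * c
    # Diagonal of the precision matrix inv(cov_p); because q is diagonal,
    # the off-diagonal precision entries drop out of the trace term.
    prec_diag = (d / det_p, a / det_p)
    trace_term = prec_diag[0] * var_q[0] + prec_diag[1] * var_q[1]
    log_det_ratio = math.log(det_p) - math.log(var_q[0] * var_q[1])
    return 0.5 * (trace_term - 2.0 + log_det_ratio)

# Hypothetical correlated "posterior" (unit variances, correlation 0.9).
cov_p = ((1.0, 0.9), (0.9, 1.0))
det_p = cov_p[0][0] * cov_p[1][1] - cov_p[0][1] * cov_p[1][0]

# KL-optimal diagonal variances: 1 / (diagonal of the precision matrix).
var_opt = (det_p / cov_p[1][1], det_p / cov_p[0][0])
# Naive alternative: copy the marginal variances of p.
var_marg = (cov_p[0][0], cov_p[1][1])

kl_opt = kl_diag_q_full_p(var_opt, cov_p)
kl_marg = kl_diag_q_full_p(var_marg, cov_p)
print("optimal diagonal variances:", var_opt, "KL:", kl_opt)
print("marginal variances:        ", var_marg, "KL:", kl_marg)
```

In this toy case the best diagonal fit is much narrower than the true marginals and carries no information about the correlation, which is the gap a full-covariance approximation is meant to close.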
Variational mixture of Bayesian independent component analysers
 Neural Computat
Roberts S.: An Ensemble Learning Approach to Independent Component Analysis
 In Proceedings of Neural Networks for Signal Processing
, 2000
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning
 IEEE Transactions on Neural Networks
Cited by 18 (7 self)
The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights to the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views to many parts of the problem such as model comparison and pruning and helps explain many phenomena occurring in learning. Index Terms: Bits-back coding, ensemble learning, hierarchical latent variable models, minimum description length, variational Bayesian learning.
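The two Bayesian readings of the cost function named in this abstract (misfit of the posterior approximation, bound on the model evidence) can be written as one identity. This is the standard variational Bayesian decomposition, not a formula quoted from the paper. The symbols are notation assumed here: $X$ for the data, $\theta$ for all unknown variables, $q(\theta)$ for the approximating ensemble, and $C(q)$ for the cost.

```latex
\begin{align}
C(q) &= \mathrm{E}_{q}\!\left[\ln \frac{q(\theta)}{p(X,\theta)}\right] \\
     &= D_{\mathrm{KL}}\bigl(q(\theta)\,\|\,p(\theta \mid X)\bigr) - \ln p(X) \\
     &\ge -\ln p(X).
\end{align}
```

Because the Kullback-Leibler term is non-negative, $-C(q)$ is a lower bound on the log evidence $\ln p(X)$, and the bound is tight exactly when $q(\theta)$ equals the true posterior. Dividing $C(q)$ by $\ln 2$ expresses the same quantity in bits, which is the form in which it can be read as a code length.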
Accelerating cyclic update algorithms for parameter estimation by pattern searches
 Neural Processing Letters
Cited by 17 (9 self)
A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A well-known drawback of this strategy is slow convergence in low noise conditions. We propose using so-called pattern searches which consist of an exploratory phase followed by a line search. During the exploratory phase, a search direction is determined by combining the individual updates of all subproblems. The approach can be used to speed up several well-known learning methods such as variational Bayesian learning (ensemble learning) and the expectation-maximization algorithm with modest algorithmic modifications. Experimental results show that the proposed method is able to reduce the required convergence time by 60–85% in realistic variational Bayesian learning problems.
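The idea in this abstract can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the authors' algorithm. It minimizes a hypothetical two-variable quadratic f(x) = 0.5 x^T A x - b^T x with strongly coupled variables, it treats each coordinate as one "subproblem", it uses an exact line search, and it runs a pattern step on every second cycle (the `pattern_every` schedule is invented for this example). All function names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def gradient(A, b, x):
    # Gradient of f(x) = 0.5 x^T A x - b^T x.
    return [axi - bi for axi, bi in zip(matvec(A, x), b)]

def cyclic_update(A, b, x):
    """One full cycle: minimize f exactly over each coordinate in turn."""
    x = list(x)
    for i in range(len(x)):
        rest = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
        x[i] = (b[i] - rest) / A[i][i]
    return x

def pattern_step(A, b, x):
    """Exploratory cycle, then an exact line search along the combined update."""
    x_new = cyclic_update(A, b, x)
    d = [xn - xo for xn, xo in zip(x_new, x)]  # sum of all coordinate updates
    curvature = dot(d, matvec(A, d))
    if curvature <= 0.0:  # no movement in this cycle, nothing to search along
        return x_new
    t = -dot(gradient(A, b, x_new), d) / curvature  # exact minimizer for a quadratic
    return [xn + t * di for xn, di in zip(x_new, d)]

def solve(A, b, x0, pattern_every=0, tol=1e-10, max_cycles=100000):
    """Run cycles until the largest gradient component falls below tol."""
    x = list(x0)
    for cycle in range(1, max_cycles + 1):
        if pattern_every and cycle % pattern_every == 0:
            x = pattern_step(A, b, x)
        else:
            x = cyclic_update(A, b, x)
        if max(abs(g) for g in gradient(A, b, x)) < tol:
            return x, cycle
    return x, max_cycles

# Hypothetical ill-conditioned problem; its exact minimizer is (100, -100).
A = [[1.0, 0.99], [0.99, 1.0]]
b = [1.0, -1.0]
x_plain, n_plain = solve(A, b, [0.0, 0.0])
x_fast, n_fast = solve(A, b, [0.0, 0.0], pattern_every=2)
print("plain cyclic updates:", n_plain, "cycles")
print("with pattern steps:  ", n_fast, "cycles")
```

With only two variables the combined update direction points straight at the minimizer once the iterates have settled onto their zigzag path, so this toy case overstates the benefit. In higher-dimensional problems the line search would be expected to shorten convergence only partially.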