Results 1  10
of
18
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
 Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract

Cited by 176 (11 self)
 Add to MetaCart
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naiveBayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Inferring Parameters and Structure of Latent Variable Models by Variational Bayes
, 1999
"... Current methods for learning graphical models with latent variables and a fixed structure estimate optimal values for the model parameters. Whereas this approach usually produces overfitting and suboptimal generalization performance, carrying out the Bayesian program of computing the full posterior ..."
Abstract

Cited by 136 (1 self)
 Add to MetaCart
Current methods for learning graphical models with latent variables and a fixed structure estimate optimal values for the model parameters. Whereas this approach usually produces overfitting and suboptimal generalization performance, carrying out the Bayesian program of computing the full posterior distributions over the parameters remains a difficult problem. Moreover, learning the structure of models with latent variables, for which the Bayesian approach is crucial, is yet a harder problem. In this paper I present the Variational Bayes framework, which provides a solution to these problems. This approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods. Unlike in the Laplace approximation, these posteriors are generally nonGaussian and no Hessian needs to be computed. The resulting algorithm generalizes the standard Expectation Maximization a...
Ensemble Learning for Hidden Markov Models
, 1997
"... The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncerta ..."
Abstract

Cited by 80 (0 self)
 Add to MetaCart
The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncertainty. Also, this maximummay be unrepresentative of the posterior probability distribution. In this paper we study a method in which we optimize an ensemble which approximates the entire posterior probability distribution. The ensemble learning algorithm requires the same resources as the traditional BaumWelch algorithm. The traditional training algorithm for hidden Markov models is an expectation maximization (EM) algorithm (Dempster et al. 1977) known as the BaumWelch algorithm. It is a maximum likelihood method, or, with a simple modification, a penalized maximum likelihood method, which can be viewed as maximizing a posterior probability density over the model parameters. Recently, ...
Ensemble learning for independent component analysis
 in Advances in Independent Component Analysis
, 2000
"... i Abstract This thesis is concerned with the problem of Blind Source Separation. Specifically we considerthe Independent Component Analysis (ICA) model in which a set of observations are modelled by xt = Ast: (1) where A is an unknown mixing matrix and st is a vector of hidden source components atti ..."
Abstract

Cited by 49 (2 self)
 Add to MetaCart
i Abstract This thesis is concerned with the problem of Blind Source Separation. Specifically we considerthe Independent Component Analysis (ICA) model in which a set of observations are modelled by xt = Ast: (1) where A is an unknown mixing matrix and st is a vector of hidden source components attime t. The ICA problem is to find the sources given only a set of observations. In chapter 1, the blind source separation problem is introduced. In chapter 2 the methodof Ensemble Learning is explained. Chapter 3 applies Ensemble Learning to the ICA model and chapter 4 assesses the use of Ensemble Learning for model selection.Chapters 57 apply the Ensemble Learning ICA algorithm to data sets from physics (a medical imaging data set consisting of images of a tooth), biology (data sets from cDNAmicroarrays) and astrophysics (Planck image separation and galaxy spectra separation).
A family of algorithms for approximate Bayesian inference
, 2001
"... One of the major obstacles to using Bayesian methods for pattern recognition has been its computational expense. This thesis presents an approximation technique that can perform Bayesian inference faster and more accurately than previously possible. This method, "Expectation Propagation," unifies an ..."
Abstract
 Add to MetaCart
One of the major obstacles to using Bayesian methods for pattern recognition has been its computational expense. This thesis presents an approximation technique that can perform Bayesian inference faster and more accurately than previously possible. This method, "Expectation Propagation," unifies and generalizes two previous techniques: assumeddensity filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. The unification shows how both of these algorithms can be viewed as approximating the true posterior distribution with a simpler distribution, which is close in the sense of KLdivergence. Expectation Propagation exploits the best of both algorithms: the generality of assumeddensity filtering and the accuracy of loopy belief propagation. Loopy belief propagation, because it propagates exact belief states, is useful for limited types of belief networks, such as purely discrete networks. Expectation Propagati...
Generalized Additive Models with Spatiotemporal Data Xiangming Fang
"... Summary. Generalized additive models (GAMs) have been widely used. While the procedure for fitting a generalized additive model to independent data has been well established, not as much work has been done when the data are correlated. The currently available methods are not completely satisfactory ..."
Abstract
 Add to MetaCart
Summary. Generalized additive models (GAMs) have been widely used. While the procedure for fitting a generalized additive model to independent data has been well established, not as much work has been done when the data are correlated. The currently available methods are not completely satisfactory in practice. A new approach is proposed to fit generalized additive models with spatiotemporal data via the penalized likelihood approach which estimates the smooth functions and covariance parameters by iteratively maximizing the penalized log likelihood. Both maximum likelihood (ML) and restricted maximum likelihood (REML) estimation schemes are developed. Also, conditions for asymptotic posterior normality are investigated for the case of separable spatiotemporal data with fixed spatial covariate structure and no temporal dependence. We propose a new model selection criterion for comparing models with and without spatial correlation. The proposed methods are illustrated by both simulation study and real data analysis.
Discriminative Complexity Control and Linear Projections for Large Vocabulary Speech Recognition
, 2005
"... Selecting the optimal model structure with the “appropriate” complexity is a standard problem for training large vocabulary continuous speech recognition (LVCSR) systems, and machine learning in general. Stateoftheart LVCSR systems are highly complex. A wide variety of techniques may be used wh ..."
Abstract
 Add to MetaCart
Selecting the optimal model structure with the “appropriate” complexity is a standard problem for training large vocabulary continuous speech recognition (LVCSR) systems, and machine learning in general. Stateoftheart LVCSR systems are highly complex. A wide variety of techniques may be used which alter the system complexity and word error rate (WER). Explicitly evaluating systems for all possible configurations is infeasible. Automatic model complexity control criteria are needed. Most existing complexity control schemes can be classified into two types, Bayesian learning techniques and information theory approaches. An implicit assumption is made in both that increasing the likelihood on heldout data decreases the WER. However, this correlation is found to be quite weak for current speech recognition systems. Hence it is preferable to employ discriminative methods for complexity control. In this thesis a novel discriminative model selection technique, the marginalization of a discriminative growth function, is presented. This is a closer approximation to the true WER than standard likelihood based approaches. The number of Gaussian components and feature dimensions of an HMM based LVCSR system is controlled. Experimental results on a wide rage of LVCSR tasks showed that
Solving the Probabilistic Decoding Problems Using Evolutionary Computation Techniques
"... Abstract: The number of inference problems that can be tackled using the probabilistic inference methods is enormous. This paper introduces a new algorithm for solving the probabilistic inference model of the block decoding problem of linear codes. This problem arises on a class of stream ciphers an ..."
Abstract
 Add to MetaCart
Abstract: The number of inference problems that can be tackled using the probabilistic inference methods is enormous. This paper introduces a new algorithm for solving the probabilistic inference model of the block decoding problem of linear codes. This problem arises on a class of stream ciphers and in the decoding of error correcting codes. The proposed technique is based on Evolutionary Algorithms (EAs). This proposed technique overcomes the main drawbacks of the other known techniques. The performance of this new probabilistic inference technique was tested and the results showed the superiority of this new technique over the wellknown techniques
Numerical Fault Models: a Bayesian Framework
"... Abstract. We developed a robust Bayesian inversion scheme to plan and analyze laboratory creep compaction experiments. We chose a simple creep law that features the main parameters of interest when trying to identify ratecontrolling mechanisms from experimental data. By integrating the chosen creep ..."
Abstract
 Add to MetaCart
Abstract. We developed a robust Bayesian inversion scheme to plan and analyze laboratory creep compaction experiments. We chose a simple creep law that features the main parameters of interest when trying to identify ratecontrolling mechanisms from experimental data. By integrating the chosen creep law or an approximation thereof, one can use all the data, either simultaneously or in overlapping subsets, thus making more complete use of the experiment data and propagating statistical variations in the data through to the final rate constants. Despite the nonlinearity of the problem, with this technique one can retrieve accurate estimates of both the stress exponent and the activation energy, even when the porosity time series data are noisy. Whereas adding observation points and/or experiments reduces the uncertainty on all parameters, enlarging the range of temperature or effective stress significantly reduces the covariance between stress exponent and activation energy. We apply this methodology to hydrothermal creep compaction data on quartz to obtain a quantitative, semiempirical law for fault zone compaction in the interseismic period. Incorporating this law into a simple direct rupture model, we find marginal distributions of the time to failure that are robust with respect to errors in the initial fault zone porosity. 1.