Results 1-10 of 29
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
Machine Learning, 1997
Abstract

Cited by 183 (12 self)
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and Cheeseman-Stutz (CS) measures are the most accurate.
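The BIC/MDL score compared in this abstract has a simple closed form, log p(D | M) ≈ log p(D | θ̂, M) − (d/2) log N. A minimal sketch with toy numbers (not taken from the paper) illustrates the complexity penalty that produces the bias toward simple models noted above:

```python
import math

def bic_score(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """BIC/MDL approximation to the log marginal likelihood:
    log p(D | M) ~ log p(D | theta_hat, M) - (d / 2) * log N,
    where d is the number of free parameters and N the sample size.
    """
    return log_likelihood - 0.5 * n_params * math.log(n_samples)

# Hypothetical comparison: the penalty can favor the simpler model even
# when the richer model fits the data slightly better.
simple = bic_score(log_likelihood=-1050.0, n_params=5, n_samples=1000)
rich = bic_score(log_likelihood=-1040.0, n_params=20, n_samples=1000)
```

Here the simpler model wins despite its lower maximized likelihood, because its penalty term is a quarter of the richer model's.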
Inferring Parameters and Structure of Latent Variable Models by Variational Bayes
1999
Abstract

Cited by 149 (1 self)
Current methods for learning graphical models with latent variables and a fixed structure estimate optimal values for the model parameters. Whereas this approach usually produces overfitting and suboptimal generalization performance, carrying out the Bayesian program of computing the full posterior distributions over the parameters remains a difficult problem. Moreover, learning the structure of models with latent variables, for which the Bayesian approach is crucial, is an even harder problem. In this paper I present the Variational Bayes framework, which provides a solution to these problems. This approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods. Unlike in the Laplace approximation, these posteriors are generally non-Gaussian and no Hessian needs to be computed. The resulting algorithm generalizes the standard Expectation-Maximization a...
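The mean-field idea behind Variational Bayes can be sketched on the textbook normal-gamma model rather than the paper's structure-learning algorithm: factorize q(mu, tau) = q(mu) q(tau) and iterate closed-form coordinate updates. Everything below (model, priors, parameter names) is an illustrative assumption, not the paper's method:

```python
import numpy as np

def vb_normal(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=20):
    """Coordinate-ascent variational inference for x_i ~ N(mu, 1/tau)
    with priors mu ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0).
    Returns the parameters of q(mu) = N(mu_n, 1/lam_n) and
    q(tau) = Gamma(a_n, b_n)."""
    n, xbar = len(x), np.mean(x)
    sum_sq = np.sum(x ** 2)
    e_tau = a0 / b0                       # initial E[tau]
    for _ in range(iters):
        # update q(mu) given current E[tau]
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam_n = (lam0 + n) * e_tau
        # update q(tau) using E[mu] = mu_n and E[mu^2] = mu_n^2 + 1/lam_n
        e_mu2 = mu_n ** 2 + 1.0 / lam_n
        a_n = a0 + 0.5 * (n + 1)
        b_n = b0 + 0.5 * (sum_sq - 2.0 * mu_n * n * xbar + n * e_mu2
                          + lam0 * (e_mu2 - 2.0 * mu0 * mu_n + mu0 ** 2))
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=2000)
mu_n, lam_n, a_n, b_n = vb_normal(x)
# q(mu) concentrates near the true mean 2.0; E[tau] = a_n/b_n near 1.0.
```

The key point, matching the abstract, is that both approximate posteriors come out in closed form without computing a Hessian.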
Ensemble Learning for Hidden Markov Models
1997
Abstract

Cited by 84 (0 self)
The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncertainty. Also, this maximum may be unrepresentative of the posterior probability distribution. In this paper we study a method in which we optimize an ensemble which approximates the entire posterior probability distribution. The ensemble learning algorithm requires the same resources as the traditional Baum-Welch algorithm. The traditional training algorithm for hidden Markov models is an expectation maximization (EM) algorithm (Dempster et al. 1977) known as the Baum-Welch algorithm. It is a maximum likelihood method, or, with a simple modification, a penalized maximum likelihood method, which can be viewed as maximizing a posterior probability density over the model parameters. Recently, ...
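Both Baum-Welch and the ensemble variant described above need the HMM data likelihood, computed by the forward recursion. A minimal sketch of the scaled forward algorithm (model shapes and variable names are illustrative assumptions):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward recursion for a discrete HMM: returns log p(obs).
    pi: initial state distribution (K,); A: row-stochastic transition
    matrix (K, K); B: emission matrix (K, M); obs: list of symbol indices.
    The E-step of Baum-Welch builds on these same forward quantities."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c                     # rescale to avoid underflow
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik
```

Accumulating the logs of the per-step scaling constants gives log p(obs) without underflow, which is what any likelihood-based (point-estimate or ensemble) training loop iterates over.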
Ensemble learning for independent component analysis
In Advances in Independent Component Analysis, 2000
Abstract

Cited by 51 (2 self)
This thesis is concerned with the problem of Blind Source Separation. Specifically we consider the Independent Component Analysis (ICA) model in which a set of observations are modelled by x_t = A s_t, (1) where A is an unknown mixing matrix and s_t is a vector of hidden source components at time t. The ICA problem is to find the sources given only a set of observations. In chapter 1, the blind source separation problem is introduced. In chapter 2 the method of Ensemble Learning is explained. Chapter 3 applies Ensemble Learning to the ICA model and chapter 4 assesses the use of Ensemble Learning for model selection. Chapters 5-7 apply the Ensemble Learning ICA algorithm to data sets from physics (a medical imaging data set consisting of images of a tooth), biology (data sets from cDNA microarrays) and astrophysics (Planck image separation and galaxy spectra separation).
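The generative model x_t = A s_t in equation (1) is easy to instantiate; a hypothetical numpy sketch (source distribution, dimensions, and the mixing matrix are all assumptions made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the generative model x_t = A s_t:
# two independent (here: uniform) sources mixed by a square matrix A.
T = 500
S = rng.uniform(-1.0, 1.0, size=(2, T))   # hidden sources, one column per t
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                # mixing matrix (unknown in practice)
X = A @ S                                  # observed mixtures x_t = A s_t

# Blind source separation must recover S (up to scale and permutation)
# from X alone; if A were known, inverting it would suffice:
S_recovered = np.linalg.inv(A) @ X
```

The whole difficulty the thesis addresses is that only X is observed, so A must be inferred, which Ensemble Learning does by approximating the posterior over A and S.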
Speech Recognition in Adverse Environments: a Probabilistic Approach
2002
Abstract

Cited by 15 (3 self)
In this thesis I advocate a probabilistic view of robust speech recognition. I discuss the classification of distorted features using an optimal classifier, and I show how the generation of noisy speech can be represented as a generative graphical probability model. By doing so, my aim is to build a conceptual framework that provides a unified understanding of robust speech recognition, and to some extent bridges the gap between a purely signal processing viewpoint and the pattern classification or decoding viewpoint. The most tangible contribution of this thesis is the introduction of the Algonquin method for robust speech recognition. It exemplifies the probabilistic method and encompasses a number of novel ideas. For example, it uses a probability distribution to describe the relationship between clean speech, noise, channel and the resultant noisy speech. It employs a variational approach to find an approximation to the joint posterior distribution which can be used for the purpose of restoring the distorted observations. It also allows us to estimate the parameters of the environment using a Generalized EM method. Another important contribution of this thesis is a new paradigm for robust speech recognition, which we call uncertainty decoding. This new paradigm follows naturally from the standard way of performing inference in the graphical probability model that describes noisy speech generation.
Entropy Search for Information-Efficient Global Optimization
Abstract

Cited by 3 (0 self)
Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
Solving the Probabilistic Decoding Problems Using Evolutionary Computation Techniques
Abstract
Abstract: The number of inference problems that can be tackled using probabilistic inference methods is enormous. This paper introduces a new algorithm for solving the probabilistic inference model of the block decoding problem of linear codes. This problem arises in a class of stream ciphers and in the decoding of error-correcting codes. The proposed technique is based on Evolutionary Algorithms (EAs) and overcomes the main drawbacks of the other known techniques. The performance of this new probabilistic inference technique was tested, and the results showed its superiority over the well-known techniques.
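The general shape of an EA-based decoder can be sketched on a toy linear code; this is not the paper's algorithm, just a minimal (1+1) evolutionary algorithm that evolves information bits to minimize Hamming distance between the re-encoded candidate and a received word. The code, mutation rate, and received vector are all hypothetical:

```python
import random

# Generator rows for a toy [6,3] systematic code: codeword = info + parity.
G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def encode(info):
    """Encode 3 information bits into a 6-bit codeword (mod-2 arithmetic)."""
    return [sum(i * g for i, g in zip(info, col)) % 2 for col in zip(*G)]

def distance(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))

def ea_decode(received, iters=1000, seed=42):
    """(1+1)-EA: mutate the current best information word, keep the child
    if its codeword is at least as close to the received word."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(3)]
    best_d = distance(encode(best), received)
    for _ in range(iters):
        child = [b ^ (rng.random() < 1 / 3) for b in best]  # flip bits w.p. 1/3
        d = distance(encode(child), received)
        if d <= best_d:                   # elitist acceptance
            best, best_d = child, d
    return best, best_d

received = [1, 1, 0, 0, 1, 0]             # assumed noisy channel output
info, d = ea_decode(received)
```

For a search space this small the EA trivially matches exhaustive search; the point of the abstract's approach is that the same fitness-driven search scales to block lengths where enumeration is infeasible.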
Discriminative Complexity Control and Linear Projections for Large Vocabulary Speech Recognition
2005
Abstract
Selecting the optimal model structure with the “appropriate” complexity is a standard problem for training large vocabulary continuous speech recognition (LVCSR) systems, and machine learning in general. State-of-the-art LVCSR systems are highly complex. A wide variety of techniques may be used which alter the system complexity and word error rate (WER). Explicitly evaluating systems for all possible configurations is infeasible. Automatic model complexity control criteria are needed. Most existing complexity control schemes can be classified into two types, Bayesian learning techniques and information theory approaches. An implicit assumption made in both is that increasing the likelihood on held-out data decreases the WER. However, this correlation is found to be quite weak for current speech recognition systems. Hence it is preferable to employ discriminative methods for complexity control. In this thesis a novel discriminative model selection technique, the marginalization of a discriminative growth function, is presented. This is a closer approximation to the true WER than standard likelihood-based approaches. The number of Gaussian components and feature dimensions of an HMM based LVCSR system is controlled. Experimental results on a wide range of LVCSR tasks showed that ...
numerical
Abstract
We developed a robust Bayesian inversion scheme to plan and analyze laboratory creep compaction experiments. We chose a simple creep law that features the main parameters of interest when trying to identify rate-controlling mechanisms from experimental data. By integrating the chosen creep law or an approximation thereof, one can use all the data, either simultaneously or in overlapping subsets, thus making more complete use of the experiment data and propagating statistical variations in the data through to the final rate constants. Despite the nonlinearity of the problem, with this technique one can retrieve accurate estimates of both the stress exponent and the activation energy, even when the porosity time series data are noisy. Whereas adding observation points and/or experiments reduces the uncertainty on all parameters, enlarging the range of temperature or effective stress significantly reduces the covariance between stress exponent and activation energy. We apply this methodology to hydrothermal creep compaction data on quartz to obtain a quantitative, semi-empirical law for fault zone compaction in the interseismic period. Incorporating this law into a simple direct rupture model, we find marginal distributions of the time to failure that are robust with respect to ...