Operations for Learning with Graphical Models
Journal of Artificial Intelligence Research, 1994
Abstract

Cited by 253 (12 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation-maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
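The first of the review's two algorithm schemas, Gibbs sampling, can be made concrete with a toy model. Below is a minimal sketch for a zero-mean bivariate normal with correlation rho (a hypothetical example chosen for illustration, not one taken from the paper): each variable is resampled in turn from its exact full conditional.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples=5000, seed=0):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal.

    Each full conditional is univariate normal:
        x | y ~ N(rho * y, 1 - rho**2), and symmetrically for y | x.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho ** 2)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # resample x from p(x | y)
        y = rng.gauss(rho * x, sd)   # resample y from p(y | x)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
mean_x = sum(s[0] for s in samples) / len(samples)
```

The empirical mean and correlation of the chain approach 0 and rho as the chain runs; the same scheme extends to any model whose full conditionals are tractable, which is the point the review develops graphically.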
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
Machine Learning, 1997
Abstract

Cited by 183 (12 self)
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and Cheeseman–Stutz measures are the most accurate.
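The BIC/MDL approximation discussed in this abstract replaces the marginal likelihood with the maximized log-likelihood minus a dimensionality penalty. A minimal sketch of the score, applied to a hypothetical Bernoulli coin model rather than the paper's Bayesian networks:

```python
import math

def bic_score(loglik_at_mle, num_params, num_samples):
    """BIC/MDL approximation to the log marginal likelihood:
        log p(D | M) ~ log p(D | theta_hat, M) - (d / 2) * log N
    where d is the number of free parameters and N the sample size."""
    return loglik_at_mle - 0.5 * num_params * math.log(num_samples)

# Hypothetical data: 100 coin flips with 60 heads; Bernoulli model has d = 1.
n, heads = 100, 60
p_hat = heads / n  # maximum-likelihood estimate
loglik = heads * math.log(p_hat) + (n - heads) * math.log(1 - p_hat)
score = bic_score(loglik, num_params=1, num_samples=n)
```

The penalty term grows with model dimension, which is the source of the bias toward simple models that the paper reports for BIC/MDL.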
A Guide to the Literature on Learning Probabilistic Networks From Data
1996
Abstract

Cited by 179 (0 self)
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples.
Keywords: Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery.
I. Introduction: Probabilistic networks or probabilistic gra...
Distributed sequential Bayesian estimation of a diffusive source in wireless sensor networks
IEEE Trans. Signal Process., 2007
Abstract

Cited by 11 (0 self)
Abstract—We develop an efficient distributed sequential Bayesian estimation method for applications relating to diffusive sources—localizing a diffusive source, determining its space-time concentration distribution, and predicting its cloud envelope evolution using wireless sensor networks. Potential applications include security, environmental and industrial monitoring, as well as pollution control. We first derive the physical model of the substance dispersion by solving the diffusion equations under different environment scenarios and then integrate the physical model into the distributed processing technologies. We propose a distributed sequential Bayesian estimation method in which the state belief is transmitted through the wireless sensor network and updated using the measurements from the new sensor node. We propose two belief representation methods: a Gaussian density approximation and a new LPG function (linear combination of polynomial Gaussian density functions) approximation. These approximations are suitable for distributed processing in wireless sensor networks and are applicable to different sensor network situations. We implement the idea of information-driven sensor collaboration and select the next sensor node according to certain criteria, which provides an optimal subset and an optimal order for incorporating the measurements into our belief update, reduces response time, and saves energy in the sensor network. Numerical examples demonstrate the effectiveness and efficiency of the proposed methods.
Index Terms—Diffusive source, distributed estimation, sensor node scheduling, sequential Bayesian method, wireless sensor networks.
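The sequential step the abstract describes (pass the state belief along, update it with each new node's measurement) reduces, under a Gaussian belief representation like the paper's first one, to a conjugate update. The scalar sketch below is illustrative only; the measurement model, sensor readings, and variances are made up:

```python
def gaussian_update(prior_mean, prior_var, measurement, noise_var):
    """One sequential Bayesian step: fuse a scalar Gaussian belief with a
    direct noisy measurement z = x + v, v ~ N(0, noise_var).
    Conjugacy gives a closed-form Gaussian posterior."""
    k = prior_var / (prior_var + noise_var)       # Kalman-style gain
    post_mean = prior_mean + k * (measurement - prior_mean)
    post_var = (1.0 - k) * prior_var
    return post_mean, post_var

# Hypothetical scenario: the belief visits three sensor nodes in turn,
# each contributing a noisy reading of the source location.
mean, var = 0.0, 10.0                             # vague initial belief
for z in [4.2, 3.8, 4.0]:
    mean, var = gaussian_update(mean, var, z, noise_var=1.0)
```

Each node only needs the current (mean, var) pair and its own reading, which is what makes transmitting the belief through the network cheap; the paper's LPG representation generalizes this to non-Gaussian beliefs.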
Information Fusion, Causal Probabilistic Network And Probanet II: Inference Algorithms and Probanet System
Proc. 1st Intl. Workshop on Image Analysis and Information Fusion, 1997
Abstract

Cited by 2 (2 self)
As an extension of an overview paper [Pan and McMichael, 1997] on information fusion and Causal Probabilistic Networks (CPN), this paper formalizes kernel algorithms for probabilistic inference on CPNs. Information fusion is realized by updating the joint probabilities of the variables upon the arrival of new evidence or new hypotheses. Kernel algorithms for some dominant inference methods are formalized from a discontiguous, mathematics-oriented literature, with gaps filled in with regard to computability and completeness. In particular, possible optimizations of the causal tree algorithm, graph triangulation, and the junction tree algorithm are discussed. Probanet has been designed and developed as a generic shell, i.e., a mother system for CPN construction and application. The design aspects and current status of Probanet are described. A few directions for research and system development are pointed out, including hierarchical structuring of networks, structure decomposition, and adaptive inference algorithms. The paper thus integrates a literature review, algorithm formalization, and a future perspective.
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables
Abstract
Abstract. We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman–Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman–Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
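The Laplace approximation evaluated in this paper can be checked against an exact answer in a conjugate toy model. The sketch below uses a Beta-Bernoulli model (an assumption made for illustration; the paper's models are naive-Bayes networks with a hidden root, where no exact answer is available and a Monte-Carlo gold standard is needed):

```python
import math

def log_marginal_laplace(heads, tails, a=2.0, b=2.0):
    """Laplace approximation to log p(D) for a Bernoulli likelihood with a
    Beta(a, b) prior (a, b > 1 so the MAP estimate is interior):
        log p(D) ~ log p(D|t) + log p(t) + 0.5*log(2*pi) - 0.5*log(-H(t)),
    where t is the MAP parameter and H is the second derivative of the
    unnormalized log posterior at t."""
    t = (heads + a - 1.0) / (heads + tails + a + b - 2.0)   # MAP estimate
    log_lik = heads * math.log(t) + tails * math.log(1 - t)
    log_prior = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))
    neg_hess = (heads + a - 1) / t ** 2 + (tails + b - 1) / (1 - t) ** 2
    return (log_lik + log_prior
            + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(neg_hess))

def log_marginal_exact(heads, tails, a=2.0, b=2.0):
    """Exact log marginal likelihood of the Beta-Bernoulli model."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + math.lgamma(heads + a) + math.lgamma(tails + b)
            - math.lgamma(heads + tails + a + b))
```

For moderate sample sizes the two agree closely here, but note the paper's point (5): the Laplace expansion breaks down when the MAP configuration sits near a parameter boundary, e.g. a, b close to 1 with few heads or tails.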
Speech Enhancement Using Gaussian Scale Mixture Models
Abstract
Abstract—This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two approaches to improve efficiency: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), and those reconstructed from the estimated log-spectra produced a lower word recognition error rate because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
Index Terms—Gaussian scale mixture model (GSMM), Laplace method, speech enhancement, variational approximation.
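The GSMM construction described here (a zero-mean Gaussian frequency coefficient whose variance is the exponential of a GMM-distributed log-spectrum) can be sketched as a generative sampler. All parameter values below are hypothetical, chosen only to illustrate the two-stage draw:

```python
import math
import random

def sample_gsmm(weights, means, stds, n, seed=0):
    """Draw n samples from a Gaussian scale mixture: first draw a
    log-variance s from a GMM over the log-spectral domain, then draw the
    frequency coefficient x ~ N(0, exp(s))."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        s = rng.gauss(means[k], stds[k])              # log-spectrum draw
        x = rng.gauss(0.0, math.sqrt(math.exp(s)))    # coefficient given s
        out.append(x)
    return out

xs = sample_gsmm(weights=[0.7, 0.3], means=[-1.0, 1.0], stds=[0.5, 0.5],
                 n=20000)
```

Marginally the samples are heavy-tailed relative to a single Gaussian, which is why treating the log-spectrum as a random scaling factor is a better fit for speech coefficients than a fixed-variance model.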
Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation
Abstract
Abstract—This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain. This is in contrast to most current models, which work in the frequency domain. Exact signal estimation is a computationally intractable problem. We derive three approximations to improve the efficiency of signal estimation. The Gaussian approximation transforms the log-spectral-domain GMM into the frequency domain using a minimal Kullback–Leibler (KL) divergence criterion. The frequency-domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude. Correspondingly, the log-spectral-domain Laplace method computes the MAP estimator for the log-spectral amplitude. Further, gain and noise spectrum adaptation are implemented using the expectation–maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer an improved signal-to-noise ratio, a lower word recognition error rate, and less spectral distortion.
Approximate Inference
2008
Abstract
When we left off with the Junction Tree Algorithm and the Max-Sum Algorithm last class, we had crafted "messages" to traverse a tree-structured graphical model in order to calculate marginal and joint distributions. We are interested in finding p(z|x) for the model shown in Figure 1.
Figure 1: Graph representing both hidden (clear) and observed (shaded) variables, with their conditional dependence indicated by the arrow.
In this case, x is our "observed variable" and z is our "hidden variable" about which we wish to make some inference. While we would normally wish to make an exact inference about p(z|x), this problem is often either impossible to solve or the required algorithm is intractable. The next few lectures will focus on deterministic approximations to a pdf, and then we will move on to stochastic approximations. The general hierarchy of approximation techniques is given here for reference.
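For a model small enough to enumerate, the exact inference that is intractable in general is just Bayes' rule. A minimal sketch for a two-node model z -> x like the one in Figure 1, with hypothetical probability tables:

```python
def posterior(prior_z, lik_x_given_z, x):
    """Exact inference p(z | x) by enumeration for a two-node model z -> x.
    Bayes' rule: p(z|x) = p(x|z) p(z) / sum over z' of p(x|z') p(z')."""
    joint = {z: prior_z[z] * lik_x_given_z[z][x] for z in prior_z}
    norm = sum(joint.values())
    return {z: v / norm for z, v in joint.items()}

# Hypothetical binary model for illustration.
prior_z = {0: 0.6, 1: 0.4}
lik = {0: {0: 0.9, 1: 0.1},   # p(x | z = 0)
       1: {0: 0.2, 1: 0.8}}   # p(x | z = 1)
post = posterior(prior_z, lik, x=1)
```

The catch motivating these lectures is that the normalizing sum ranges over every configuration of the hidden variables, so enumeration blows up for large models; the deterministic and stochastic approximations that follow are ways around computing it exactly.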