Results 1 
9 of
9
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
 Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract

Cited by 178 (10 self)
 Add to MetaCart
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naiveBayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Probabilistic independence networks for hidden Markov probability models
, 1996
"... Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been develop ..."
Abstract

Cited by 167 (12 self)
 Add to MetaCart
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a selfcontained review of the basic principles of PINs. It is shown that the wellknown forwardbackward (FB) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
A Bayesian Approach to Causal Discovery
, 1997
"... We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraintbased approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that t ..."
Abstract

Cited by 79 (1 self)
 Add to MetaCart
We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraintbased approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraintbased approach uses categorical information about conditionalindependence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraintbased counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structuresboth quantitative and qualitativecan be made. Three, information from several models can be combined to make better inferences and to better ...
Learning mixtures of DAG models
, 1997
"... We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple searchandscore algorithms are infeasible for a variety of problems, and introduce a feasible approach in which paramet ..."
Abstract

Cited by 25 (2 self)
 Add to MetaCart
We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple searchandscore algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as a combination of (1) the Cheeseman–Stutz asymptotic approximation for model posterior probability and (2) the Expectation–Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples. 1
Learning Mixtures of Bayesian Networks
 in Cooper & Moral
, 1997
"... We describe a heuristic method for learning mixtures of Bayesian Networks (MBNs) from possibly incomplete data. The considered class of models is mixtures in which each mixture component is a Bayesian network encoding a conditional Gaussian distribution over a fixed set of variables. Some variables ..."
Abstract

Cited by 21 (1 self)
 Add to MetaCart
We describe a heuristic method for learning mixtures of Bayesian Networks (MBNs) from possibly incomplete data. The considered class of models is mixtures in which each mixture component is a Bayesian network encoding a conditional Gaussian distribution over a fixed set of variables. Some variables may be hidden or otherwise have missing observations. A key idea in our approach is to treat expected data as real data. This allows us to interleave structure and parameter search and to take advantage of closed form approximations for marginal likelihood. In addition, by treating expected data as real data, the search criterion factors by variable, making the search processes more efficient. We evaluate our approach on synthetic and realworld data sets. Keywords : Mixture models, Bayesian networks, structure learning, parameter learning, hidden variables, EM algorithm. 1 Introduction There is growing interest in a class of models for density estimation known as Bayesian networks. In the...
Computationally efficient methods for selecting among mixtures of graphical models
 Bayesian Statistics 6
, 1999
"... We describe computationally efficient methods for Bayesian model selection. The methods select among mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs), and can be applied to data sets in which some of the random variables are not always observed. The ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
We describe computationally efficient methods for Bayesian model selection. The methods select among mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs), and can be applied to data sets in which some of the random variables are not always observed. The modelselection criterion that we consider is the posterior probability of the model (structure) given data. Our modelselection problem is difficult because (1) the number of possible model structures grows superexponentially with the number of random variables and (2) missing data necessitates the use of computationally slow approximations of model posterior probability. We argue that simple searchandscore algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as the combination of (1) a modified Cheeseman–Stutz asymptotic approximation for model posterior probability and (2) the Expectation–Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples.
On compatible priors for Bayesian networks
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1992
"... AbstractGiven a Bayesian network of discrete random variables with a hyperDirichlet prior, a method is proposed for assigning Dirichlet priors to the conditional probabilities of structurally different networks. It defines a distance measure between priors which is to be minimized for the assignme ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
AbstractGiven a Bayesian network of discrete random variables with a hyperDirichlet prior, a method is proposed for assigning Dirichlet priors to the conditional probabilities of structurally different networks. It defines a distance measure between priors which is to be minimized for the assignment process. Intuitively one would expect that if two models ’ priors are to qualify as being ‘close ’ in some sense, then their posteriors should also be nearby after an observation. However one does not know in advance what will be observed next. Thus we are led to propose an expectation of KullbackLeibler distances over all possible next observations to define a measure of distance between priors. In conjunction with the additional assumptions of globaland local independence of the parameters [15], a number of theorems emerge which are usually taken as reasonable assumptions in the Bayesian network literature. The method is compared to the ’expansion and contraction ’ algorithm of [14], and is also contrasted with the results obtained in [7] who employ the additional assumption of likelihood equivalence which is not made here. A simple example illustrates the technique. Index TermsBayesian networks, Dirichlet priors, KullbackLeibler distance, local independence, global independence.
unknown title
"... Abstract. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, ..."
Abstract
 Add to MetaCart
Abstract. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate