Results 1 - 10
of
41
Improved learning of Bayesian networks
- Proc. of the Conf. on Uncertainty in Artificial Intelligence
, 2001
"... Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set o ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when it is satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data is sampled from a probability distribution which is faithful to an acyclic digraph, and the length of the sample is unbounded. The previous discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
Tractable Bayesian Learning of Tree Belief Networks
, 2000
"... In this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial tim ..."
Abstract
-
Cited by 33 (1 self)
- Add to MetaCart
In this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: First, we show that factored distributions over spanning trees in a graph can be integrated in closed form. Second, we examine priors over tree parameters and show that a set of assumptions similar to (Heckerman and al., 1995) constrain the tree parameter priors to be a compactly parametrized product of Dirichlet distributions. Besides allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures. 1 Introduction In the framework of graphical models, tree distributions stand out by their spec...
Efficient stepwise selection in decomposable models
- In Proc. UAI
, 2001
"... In this paper, we present an efficient algorithm for performing stepwise selection in the class of decomposable models. We focus on the forward selection procedure, but we also discuss how backward selection and the combination of the two can be performed efficiently. The main contributions of this ..."
Abstract
-
Cited by 28 (2 self)
- Add to MetaCart
In this paper, we present an efficient algorithm for performing stepwise selection in the class of decomposable models. We focus on the forward selection procedure, but we also discuss how backward selection and the combination of the two can be performed efficiently. The main contributions of this paper are (1) a simple characterization for the edges that can be added to a decomposable model while retaining its decomposability and (2) an efficient algorithm for enumerating all such edges for a given decomposable model in O(n2) time, where n is the number of variables in the model. We also analyze the complexity of the overall stepwise selection procedure (which includes the complexity of enumerating eligible edges as well as the complexity of deciding how to “progress”). We use the KL divergence of the model from the saturated model as our metric, but the results we present here extend to many other metrics as well. 1
Modeling changing dependency structure in multivariate time series
- In International Conference in Machine Learning
, 2007
"... We show how to apply the efficient Bayesian changepoint detection techniques of Fearnhead in the multivariate setting. We model the joint density of vector-valued observations using undirected Gaussian graphical models, whose structure we estimate. We show how we can exactly compute the MAP segmenta ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
We show how to apply the efficient Bayesian changepoint detection techniques of Fearnhead in the multivariate setting. We model the joint density of vector-valued observations using undirected Gaussian graphical models, whose structure we estimate. We show how we can exactly compute the MAP segmentation, as well as how to draw perfect samples from the posterior over segmentations, simultaneously accounting for uncertainty about the number and location of changepoints, as well as uncertainty about the covariance structure. We illustrate the technique by applying it to financial data and to bee tracking data. 1.
Feature-inclusion stochastic search for gaussian graphical models
, 2007
"... We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to inform the process of Bayesian model determination in Gaussian graphical models. FINCS is compared to Metropolis-based search methods and found to be sup ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to inform the process of Bayesian model determination in Gaussian graphical models. FINCS is compared to Metropolis-based search methods and found to be superior along a variety of dimensions, leading to more accurate and less volatile estimates of edge-inclusion probabilities and greater speed in finding good models. Though FINCS is conceived as a method for characterizing model uncertainty in moderate-dimensional problems, we also find that it performs well as a stochastic hill-climber in bigger problems. We illustrate its use on an example involving mutual-fund data, where we compare the model-averaged predictive performance of models discovered with FINCS to those discovered with the Metropolis algorithm.
Efficient Model Determination for Discrete Graphical Models
, 2000
"... We present a novel methodology for bayesian model determination in discrete decomposable graphical models. We assign, for each given graph, a Hyper Dirichlet distribution on the matrix of cell probabilities. To ensure compatibility across models such prior distributions are obtained by marginalis ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
We present a novel methodology for bayesian model determination in discrete decomposable graphical models. We assign, for each given graph, a Hyper Dirichlet distribution on the matrix of cell probabilities. To ensure compatibility across models such prior distributions are obtained by marginalisation from the prior conditional on the complete graph. This leads to a prior distribution automatically satisfying the hyperconsistency criterion. Our contribution is twofold. On one hand we improve an existing methodology, the MC 3 algorithm by Madigan and York (1995). On the other hand we introduce an original methodology based on the use of the Reversible jump sampler by Green (1995) and Giudici and Green (1999). Legal movement, that is leading to a decomposable graph, are identied making use of the junction tree representation of the considered graph. Keywords: Bayesian model selection; Contingency table; Dirichlet distribution; Hyper Markov distribution; Junction tree; Marko...
Convergence Assessment for Reversible Jump MCMC Simulations
, 1998
"... In this paper we introduce the problem of assessing convergence of reversible jump MCMC algorithms on the basis of simulation output. We discuss the various direct approaches which could be employed, together with their associated drawbacks. Using the example of fitting a graphical Gaussian model vi ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
In this paper we introduce the problem of assessing convergence of reversible jump MCMC algorithms on the basis of simulation output. We discuss the various direct approaches which could be employed, together with their associated drawbacks. Using the example of fitting a graphical Gaussian model via RJMCMC, we show how the simulation output for models which can be parameterised so that parameters of primary interest retain a coherent interpretation throughout the simulation, can be used to assess convergence. In the context of this example, we extend the work of Gelman and Rubin (1992) and Brooks and Gelman (1998), to provide convergence assessment procedures for graphical model determination problems, but which may be applied to any form of model choice problem and, indeed, MCMC simulations more generally.
Simulation of hyper-inverse Wishart distributions in graphical models
, 2007
"... We introduce and exemplify an efficient method for direct sampling from hyperinverse Wishart distributions. The method relies very naturally on the use of standard junction-tree representation of graphs, and couples these with matrix results for inverse Wishart distributions. We describe the theory ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
We introduce and exemplify an efficient method for direct sampling from hyperinverse Wishart distributions. The method relies very naturally on the use of standard junction-tree representation of graphs, and couples these with matrix results for inverse Wishart distributions. We describe the theory and resulting computational algorithms for both decomposable and nondecomposable graphical models. An example drawn from financial time series demonstrates application in a context where inferences on a structured covariance model are required. We discuss and investigate questions of scalability of the simulation methods to higher-dimensional distributions. The paper concludes with general comments about the approach, including its use in connection with existing Markov chain Monte Carlo methods that deal with uncertainty about the graphical model structure.

