Results 1 - 10
of
22
Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
"... We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection o ..."
Abstract
-
Cited by 214 (42 self)
- Add to MetaCart
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 1011). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty byaveraging overamuch smaller set of models. An efficient search algorithm is developed for nding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
A characterization of Markov equivalence classes for acyclic digraphs
, 1995
"... Undirected graphs and acyclic digraphs (ADGs), as well as their mutual extension to chain graphs, are widely used to describe dependencies among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow e ..."
Abstract
-
Cited by 71 (7 self)
- Add to MetaCart
Undirected graphs and acyclic digraphs (ADGs), as well as their mutual extension to chain graphs, are widely used to describe dependencies among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. Whereas the undirected graph associated with a dependence model is uniquely determined, there may, however, be many ADGs that determine the same dependence ( = Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markov-equivalence classes, each class being associated with a unique statistical model. Statistical procedures, such as model selection or model averaging, that fail to take into account these equivalence classes, may incur substantial computational or other inefficiencies. Here it is shown that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself simultaneously Markov equivalent to all ADGs in the equivalence class. Essential graphs are characterized, a polynomial-time algorithm for their construction is given, and their applications to model selection and other statistical
Decomposable Graphical Gaussian Model Determination
, 1999
"... We propose a methodology for Bayesian model determination in decomposable graphical gaussian models. To achieve this aim we consider a hyper inverse Wishart prior distribution on the concentration matrix for each given graph. To ensure compatibility across models, such prior distributions are obt ..."
Abstract
-
Cited by 53 (9 self)
- Add to MetaCart
We propose a methodology for Bayesian model determination in decomposable graphical gaussian models. To achieve this aim we consider a hyper inverse Wishart prior distribution on the concentration matrix for each given graph. To ensure compatibility across models, such prior distributions are obtained by marginalisation from the prior conditional on the complete graph. We explore alternative structures for the hyperparameters of the latter, and their consequences for the model. Model determination is carried out by implementing a reversible jump MCMC sampler. In particular, the dimension-changing move we propose involves adding or dropping an edge from the graph. We characterise the set of moves which preserve the decomposability of the graph, giving a fast algorithm for maintaining the junction tree representation of the graph at each sweep. As state variable, we propose to use the incomplete variance-covariance matrix, containing only the elements for which the correspondi...
Optimization by learning and simulation of Bayesian and Gaussian networks
, 1999
"... Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses -- organ ..."
Abstract
-
Cited by 34 (6 self)
- Add to MetaCart
Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses -- organized in the same way as most evolutionary computation heuristics. In opposition to most evolutionary computation paradigms which consider the crossing and mutation operators as essential tools to generate new populations, EDA replaces those operators by the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after making a review of the different approaches based on EDA for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
Efficient stepwise selection in decomposable models
- In Proc. UAI
, 2001
"... In this paper, we present an efficient algorithm for performing stepwise selection in the class of decomposable models. We focus on the forward selection procedure, but we also discuss how backward selection and the combination of the two can be performed efficiently. The main contributions of this ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
In this paper, we present an efficient algorithm for performing stepwise selection in the class of decomposable models. We focus on the forward selection procedure, but we also discuss how backward selection and the combination of the two can be performed efficiently. The main contributions of this paper are (1) a simple characterization for the edges that can be added to a decomposable model while retaining its decomposability and (2) an efficient algorithm for enumerating all such edges for a given decomposable model in O(n2) time, where n is the number of variables in the model. We also analyze the complexity of the overall stepwise selection procedure (which includes the complexity of enumerating eligible edges as well as the complexity of deciding how to “progress”). We use the KL divergence of the model from the saturated model as our metric, but the results we present here extend to many other metrics as well. 1
A `Microscopic' Study of Minimum Entropy Search in Learning Decomposable Markov Networks
- MACHINE LEARNING
, 1995
"... Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Though entropy and related scoring metrics were widely used, its `microscopic' properties and asymptotic behavior i ..."
Abstract
-
Cited by 17 (12 self)
- Add to MetaCart
Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Though entropy and related scoring metrics were widely used, its `microscopic' properties and asymptotic behavior in a search have not been analyzed. We present such a `microscopic' study of a minimum entropy search algorithm, and show that it learns an I-map of the domain model when the data size is large. Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain together with a multi-link search strategy would provide an effective way to uncover many domain models.
Feature-inclusion stochastic search for gaussian graphical models
, 2007
"... We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to inform the process of Bayesian model determination in Gaussian graphical models. FINCS is compared to Metropolis-based search methods and found to be sup ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to inform the process of Bayesian model determination in Gaussian graphical models. FINCS is compared to Metropolis-based search methods and found to be superior along a variety of dimensions, leading to more accurate and less volatile estimates of edge-inclusion probabilities and greater speed in finding good models. Though FINCS is conceived as a method for characterizing model uncertainty in moderate-dimensional problems, we also find that it performs well as a stochastic hill-climber in bigger problems. We illustrate its use on an example involving mutual-fund data, where we compare the model-averaged predictive performance of models discovered with FINCS to those discovered with the Metropolis algorithm.

