Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
Abstract

Cited by 266 (46 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values, leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism, which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, it has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable ...
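The idea of averaging over a much smaller set of models can be sketched in Python. This is a minimal illustration under assumptions: models are identified by arbitrary labels, their log marginal likelihoods are already computed, and the window ratio `c = 20` is an assumed default. It applies only a posterior-odds cutoff (drop models far less probable than the best one); the paper's full Occam's window procedure and its search algorithm are not reproduced. The names `occams_window` and `model_average` are hypothetical.

```python
import math

def occams_window(log_marglik, log_prior=None, c=20.0):
    """Return renormalised posterior weights for the models inside the window.

    log_marglik: dict model -> log p(D | M)
    log_prior:   dict model -> log p(M); uniform if None
    c:           window ratio (assumed); models with p(M|D) < max / c are dropped
    """
    if log_prior is None:
        log_prior = {m: 0.0 for m in log_marglik}
    log_post = {m: log_marglik[m] + log_prior[m] for m in log_marglik}
    best = max(log_post.values())
    # Keep M if p(M|D) >= p(M_best|D) / c, compared on the log scale.
    kept = {m: lp for m, lp in log_post.items() if lp >= best - math.log(c)}
    # Normalise over the kept models with the log-sum-exp trick.
    z = best + math.log(sum(math.exp(lp - best) for lp in kept.values()))
    return {m: math.exp(lp - z) for m, lp in kept.items()}

def model_average(weights, predictions):
    """Average a per-model prediction of a quantity using the window weights."""
    return sum(w * predictions[m] for m, w in weights.items())
```

For example, with log marginal likelihoods -100, -101 and -110 for models A, B and C, model C falls outside the window, and A and B receive weights of about 0.73 and 0.27.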
A characterization of Markov equivalence classes for acyclic digraphs
, 1995
Abstract

Cited by 92 (7 self)
Undirected graphs and acyclic digraphs (ADGs), as well as their mutual extension to chain graphs, are widely used to describe dependencies among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. Whereas the undirected graph associated with a dependence model is uniquely determined, there may, however, be many ADGs that determine the same dependence (= Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markov-equivalence classes, each class being associated with a unique statistical model. Statistical procedures such as model selection or model averaging that fail to take these equivalence classes into account may incur substantial computational or other inefficiencies. Here it is shown that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself simultaneously Markov equivalent to all ADGs in the equivalence class. Essential graphs are characterized, a polynomial-time algorithm for their construction is given, and their applications to model selection and other statistical ...
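A standard test of whether two ADGs fall in the same Markov-equivalence class is the skeleton-plus-immoralities criterion of Verma and Pearl (and Frydenberg): the two graphs must have the same undirected skeleton and the same immoralities (a -> c <- b with a and b non-adjacent). The sketch below checks that criterion only; it does not construct the essential graph. Representing an ADG as a set of `(parent, child)` pairs with sortable node labels is an assumption.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (u, v), meaning u -> v."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    skel = skeleton(edges)
    found = set()
    for c, ps in parents.items():
        # Sorting makes (a, b) canonical so triples compare across graphs.
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(g1, g2):
    """True iff the two ADGs share a skeleton and the same immoralities."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)
```

For instance, the chain a -> b -> c, its reversal, and the fork a <- b -> c are all Markov equivalent, while the collider a -> b <- c is not equivalent to any of them.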
Decomposable Graphical Gaussian Model Determination
, 1999
Abstract

Cited by 64 (12 self)
We propose a methodology for Bayesian model determination in decomposable graphical Gaussian models. To achieve this aim we consider a hyper inverse Wishart prior distribution on the concentration matrix for each given graph. To ensure compatibility across models, such prior distributions are obtained by marginalisation from the prior conditional on the complete graph. We explore alternative structures for the hyperparameters of the latter, and their consequences for the model. Model determination is carried out by implementing a reversible jump MCMC sampler. In particular, the dimension-changing move we propose involves adding or dropping an edge from the graph. We characterise the set of moves which preserve the decomposability of the graph, giving a fast algorithm for maintaining the junction tree representation of the graph at each sweep. As state variable, we propose to use the incomplete variance-covariance matrix, containing only the elements for which the correspondi...
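An undirected graph is decomposable exactly when it is chordal (triangulated), so a proposed add-or-drop edge move can be screened by a chordality test. The sketch below is a naive stand-in for the paper's junction-tree bookkeeping: it rechecks chordality from scratch after each toggle, using maximum cardinality search (Tarjan and Yannakakis) and a perfect-elimination check. The adjacency-dict representation and the names `toggle_edge` and `decomposable_move` are hypothetical, and no Metropolis-Hastings acceptance ratio is computed.

```python
def is_chordal(adj):
    """adj: dict node -> set of neighbours (undirected, symmetric)."""
    nodes = list(adj)
    weight = {v: 0 for v in nodes}
    order, visited = [], set()
    # Maximum cardinality search: repeatedly visit the unvisited vertex
    # with the most already-visited neighbours.
    for _ in nodes:
        v = max((u for u in nodes if u not in visited), key=lambda u: weight[u])
        order.append(v)
        visited.add(v)
        for u in adj[v]:
            if u not in visited:
                weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    # The reverse visit order must be a perfect elimination ordering:
    # for each v, earlier-visited neighbours other than the latest one, p,
    # must all be adjacent to p.
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if earlier:
            p = max(earlier, key=lambda u: pos[u])
            if any(u != p and u not in adj[p] for u in earlier):
                return False
    return True

def toggle_edge(adj, u, v):
    """Return a copy of adj with edge {u, v} added or removed."""
    new = {w: set(nb) for w, nb in adj.items()}
    if v in new[u]:
        new[u].discard(v)
        new[v].discard(u)
    else:
        new[u].add(v)
        new[v].add(u)
    return new

def decomposable_move(adj, u, v):
    """Proposed graph if the toggle keeps it decomposable, else None."""
    new = toggle_edge(adj, u, v)
    return new if is_chordal(new) else None
```

Adding the edge a-d to the path a-b-c-d creates a chordless 4-cycle, so that move is rejected, while adding a chord a-c to the 4-cycle a-b-c-d-a yields a decomposable graph.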
Optimization by learning and simulation of Bayesian and Gaussian networks
, 1999
Abstract

Cited by 43 (6 self)
Estimation of Distribution Algorithms (EDAs) are an example of stochastic heuristics based on populations of individuals, each of which encodes a possible solution to the optimization problem. These populations evolve in successive generations as the search progresses, organized in the same way as most evolutionary computation heuristics. In contrast to most evolutionary computation paradigms, which consider the crossover and mutation operators essential tools for generating new populations, EDAs replace those operators with the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after reviewing the different EDA-based approaches to problems of combinatorial optimization as well as to problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
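The estimate-then-simulate loop that replaces crossover and mutation can be illustrated with the simplest EDA, the Univariate Marginal Distribution Algorithm (UMDA). This is a swapped-in baseline, not one of the graphical-model EDAs proposed in the work: UMDA models the selected individuals as independent Bernoulli bits, whereas a Bayesian-network EDA would learn dependencies in the estimation step. The population sizes, the truncation selection, and the probability `margin` are assumed settings.

```python
import random

def umda(fitness, n_bits, pop_size=100, n_select=50, generations=50,
         margin=0.02, seed=0):
    """UMDA for bit strings; returns the best individual found."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                      # initial marginal probabilities
    best = None
    for _ in range(generations):
        # Simulation: sample a new population from the current model.
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if best is None or fitness(pop[0]) > fitness(best):
            best = pop[0]
        # Estimation: refit the marginals on the truncation-selected individuals.
        selected = pop[:n_select]
        p = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
        # Clamp away from 0 and 1 so no bit value is lost permanently
        # (an assumed heuristic, not taken from the cited work).
        p = [min(max(pi, margin), 1.0 - margin) for pi in p]
    return best
```

Calling `umda(sum, 20)` searches for the all-ones string (the OneMax problem), since `sum` counts the ones in an individual.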
Efficient stepwise selection in decomposable models
 In Proc. UAI
, 2001
Abstract

Cited by 30 (2 self)
In this paper, we present an efficient algorithm for performing stepwise selection in the class of decomposable models. We focus on the forward selection procedure, but we also discuss how backward selection and the combination of the two can be performed efficiently. The main contributions of this paper are (1) a simple characterization for the edges that can be added to a decomposable model while retaining its decomposability and (2) an efficient algorithm for enumerating all such edges for a given decomposable model in O(n^2) time, where n is the number of variables in the model. We also analyze the complexity of the overall stepwise selection procedure (which includes the complexity of enumerating eligible edges as well as the complexity of deciding how to “progress”). We use the KL divergence of the model from the saturated model as our metric, but the results we present here extend to many other metrics as well.
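To make "edges that can be added while retaining decomposability" concrete, the sketch below enumerates them by brute force. It does not implement the paper's characterization or its O(n^2) algorithm: it simply adds each missing edge in turn and tests whether the result is still chordal (decomposable graphs are exactly the chordal undirected graphs), which costs far more than O(n^2) overall. The chordality test repeatedly removes a simplicial vertex; the adjacency-dict representation, sortable node labels, and the name `eligible_edges` are assumptions.

```python
from itertools import combinations

def is_chordal(adj):
    """Chordal iff vertices can be removed one at a time, each simplicial
    (its remaining neighbours pairwise adjacent) at the time of removal."""
    g = {v: set(nb) for v, nb in adj.items()}
    while g:
        for v, nb in g.items():
            if all(b in g[a] for a, b in combinations(nb, 2)):
                break
        else:
            return False                    # no simplicial vertex remains
        for u in g[v]:
            g[u].discard(v)
        del g[v]
    return True

def eligible_edges(adj):
    """All non-edges (u, v) whose addition keeps the graph decomposable."""
    result = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue                        # already an edge
        trial = {w: set(nb) for w, nb in adj.items()}
        trial[u].add(v)
        trial[v].add(u)
        if is_chordal(trial):
            result.append((u, v))
    return result
```

For the path a-b-c-d, the eligible additions are a-c and b-d; adding a-d would close a chordless 4-cycle.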
A 'Microscopic' Study of Minimum Entropy Search in Learning Decomposable Markov Networks
 Machine Learning
, 1995
Abstract

Cited by 23 (18 self)
Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Though entropy and related scoring metrics have been widely used, their 'microscopic' properties and asymptotic behavior in a search have not been analyzed. We present such a 'microscopic' study of a minimum entropy search algorithm and show that it learns an I-map of the domain model when the data size is large. Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain, together with a multilink search strategy, would provide an effective way to uncover many domain models.
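For the maximum-likelihood fit of a decomposable model, the cross entropy of the empirical distribution against the model reduces to clique and separator terms, sum_C H(X_C) - sum_S H(X_S), and subtracting the joint empirical entropy H(X_V) gives the KL divergence from the saturated model. The sketch below computes these scores from discrete data; it is not the paper's search algorithm. It assumes the data are tuples of discrete values and that cliques and separators (as tuples of column indices, with `()` for an empty separator) are supplied by hand.

```python
import math
from collections import Counter

def entropy(data, cols):
    """Empirical entropy (in nats) of the marginal on the given columns."""
    counts = Counter(tuple(row[i] for i in cols) for row in data)
    n = len(data)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def model_entropy(data, cliques, separators):
    """Cross entropy of the data under the fitted decomposable model."""
    return (sum(entropy(data, c) for c in cliques)
            - sum(entropy(data, s) for s in separators))

def kl_from_saturated(data, cliques, separators):
    """KL divergence of the fitted model from the empirical distribution."""
    n_vars = len(data[0])
    return model_entropy(data, cliques, separators) - entropy(data, range(n_vars))
```

In four rows where X0 always equals X1 and X2 varies independently, the model with cliques {X0, X1} and {X2} has zero divergence, while the full independence model has divergence log 2; a minimum entropy search would prefer the former.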
Feature-inclusion stochastic search for Gaussian graphical models
 J. Comp. Graph. Statist
, 2008
Abstract

Cited by 20 (3 self)
We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to guide Bayesian model determination in Gaussian graphical models. FINCS is compared to MCMC, to Metropolis-based search methods, and to the popular lasso; it is found to be superior along a variety of dimensions, leading to better sets of discovered models, greater speed and stability, and reasonable estimates of edge-inclusion probabilities. We illustrate FINCS on an example involving mutual-fund data, where we compare the model-averaged predictive performance of models discovered with FINCS to those discovered by competing methods. Some key words: Covariance selection; Metropolis algorithm; lasso; Bayesian model selection; hyper-inverse Wishart distribution
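FINCS's own search moves are not reproduced here. As a minimal sketch of the quantity it tracks, an edge-inclusion probability can be estimated from a set of discovered graphs by summing the renormalised posterior weights of the graphs that contain the edge. The input format (a list of distinct edge sets paired with unnormalised log posterior scores), the function name `edge_inclusion_probs`, and treating the discovered set as the whole model space are all assumptions.

```python
import math

def edge_inclusion_probs(models):
    """models: list of (edge_set, log_posterior_score) for distinct graphs.

    Returns {edge: estimated inclusion probability}, with the scores
    renormalised over the discovered models only.
    """
    best = max(score for _, score in models)
    # Subtract the best score before exponentiating to avoid overflow.
    weights = [math.exp(score - best) for _, score in models]
    total = sum(weights)
    probs = {}
    for (edges, _), w in zip(models, weights):
        for e in edges:
            probs[e] = probs.get(e, 0.0) + w / total
    return probs
```

With three discovered graphs, {a-b} and {a-b, b-c} at equal score and the empty graph twice as probable, the estimates are 0.5 for a-b and 0.25 for b-c.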