Results 1–10 of 42
Model selection and accounting for model uncertainty in graphical models using Occam's window
1993
"... We consider the problem of model selection and accounting for model uncertainty in highdimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic Pvalues leading to the selection o ..."
Abstract

Cited by 266 (46 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most commonly used is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism, which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, it has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable log-linear models.
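As a rough illustration of the model-averaging idea in this abstract (a toy sketch, not the paper's algorithm; the model probabilities, per-model estimates, and the factor-of-20 window are invented for the example), the snippet below discards models whose posterior probability falls too far below the best model's and averages a quantity of interest over the survivors.

```python
# Toy Bayesian model averaging with an Occam's-window style cutoff.
# All numbers below are made up for illustration only.

C = 20.0  # window width: drop models 20x less probable than the best one

models = {
    # name: (unnormalised posterior model probability, E[quantity | model, data])
    "M1": (0.60, 1.8),
    "M2": (0.25, 2.3),
    "M3": (0.10, 2.1),
    "M4": (0.001, 9.9),   # implausible model: excluded by the window
}

best = max(p for p, _ in models.values())
window = {m: (p, est) for m, (p, est) in models.items() if p >= best / C}

total = sum(p for p, _ in window.values())
weights = {m: p / total for m, (p, _) in window.items()}

# Posterior mean of the quantity of interest, averaged over the retained models.
averaged = sum(weights[m] * est for m, (_, est) in window.items())

print("retained models:", sorted(window))
print("posterior weights:", {m: round(w, 3) for m, w in weights.items()})
print("model-averaged estimate:", round(averaged, 3))
```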
Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Log-linear Models
Biometrika, 1996
"... this paper, we will only consider undirected graphical models. For details of Bayesian model selection for directed graphical models see Madigan et al (1995). An (undirected) graphical model is determined by a set of conditional independence constraints of the form `fl 1 is independent of fl 2 condi ..."
Abstract

Cited by 55 (8 self)
In this paper, we will only consider undirected graphical models. For details of Bayesian model selection for directed graphical models see Madigan et al. (1995). An (undirected) graphical model is determined by a set of conditional independence constraints of the form `γ1 is independent of γ2 conditional on all other γi ∈ C'. Graphical models are so called because they can each be represented as a graph with vertex set C and an edge between each pair γ1 and γ2 unless γ1 and γ2 are conditionally independent as described above. Darroch, Lauritzen and Speed (1980) show that each graphical log-linear model is hierarchical, with generators given by the cliques (complete subgraphs) of the graph. The total number of possible graphical models is clearly given by 2^(|C|(|C|-1)/2), one for each subset of the possible edges.
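As a quick sanity check of the count reconstructed above: each undirected graphical model on vertex set C corresponds to choosing a subset of the |C|(|C|-1)/2 possible edges, so the number of models is 2 raised to that power. A minimal sketch:

```python
from itertools import combinations

def n_graphical_models(p):
    """Number of undirected graphs on p labelled vertices: one per subset of edges."""
    n_possible_edges = len(list(combinations(range(p), 2)))  # p*(p-1)/2
    return 2 ** n_possible_edges

for p in range(1, 7):
    assert n_graphical_models(p) == 2 ** (p * (p - 1) // 2)
    print(p, "vertices:", n_graphical_models(p), "graphical models")
```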
Optimization by learning and simulation of Bayesian and Gaussian networks
1999
"... Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses  organ ..."
Abstract

Cited by 43 (6 self)
Estimation of Distribution Algorithms (EDAs) are stochastic heuristics based on populations of individuals, each of which encodes a possible solution to the optimization problem. These populations evolve over successive generations as the search progresses, organized in the same way as most evolutionary computation heuristics. In contrast to most evolutionary computation paradigms, which rely on crossover and mutation operators to generate new populations, EDAs replace those operators with the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after reviewing the different EDA-based approaches to combinatorial optimization problems as well as optimization problems in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
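The core EDA loop described here (select promising individuals, estimate a probability model from them, sample the next population from that model) can be sketched with the simplest instance, a UMDA-style univariate model on the OneMax problem; the Bernoulli vector below is a stand-in for the richer Bayesian and Gaussian networks this work actually learns, and all parameter values are illustrative.

```python
import random

def umda_onemax(n_bits=30, pop_size=100, n_select=50, generations=40, seed=0):
    """Minimal univariate EDA (UMDA) maximising the number of ones in a bit-string."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits                      # initial univariate model

    for _ in range(generations):
        # simulate the current probability model
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        # truncation selection: keep the best individuals
        pop.sort(key=sum, reverse=True)
        selected = pop[:n_select]
        # re-estimate the model from the selected individuals
        probs = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]

    best = max(pop, key=sum)
    return best, sum(best)

best, fitness = umda_onemax()
print("best fitness:", fitness, "of", len(best))
```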
Bayesian Model Averaging And Model Selection For Markov Equivalence Classes Of Acyclic Digraphs
Communications in Statistics: Theory and Methods, 1996
"... Acyclic digraphs (ADGs) are widely used to describe dependences among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building B ..."
Abstract

Cited by 38 (5 self)
Acyclic digraphs (ADGs) are widely used to describe dependences among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. There may, however, be many ADGs that determine the same dependence (= Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markov-equivalence classes, each class being associated with a unique statistical model. Statistical procedures, such as model selection or model averaging, that fail to take into account these equivalence classes, may incur substantial computational or other inefficiencies. Recent results have shown that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself Markov-equivalent simultaneously to all ADGs in the equivalence clas...
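As a concrete illustration of Markov equivalence (a small check based on the classical Verma-Pearl criterion, not the essential-graph machinery of the paper): two ADGs are Markov equivalent exactly when they share the same skeleton and the same v-structures. The sketch below assumes DAGs are given as parent dictionaries.

```python
from itertools import combinations

def skeleton(dag):
    """Undirected edge set of a DAG given as {child: set_of_parents}."""
    return {frozenset((u, v)) for v, ps in dag.items() for u in ps}

def v_structures(dag):
    """Colliders a -> c <- b in which a and b are non-adjacent."""
    skel = skeleton(dag)
    return {(frozenset((a, b)), c)
            for c, ps in dag.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}

def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# x -> y -> z and x <- y <- z encode the same independence (x _|_ z | y) ...
chain     = {"x": set(), "y": {"x"}, "z": {"y"}}
rev_chain = {"z": set(), "y": {"z"}, "x": {"y"}}
# ... while the collider x -> y <- z does not.
collider  = {"x": set(), "z": set(), "y": {"x", "z"}}

print(markov_equivalent(chain, rev_chain))  # True
print(markov_equivalent(chain, collider))   # False
```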
Improved learning of Bayesian networks
Proc. of the Conf. on Uncertainty in Artificial Intelligence, 2001
"... Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set o ..."
Abstract

Cited by 37 (6 self)
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, each of which represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data are sampled from a probability distribution faithful to an acyclic digraph and that the sample size is unbounded. This discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
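To make the inclusion order concrete (a toy enumeration, not the paper's traversal operators or learning algorithms), the sketch below lists the conditional independencies each small DAG encodes, using the standard ancestral-moral-graph test for d-separation, and compares the resulting sets by inclusion.

```python
from itertools import combinations

def ancestral_set(dag, nodes):
    """Nodes plus all their ancestors; dag is {child: set_of_parents}."""
    out, stack = set(nodes), list(nodes)
    while stack:
        for p in dag.get(stack.pop(), set()):
            if p not in out:
                out.add(p); stack.append(p)
    return out

def d_separated(dag, x, y, z):
    """Test x _|_ y | z by separation in the moralised ancestral graph."""
    keep = ancestral_set(dag, {x, y} | set(z))
    edges = set()
    for child in keep:
        parents = dag.get(child, set()) & keep
        edges |= {frozenset((p, child)) for p in parents}                 # kept arcs
        edges |= {frozenset(pair) for pair in combinations(parents, 2)}   # marry parents
    # search for a path from x to y that avoids the conditioning set z
    frontier, seen = [x], {x} | set(z)
    while frontier:
        v = frontier.pop()
        for e in edges:
            if v in e:
                w = next(iter(e - {v}))
                if w == y:
                    return False
                if w not in seen:
                    seen.add(w); frontier.append(w)
    return True

def independence_model(dag):
    """All statements (x, y, z) with x _|_ y | z that the DAG encodes."""
    nodes = sorted(dag)
    stmts = set()
    for x, y in combinations(nodes, 2):
        rest = [v for v in nodes if v not in (x, y)]
        for k in range(len(rest) + 1):
            for z in combinations(rest, k):
                if d_separated(dag, x, y, z):
                    stmts.add((x, y, z))
    return stmts

full  = {"a": set(), "b": {"a"}, "c": {"a", "b"}}   # complete DAG: no independencies
chain = {"a": set(), "b": {"a"}, "c": {"b"}}        # a -> b -> c: a _|_ c | b

i_full, i_chain = independence_model(full), independence_model(chain)
print(i_full <= i_chain, i_chain)   # the complete DAG's (empty) set is included in the chain's
```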
A `Microscopic' Study of Minimum Entropy Search in Learning Decomposable Markov Networks
Machine Learning, 1995
"... Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Though entropy and related scoring metrics were widely used, its `microscopic' properties and asymptotic behavior i ..."
Abstract

Cited by 23 (18 self)
Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Though entropy and related scoring metrics have been widely used, their `microscopic' properties and asymptotic behavior in a search have not been analyzed. We present such a `microscopic' study of a minimum entropy search algorithm, and show that it learns an I-map of the domain model when the data size is large. Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain, together with a multi-link search strategy, would provide an effective way to uncover many domain models.
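To give a feel for entropy-based link scoring (a simplified stand-in, closer to a Chow-Liu tree builder than to the paper's decomposable-network learner; the toy data are invented), the sketch below scores candidate links by empirical mutual information, the Kullback-Leibler divergence between the joint and the product of marginals, and greedily adds the strongest links while keeping the structure a forest.

```python
import math
import random
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def greedy_tree(data, n_vars):
    """Add links in order of decreasing mutual information, avoiding cycles."""
    parent = list(range(n_vars))                      # union-find forest
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]; v = parent[v]
        return v
    scored = sorted(((mutual_information(data, i, j), i, j)
                     for i, j in combinations(range(n_vars), 2)), reverse=True)
    tree = []
    for mi, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:                                  # adding the link keeps a forest
            parent[ri] = rj
            tree.append((i, j, round(mi, 4)))
    return tree

# Toy data: X2 is a noisy copy of X0, X1 is independent noise.
rng = random.Random(1)
data = []
for _ in range(2000):
    x0 = rng.randint(0, 1)
    x1 = rng.randint(0, 1)
    x2 = x0 if rng.random() < 0.9 else 1 - x0
    data.append((x0, x1, x2))

print(greedy_tree(data, 3))   # the (0, 2) link should carry the most information
```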
Sequential importance sampling for multiway tables
Annals of Statistics, 2005
"... We describe an algorithm for the sequential sampling of entries in multiway contingency tables with given constraints. The algorithm can be used for computations in exact conditional inference. To justify the algorithm, a theory relates sampling values at each step to properties of the associated to ..."
Abstract

Cited by 21 (3 self)
We describe an algorithm for the sequential sampling of entries in multiway contingency tables with given constraints. The algorithm can be used for computations in exact conditional inference. To justify the algorithm, a theory relates sampling values at each step to properties of the associated toric ideal using computational commutative algebra. In particular, the property of interval cell counts at each step is related to exponents on lead indeterminates of a lexicographic Gröbner basis. Also, the approximation of integer programming by linear programming for sampling is related to initial terms of a toric ideal. We apply the algorithm to examples of contingency tables which appear in the social and medical sciences. The numerical results demonstrate that the theory is applicable and that the algorithm performs well.
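The flavour of sequential cell-by-cell sampling can be shown on the simplest case, a two-way table with fixed row and column sums (the paper treats general multi-way constraints and develops the algebraic theory behind the cell bounds; the margins below are illustrative): each cell is drawn uniformly within bounds that keep the margins feasible, and the product of interval sizes is an importance weight whose average estimates the number of tables.

```python
import random

def sample_table(row_sums, col_sums, rng):
    """Fill an I x J table column by column, keeping the margins feasible.

    Returns (table, weight) where weight is the number of equally likely
    sequential choices, i.e. 1 / proposal probability of this table.
    """
    rows = list(row_sums)
    table = [[0] * len(col_sums) for _ in rows]
    weight = 1
    for j, cj in enumerate(col_sums):
        remaining = cj
        for i in range(len(rows) - 1):
            later_capacity = sum(rows[i + 1:])
            lo = max(0, remaining - later_capacity)   # leave enough for later rows
            hi = min(rows[i], remaining)              # respect row and column sums
            n_ij = rng.randint(lo, hi)
            weight *= (hi - lo + 1)
            table[i][j] = n_ij
            rows[i] -= n_ij
            remaining -= n_ij
        table[-1][j] = remaining                      # last row is forced
        rows[-1] -= remaining
    return table, weight

def estimate_table_count(row_sums, col_sums, n_samples=20000, seed=0):
    """Importance-sampling estimate of the number of tables with these margins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        _, w = sample_table(row_sums, col_sums, rng)
        total += w
    return total / n_samples

# 2 x 3 example with margins small enough to verify by brute force (8 tables).
print(estimate_table_count([3, 4], [2, 2, 3]))
```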
Bounds for Cell Entries in Contingency Tables Induced by Fixed Marginal Totals
2001
"... We describe new results for upper and lower bounds on the entries in multiway tables of counts based on a set of released and possibly overlapping marginal tables which have practical import for assessing disclosure risk. In particular, we present a generalized version of the shuttle algorithm pr ..."
Abstract

Cited by 20 (8 self)
We describe new results for upper and lower bounds on the entries in multiway tables of counts based on a set of released and possibly overlapping marginal tables; these results have practical import for assessing disclosure risk. In particular, we present a generalized version of the shuttle algorithm proposed by Buzzigoli and Giusti that is proven to compute sharp integer bounds for an arbitrary set of fixed marginals. Keywords: Statistical disclosure control; Log-linear models; Decomposable models; Reducible models; Integer programming. 1. Introduction. The National Institute of Statistical Sciences has recently assembled a team of statistical researchers from multiple universities who, working with statisticians in U.S. statistical agencies, are developing a Web-based query system for statistical databases. Their goal is a system that allows the use of disclosure limitation methods (e.g., see Willenborg and de Waal 1996; 2000) applied sequentially in response to a series of stati...
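For the simplest setting, a two-way table with released row and column totals, the classical Fréchet bounds already convey the idea (the generalized shuttle algorithm in the paper computes sharp bounds for arbitrary, possibly overlapping sets of margins, which is considerably harder): every feasible cell value n_ij lies between max(0, r_i + c_j - N) and min(r_i, c_j). The margins below are illustrative only.

```python
def frechet_bounds(row_totals, col_totals):
    """Lower/upper bounds on each cell of a two-way table given its margins.

    For released row totals r_i and column totals c_j with grand total N,
    any nonnegative integer table consistent with the margins satisfies
        max(0, r_i + c_j - N) <= n_ij <= min(r_i, c_j).
    """
    N = sum(row_totals)
    assert N == sum(col_totals), "margins must share the same grand total"
    return [[(max(0, r + c - N), min(r, c)) for c in col_totals] for r in row_totals]

# A cell whose lower bound is close to its upper bound is nearly disclosed
# by the released totals alone.
for row in frechet_bounds([20, 5], [18, 7]):
    print(row)
```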
Convex structure learning in log-linear models: Beyond pairwise potentials
In Proceedings of International Workshop on Artificial Intelligence and Statistics, 2010
"... Previous work has examined structure learning in loglinear models with `1regularization, largely focusing on the case of pairwise potentials. In this work we consider the case of models with potentials of arbitrary order, but that satisfy a hierarchical constraint. We enforce the hierarchical const ..."
Abstract

Cited by 18 (1 self)
Previous work has examined structure learning in log-linear models with ℓ1-regularization, largely focusing on the case of pairwise potentials. In this work we consider models with potentials of arbitrary order that satisfy a hierarchical constraint. We enforce the hierarchical constraint using group ℓ1-regularization with overlapping groups. An active-set method that enforces hierarchical inclusion allows us to tractably consider the exponential number of higher-order potentials. We use a spectral projected gradient method as a subroutine for solving the overlapping group ℓ1-regularization problem, and make use of a sparse version of Dykstra's algorithm to compute the projection. Our experiments indicate that this model gives equal or better test set likelihood compared to previous models.
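A small sketch of one way to read the penalty construction (an illustration under stated assumptions, not the authors' optimizer; the spectral projected gradient and sparse Dykstra steps are omitted, and the parameter values are invented): give each candidate potential A a group containing its own parameters together with those of every higher-order potential containing A, so that zeroing whole groups removes a potential together with all of its supersets and higher-order terms can enter the model only when their subsets do.

```python
import math
from itertools import combinations

def build_groups(variables, max_order):
    """For each candidate potential A, group(A) = A plus all of its supersets."""
    potentials = [frozenset(c)
                  for k in range(2, max_order + 1)
                  for c in combinations(variables, k)]
    return {A: [B for B in potentials if A <= B] for A in potentials}

def group_l1_penalty(weights, groups, lam=1.0):
    """Sum over groups of the L2 norm of the stacked potential parameters."""
    penalty = 0.0
    for members in groups.values():
        sq = sum(w ** 2 for B in members for w in weights.get(B, []))
        penalty += math.sqrt(sq)
    return lam * penalty

variables = ["a", "b", "c"]
groups = build_groups(variables, max_order=3)

# Hypothetical parameters: one pairwise potential active, the triple inactive.
weights = {frozenset({"a", "b"}): [0.7, -0.2], frozenset({"a", "b", "c"}): [0.0]}

for A, members in groups.items():
    print(sorted(A), "-> group of", len(members), "potentials")
print("penalty:", round(group_l1_penalty(weights, groups), 4))
```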