Results 1  10
of
38
Being Bayesian about network structure
 Machine Learning
, 2000
"... Abstract. In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model sel ..."
Abstract

Cited by 201 (5 self)
 Add to MetaCart
Abstract. In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model, and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have nonnegligible posterior. Thus, we want compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has much a smoother posterior “landscape”. We present empirical results on synthetic and reallife datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a nonBayesian bootstrap approach.
A characterization of Markov equivalence classes for acyclic digraphs
, 1995
"... Undirected graphs and acyclic digraphs (ADGs), as well as their mutual extension to chain graphs, are widely used to describe dependencies among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow e ..."
Abstract

Cited by 90 (7 self)
 Add to MetaCart
Undirected graphs and acyclic digraphs (ADGs), as well as their mutual extension to chain graphs, are widely used to describe dependencies among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. Whereas the undirected graph associated with a dependence model is uniquely determined, there may, however, be many ADGs that determine the same dependence ( = Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markovequivalence classes, each class being associated with a unique statistical model. Statistical procedures, such as model selection or model averaging, that fail to take into account these equivalence classes, may incur substantial computational or other inefficiencies. Here it is shown that each Markovequivalence class is uniquely determined by a single chain graph, the essential graph, that is itself simultaneously Markov equivalent to all ADGs in the equivalence class. Essential graphs are characterized, a polynomialtime algorithm for their construction is given, and their applications to model selection and other statistical
Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Loglinear Models
 Biometrika
, 1996
"... this paper, we will only consider undirected graphical models. For details of Bayesian model selection for directed graphical models see Madigan et al (1995). An (undirected) graphical model is determined by a set of conditional independence constraints of the form `fl 1 is independent of fl 2 condi ..."
Abstract

Cited by 54 (8 self)
 Add to MetaCart
this paper, we will only consider undirected graphical models. For details of Bayesian model selection for directed graphical models see Madigan et al (1995). An (undirected) graphical model is determined by a set of conditional independence constraints of the form `fl 1 is independent of fl 2 conditional on all other fl i 2 C'. Graphical models are so called because they can each be represented as a graph with vertex set C and an edge between each pair fl 1 and fl 2 unless fl 1 and fl 2 are conditionally independent as described above. Darroch, Lauritzen and Speed (1980) show that each graphical loglinear model is hierarchical, with generators given by the cliques (complete subgraphs) of the graph. The total number of possible graphical models is clearly given by 2 (
An Alternative Markov Property for Chain Graphs
 Scand. J. Statist
, 1996
"... Graphical Markov models use graphs, either undirected, directed, or mixed, to represent possible dependences among statistical variables. Applications of undirected graphs (UDGs) include models for spatial dependence and image analysis, while acyclic directed graphs (ADGs), which are especially conv ..."
Abstract

Cited by 46 (4 self)
 Add to MetaCart
Graphical Markov models use graphs, either undirected, directed, or mixed, to represent possible dependences among statistical variables. Applications of undirected graphs (UDGs) include models for spatial dependence and image analysis, while acyclic directed graphs (ADGs), which are especially convenient for statistical analysis, arise in such fields as genetics and psychometrics and as models for expert systems and Bayesian belief networks. Lauritzen, Wermuth, and Frydenberg (LWF) introduced a Markov property for chain graphs, which are mixed graphs that can be used to represent simultaneously both causal and associative dependencies and which include both UDGs and ADGs as special cases. In this paper an alternative Markov property (AMP) for chain graphs is introduced, which in some ways is a more direct extension of the ADG Markov property than is the LWF property for chain graph. 1 INTRODUCTION Graphical Markov models use graphs, either undirected, directed, or mixed, to represent...
Bayesian model averaging
 STAT.SCI
, 1999
"... Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overcon dent inferences and decisions tha ..."
Abstract

Cited by 42 (0 self)
 Add to MetaCart
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overcon dent inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA haverecently emerged. We discuss these methods and present anumber of examples. In these examples, BMA provides improved outofsample predictive performance. We also provide a catalogue of
Active Learning of Causal Bayes Net Structure
, 2001
"... We propose a decision theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov equivalent models, even given infinite data. We perform online MCMC t ..."
Abstract

Cited by 39 (2 self)
 Add to MetaCart
We propose a decision theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov equivalent models, even given infinite data. We perform online MCMC to estimate the posterior over graph structures, and use importance sampling to find the best action to perform at each step. We assume the data is discretevalued and fully observed.
Improved learning of Bayesian networks
 Proc. of the Conf. on Uncertainty in Artificial Intelligence
, 2001
"... Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set o ..."
Abstract

Cited by 37 (6 self)
 Add to MetaCart
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the socalled “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when it is satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data is sampled from a probability distribution which is faithful to an acyclic digraph, and the length of the sample is unbounded. The previous discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and realworld data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data
 PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 1999
"... We present a hybrid constraintbased/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraintbased techniques. Each essential graph is ..."
Abstract

Cited by 29 (3 self)
 Add to MetaCart
We present a hybrid constraintbased/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraintbased techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants
Bayesian Model Averaging in proportional hazard models: Assessing the risk of a stroke
 Applied Statistics
, 1997
"... Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for stroke. We introduce a technique based on the leaps and bounds algorithm which e ciently locates and ts the best models in the very large model space and thereby extends all subsets regression to Cox models. For each independent variable considered, the method provides the posterior probability that it belongs in the model. This is more directly interpretable than the corresponding Pvalues, and also more valid in that it takes account of model uncertainty. Pvalues from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable. In our data Bayesian model averaging predictively outperforms standard model selection methods for assessing
Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
 MICROSOFT RESEARCH, ADVANCED TECHNOLOGY DIVISION
, 1999
"... We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions, is the normalWishart distribution. Our analysis is based on the following new characterization of the Wishart distri ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions, is the normalWishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let W be an n × n, n ≥ 3, positivedefinite symmetric matrix of random variables and f(W) be a pdf of W. Then, f(W) is a Wishart distribution if and only if W11 − W12W −1 is independent 22 W ′ 12 of {W12, W22} for every block partitioning