Results 1–10 of 17
The max-min hill-climbing Bayesian network structure learning algorithm
Machine Learning, 2006
Cited by 75 (7 self)
Abstract. We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation, MMHC outperforms on average, and in terms of various metrics, several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available at
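The two-phase design described above can be illustrated with a minimal sketch of the second phase only: a greedy hill-climbing search over DAGs that may add, delete, or reverse an edge only when the corresponding pair of variables is adjacent in a previously learned skeleton. This is not the authors' MMHC implementation. Assumptions: discrete variables coded as small integers, data given as a list of tuples, the skeleton given as a set of `frozenset` pairs, and BIC used as the score in place of the Bayesian (BDeu) score that MMHC uses. The skeleton-reconstruction phase is not shown, and `local_bic`, `is_acyclic`, `apply_move`, and `constrained_hill_climb` are hypothetical names.

```python
import math
from collections import Counter
from itertools import product


def local_bic(data, child, parents, arity):
    """BIC contribution of `child` given a set of parent column indices."""
    parents = sorted(parents)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(n * math.log(n / marg[cfg]) for (cfg, _), n in joint.items())
    n_params = math.prod(arity[p] for p in parents) * (arity[child] - 1)
    return loglik - 0.5 * math.log(len(data)) * n_params


def is_acyclic(parents):
    """Kahn's algorithm on a {node: set_of_parents} map."""
    indeg = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    stack = [v for v, d in indeg.items() if d == 0]
    visited = 0
    while stack:
        v = stack.pop()
        visited += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                stack.append(c)
    return visited == len(parents)


def apply_move(parents, kind, u, w):
    """Return a copy of the graph with one add/delete/reverse applied to u -> w."""
    new = {v: set(ps) for v, ps in parents.items()}
    if kind == "add":
        new[w].add(u)
    elif kind == "delete":
        new[w].discard(u)
    else:  # "reverse": u -> w becomes w -> u
        new[w].discard(u)
        new[u].add(w)
    return new


def constrained_hill_climb(data, arity, skeleton, tol=1e-9):
    """Greedy DAG search whose edges are restricted to `skeleton` pairs."""
    nodes = list(arity)
    parents = {v: set() for v in nodes}
    while True:
        best_gain, best_graph = tol, None
        for u, w in product(nodes, nodes):
            if u == w or frozenset((u, w)) not in skeleton:
                continue  # pairs outside the skeleton are never scored
            if u in parents[w]:
                moves = ["delete", "reverse"]
            elif w not in parents[u]:
                moves = ["add"]
            else:
                moves = []  # w -> u exists; handled when the pair is visited as (w, u)
            for kind in moves:
                new = apply_move(parents, kind, u, w)
                if not is_acyclic(new):
                    continue
                touched = {u, w} if kind == "reverse" else {w}
                gain = sum(local_bic(data, v, new[v], arity)
                           - local_bic(data, v, parents[v], arity) for v in touched)
                if gain > best_gain:
                    best_gain, best_graph = gain, new
        if best_graph is None:  # no move improves the score by more than tol
            return parents
        parents = best_graph
```

Only pairs adjacent in the skeleton are ever scored, which is where a hybrid method of this kind saves work compared with an unconstrained greedy search over all ordered pairs.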
On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter
2007
Cited by 7 (3 self)
The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately, no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.
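To make the role of α concrete, the sketch below computes the standard BDeu local score of one discrete node given its parents, spreading α uniformly as α/q pseudo-counts per parent configuration and α/(r·q) per configuration and state (q parent configurations, r child states). The data layout and the names `bdeu_local` and `edge_gain` are illustrative assumptions, not code from the paper. `edge_gain` returns the score change from adding a single edge to an otherwise empty graph, which is one simple quantity whose dependence on α can be inspected.

```python
import math
from collections import Counter


def bdeu_local(data, child, parents, arity, alpha):
    """Log BDeu marginal likelihood of `child` given a set of parent columns."""
    parents = sorted(parents)
    q = math.prod(arity[p] for p in parents)  # number of parent configurations
    r = arity[child]
    a_j = alpha / q          # prior pseudo-count per parent configuration
    a_jk = alpha / (q * r)   # prior pseudo-count per (configuration, state)
    n_j = Counter(tuple(row[p] for p in parents) for row in data)
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    # Unobserved configurations contribute lgamma(a) - lgamma(a) = 0, so skip them.
    for n in n_j.values():
        score += math.lgamma(a_j) - math.lgamma(a_j + n)
    for n in n_jk.values():
        score += math.lgamma(a_jk + n) - math.lgamma(a_jk)
    return score


def edge_gain(data, arity, parent, child, alpha):
    """Score change from adding parent -> child to an otherwise empty graph."""
    return (bdeu_local(data, child, {parent}, arity, alpha)
            - bdeu_local(data, child, set(), arity, alpha))
```

Because BDeu is score equivalent, `edge_gain(data, arity, a, b, alpha)` and `edge_gain(data, arity, b, a, alpha)` agree up to floating-point error. Evaluating `edge_gain` on the same data over a range of α values (for example 0.1, 1, 10, 100) is a quick way to explore the sensitivity the abstract describes; the outcome depends on the data, so no particular pattern is implied here.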
Finding Optimal Bayesian Network Given a Super-Structure
Cited by 7 (0 self)
Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm noticeably improves accuracy and speed: it learns a skeleton with an independence test (IT) approach and constrains the directed acyclic graphs (DAGs) considered during the search-and-score phase. Building on this, we formalize the structural constraint by introducing the concept of a super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by $O(\gamma_m^n)$, where $\gamma_m < 2$ depends on the maximal degree $m$ of S. Empirically, complexity depends on the average degree $\tilde{m}$, and sparse structures allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders of magnitude and even finds more accurate results when given a sound super-structure. In practice, S can be approximated by IT approaches; the significance level of the tests controls its sparseness, making it possible to control the tradeoff between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still significantly outperforms other heuristic searches.
Keywords: Bayesian networks, structure learning, optimal search, super-structure, connected subset
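The constraint imposed by a super-structure S can be illustrated by enumerating the parent sets each node is still allowed to have: only subsets of its neighbours in S. This minimal sketch, with S assumed to be an adjacency dictionary and with hypothetical names `candidate_parent_sets` and `respects_superstructure`, shows only this restriction of the search space; it does not implement the COS search or reproduce its complexity bound.

```python
from itertools import combinations


def candidate_parent_sets(superstructure, node):
    """All parent sets allowed for `node`: every subset of its neighbours in S."""
    nbrs = sorted(superstructure[node])
    return [frozenset(c) for k in range(len(nbrs) + 1) for c in combinations(nbrs, k)]


def respects_superstructure(dag_parents, superstructure):
    """True if every edge of the DAG ({child: parents}) is an edge of S."""
    return all(p in superstructure[c] for c, ps in dag_parents.items() for p in ps)
```

Without S, each of $n$ nodes has $2^{n-1}$ candidate parent sets; with S, a node of degree $d$ has $2^{d}$, which is why sparse super-structures shrink the search space so sharply.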
Learning the Bayesian Network Structure: Dirichlet Prior versus Data
Cited by 4 (0 self)
In the Bayesian approach to structure learning of graphical models, the equivalent sample size (ESS) in the Dirichlet prior over the model parameters was recently shown to have an important effect on the maximum-a-posteriori estimate of the Bayesian network structure. In our first contribution, we theoretically analyze the case of large ESS values, which complements previous work: among other results, we find that the presence of an edge in a Bayesian network is favoured over its absence even if both the Dirichlet prior and the data imply independence, as long as the conditional empirical distribution is notably different from uniform. In our second contribution, we focus on realistic ESS values and provide an analytical approximation to the ‘optimal’ ESS value in a predictive sense (its accuracy is also validated experimentally): this approximation provides an understanding of which properties of the data have the main effect in determining the ‘optimal’ ESS value.
A conjugate prior for discrete hierarchical log-linear models
2009
Cited by 2 (1 self)
In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the log-linear parameters or the cell probability parameters is a major challenge. In this paper, we define a flexible family of conjugate priors for the wide class of discrete hierarchical log-linear models, which includes the class of graphical models. These priors are defined as the Diaconis–Ylvisaker conjugate priors on the log-linear parameters subject to “baseline constraints” under multinomial sampling. We also derive the induced prior on the cell probabilities and show that it is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical log-linear models for a six-way contingency table.
Predictive Discretization during Model Selection
Cited by 1 (1 self)
We present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal tradeoff between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also invariant under monotonic transformations of the continuous space. Our experiments show that the discretization method can substantially impact the resulting graph structure.
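The invariance claim can be illustrated with a small sketch. Here the finest grid is assumed to place a candidate cut point midway between each pair of consecutive distinct observed values, and a discretization is a choice of cut points by their index in that grid; these conventions and the names `finest_grid`, `level_counts`, and `counts_for_cut_indices` are assumptions for illustration, and the paper's scoring function itself is not reproduced. Because a strictly increasing transformation preserves the order of the data, the same cut indices yield the same counts per level, and a score that depends only on those counts is therefore unchanged.

```python
import bisect


def finest_grid(values):
    """Candidate cut points midway between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def level_counts(values, cuts):
    """Number of data points falling into each discretization level."""
    cuts = sorted(cuts)
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect.bisect_right(cuts, v)] += 1
    return counts


def counts_for_cut_indices(values, cut_indices):
    """Counts per level when the cuts are chosen by index in the finest grid."""
    grid = finest_grid(values)
    return level_counts(values, [grid[i] for i in cut_indices])
```

For example, applying `math.exp` to every value of a sample and discretizing with the same cut indices gives the same per-level counts as the untransformed sample.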
Localizing Transient Faults Using Dynamic Bayesian Networks
Cited by 1 (0 self)
Abstract. Transient faults are a major concern in today’s deep sub-micron semiconductor technology. These faults are rare, but they have been known to cause catastrophic system-level failures. Transient errors often occur due to physical effects on deployed systems; hence, diagnosis of transient errors must be performed over manufactured chips or systems assembled from black-box components, where arbitrary instrumentation of the system is not possible and the system state is therefore only partially observable. Further, these systems are often composed of third-party IP components, which adds to the opaqueness of the system. In this paper, we propose a probabilistic approach to localize transient faults in space and time for such partially observable systems. From a set of correct traces and a failure trace, we seek to locate the faulty component and the cycle of operation at which the fault occurred. Our technique uses correct system traces over monitored components of the system to learn a dynamic Bayesian network (DBN) summarizing the temporal dependencies across the monitored components. This DBN is augmented with the different error hypotheses allowed by the fault model. The most probable explanation (MPE) among these hypotheses corresponds to the most likely location of the error. We evaluated the effectiveness of our technique on a set of ISCAS89 benchmarks and a router design used in on-chip networks in a multicore design.
Learning Dynamic Bayesian Network Models Via Cross-Validation
Cited by 1 (0 self)
We study cross-validation as a scoring criterion for learning dynamic Bayesian network models that generalize well. We argue that cross-validation is more suitable than the Bayesian scoring criterion for one of the most common interpretations of generalization. We confirm this by carrying out an experimental comparison of cross-validation and the Bayesian scoring criterion, as implemented by the Bayesian Dirichlet metric and the Bayesian information criterion. The results show that cross-validation leads to models that generalize better for a wide range of sample sizes.
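A cross-validation scoring criterion for a single node can be sketched as the held-out log-likelihood of that node given its parents, summed over k folds. In this minimal sketch the folds are assigned by row index modulo k, conditional probabilities are estimated from training counts with an additive pseudo-count, and, for the dynamic case, each row is assumed to pair a variable's value at time t-1 with its value at time t so that a parent column can refer to the previous slice. These choices and the name `cv_local_score` are assumptions; the abstract does not specify the authors' exact protocol.

```python
import math
from collections import Counter


def cv_local_score(rows, child, parents, arity, k=5, pseudo=1.0):
    """k-fold held-out log-likelihood of column `child` given `parents` columns."""
    total = 0.0
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        test = [r for i, r in enumerate(rows) if i % k == fold]
        # Estimate P(child | parent configuration) from the training fold only.
        n_j = Counter(tuple(r[p] for p in parents) for r in train)
        n_jk = Counter((tuple(r[p] for p in parents), r[child]) for r in train)
        for r in test:
            cfg = tuple(r[p] for p in parents)
            prob = (n_jk[(cfg, r[child])] + pseudo) / (n_j[cfg] + pseudo * arity[child])
            total += math.log(prob)  # held-out log-likelihood of this row
    return total
```

As a usage example, for a binary time series `s`, `rows = list(zip(s, s[1:]))` builds (previous, current) pairs, and comparing `cv_local_score(rows, 1, [0], {0: 2, 1: 2})` with `cv_local_score(rows, 1, [], {0: 2, 1: 2})` indicates whether the lag-one dependency improves held-out fit.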
Belief net structure learning from uncertain interventions
Cited by 1 (0 self)
We show how to learn causal structure from interventions with unknown effects and/or side effects by adding the intervention variables to the graph and using Bayesian inference to learn the resulting two-layered graph structure. We show that, on a dataset consisting of protein phosphorylation levels measured under various perturbations, learning the targets of intervention results in models that fit the data better than falsely assuming the interventions are perfect. Furthermore, learning the children of the intervention nodes is useful for tasks such as drug and disease target discovery, where we wish to distinguish direct effects from indirect effects. We illustrate the latter by correctly identifying known targets of genetic mutation in various forms of leukemia using microarray expression data.