Results 1 - 10 of 27
Exact Bayesian structure discovery in Bayesian networks
 J. of Machine Learning Research
, 2004

Cited by 55 (8 self)
We consider a Bayesian method for learning the Bayesian network structure from complete data. Recently, Koivisto and Sood (2004) presented an algorithm that for any single edge computes its marginal posterior probability in O(n 2^n) time, where n is the number of attributes; the number of parents per attribute is bounded by a constant. In this paper we show that the posterior probabilities for all the n(n−1) potential edges can be computed in O(n 2^n) total time. This result is achieved by a forward–backward technique and fast Möbius transform algorithms, which are of independent interest. The resulting speedup by a factor of about n^2 allows us to experimentally study the statistical power of learning moderate-size networks. We report results from a simulation study that covers data sets with 20 to 10,000 records over 5 to 25 discrete attributes.
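Fast Möbius transform algorithms of the kind mentioned above can be illustrated by their simplest instance: the subset-sum (zeta) transform over all subsets of a k-element ground set and its inverse, each computed with O(k 2^k) additions instead of the O(3^k) needed to sum over every subset of every set directly. The following is a minimal sketch, assuming set functions are stored as Python lists indexed by bitmask; the names `zeta_transform` and `mobius_transform` are illustrative, and the paper's forward–backward algorithm is not reproduced here.

```python
from typing import List


def zeta_transform(f: List[float], k: int) -> List[float]:
    """Return g with g[S] = sum of f[T] over all subsets T of S (sets encoded as bitmasks)."""
    g = list(f)
    for i in range(k):
        bit = 1 << i
        for s in range(1 << k):
            if s & bit:
                # Fold in the value of the same set without element i.
                g[s] += g[s ^ bit]
    return g


def mobius_transform(g: List[float], k: int) -> List[float]:
    """Invert zeta_transform: recover f from its subset sums."""
    f = list(g)
    for i in range(k):
        bit = 1 << i
        for s in range(1 << k):
            if s & bit:
                f[s] -= f[s ^ bit]
    return f


if __name__ == "__main__":
    k = 3
    f = [float(i + 1) for i in range(1 << k)]  # arbitrary values 1.0 .. 8.0
    g = zeta_transform(f, k)
    print(g[0b111])  # 36.0, the sum over all eight subsets
    assert mobius_transform(g, k) == f
```

Each pass over bit i only reads entries that lack element i, so the updates can be done in place without a second buffer.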
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
 Journal of Machine Learning Research
, 2006

Cited by 17 (0 self)
We propose a new scoring function for learning Bayesian networks from data using score+search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution and associated with the mutual information measure is combined with a property of additive decomposition of this measure in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests), which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented.
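The connection between mutual information and the chi-square distribution that this score relies on can be checked numerically: for a contingency table of N records, 2N·I(X;Y), with I measured in nats, equals the likelihood-ratio (G) statistic, which under independence is asymptotically chi-square with (r_X − 1)(r_Y − 1) degrees of freedom. The sketch below, using only the Python standard library, computes that quantity from raw counts. The function names are illustrative, and the full MIT score (which, as we understand it, penalizes these terms with chi-square critical values over parent configurations) is not reproduced.

```python
import math
from typing import List


def g_statistic(table: List[List[int]]) -> float:
    """Return 2 * N * I(X;Y) in nats for a 2-D contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute 0 (convention 0 * log 0 = 0)
                # Empirical P(x, y) * log(P(x, y) / (P(x) * P(y)))
                mi += (n_ij / n) * math.log(n_ij * n / (row_tot[i] * col_tot[j]))
    return 2 * n * mi


def degrees_of_freedom(table: List[List[int]]) -> int:
    """(r_X - 1)(r_Y - 1) for an r_X-by-r_Y table."""
    return (len(table) - 1) * (len(table[0]) - 1)


if __name__ == "__main__":
    independent = [[10, 20], [30, 60]]  # proportional rows, so I(X;Y) = 0
    dependent = [[40, 10], [10, 40]]
    print(g_statistic(independent), degrees_of_freedom(independent))  # 0.0 1
    print(round(g_statistic(dependent), 3))
```

Comparing the statistic with a chi-square quantile for the reported degrees of freedom gives the usual independence test.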
Searching for Bayesian Network Structures in the Space of Restricted Acyclic Partially Directed Graphs
 Journal of Artificial Intelligence Research
, 2003

Cited by 15 (2 self)
Although many algorithms have been designed to construct Bayesian network structures using different approaches and principles, they all employ only two methods: those based on independence criteria, and those based on a scoring function and a search procedure (although some methods combine the two). Within the score+search paradigm, the dominant approach uses local search methods in the space of directed acyclic graphs (DAGs), where the usual choices for defining the elementary modifications (local changes) that can be applied are arc addition, arc deletion, and arc reversal. In this paper, we propose a new local search method that uses a different search space, one that takes account of the concept of equivalence between network structures: restricted acyclic partially directed graphs (RPDAGs). In this way, the number of different configurations of the search space is reduced, thus improving efficiency. Moreover, although the final result must necessarily be a local optimum given the nature of the search method, the topology of the new search space, which avoids making early decisions about the directions of the arcs, may help to find better local optima than those obtained by searching in the DAG space.
Robust independence testing for constraint-based learning of causal structure
 In UAI
, 2003

Cited by 13 (1 self)
This paper considers a method that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test of independence for constraint-based (CB) learning of causal structure. Our method produces a smoothed contingency table N_XYZ that can be used with any test of independence that relies on contingency table statistics. N_XYZ can be calculated in the same asymptotic time and space required to calculate a standard contingency table, allows the specification of a prior distribution over parameters, and can be calculated when the database is incomplete. We provide theoretical justification for the procedure, and with synthetic data we demonstrate its benefits empirically over both a CB algorithm using the standard contingency table and a greedy Bayesian algorithm. We show that, even when used with non-informative priors, it results in better recovery of structural features and produces networks with smaller KL divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem that plagues CB learning when the database is small.
Finding optimal Bayesian networks by dynamic programming
, 2005

Cited by 12 (0 self)
Finding the Bayesian network that maximizes a score function is known as structure learning or structure discovery. Most approaches use local search in the space of acyclic digraphs, which is prone to local maxima. Exhaustive enumeration requires super-exponential time. In this paper we describe a “merely” exponential space/time algorithm for finding a Bayesian network that corresponds to a global maximum of a decomposable scoring function, such as BDeu or BIC. NSF IIS0325581, NSERC PGSB
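The general idea behind exponential-time exact search can be sketched for a handful of variables: because the score decomposes into per-variable family scores, the best network over a variable set S is obtained by choosing some sink v in S, giving v its best parent set within S \ {v}, and recursing on S \ {v}. This minimal Python sketch assumes a hypothetical user-supplied `local_score(v, parents)` to be maximized (a stand-in for a BDeu or BIC family score) and enumerates parent sets exhaustively, so it is only practical for very small n; it follows the generic subset dynamic program rather than the paper's exact algorithm.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple

Score = Callable[[int, FrozenSet[int]], float]


def best_network(n: int, local_score: Score) -> Tuple[float, Dict[int, FrozenSet[int]]]:
    """Exactly maximize a decomposable score over all DAGs on nodes 0..n-1."""
    nodes = range(n)
    # best_parents[(v, C)] = (score, parent set) of v's best parent set drawn from candidates C.
    best_parents: Dict[Tuple[int, FrozenSet[int]], Tuple[float, FrozenSet[int]]] = {}
    for v in nodes:
        others = [u for u in nodes if u != v]
        for r in range(len(others) + 1):  # smaller candidate sets first
            for cand in combinations(others, r):
                c = frozenset(cand)
                best = (local_score(v, c), c)
                for u in c:  # best subset of C is C itself or the best subset of some C \ {u}
                    if best_parents[(v, c - {u})][0] > best[0]:
                        best = best_parents[(v, c - {u})]
                best_parents[(v, c)] = best
    # best_net[S] = (score, sink) of the best DAG on variable set S.
    best_net: Dict[FrozenSet[int], Tuple[float, int]] = {frozenset(): (0.0, -1)}
    for r in range(1, n + 1):
        for subset in combinations(nodes, r):
            s = frozenset(subset)
            best_net[s] = max(
                ((best_net[s - {v}][0] + best_parents[(v, s - {v})][0], v) for v in s),
                key=lambda t: t[0],
            )
    # Reconstruct the DAG by repeatedly peeling the chosen sink off the full set.
    parents: Dict[int, FrozenSet[int]] = {}
    s = frozenset(nodes)
    while s:
        v = best_net[s][1]
        parents[v] = best_parents[(v, s - {v})][1]
        s = s - {v}
    return best_net[frozenset(nodes)][0], parents


if __name__ == "__main__":
    # Hypothetical toy score: rewards arcs 0 -> 1 and 1 -> 2, penalizes every parent.
    def toy_score(v: int, ps: FrozenSet[int]) -> float:
        bonus = {(1, 0): 2.0, (2, 1): 2.0}
        return sum(bonus.get((v, p), 0.0) for p in ps) - 0.5 * len(ps)

    score, parents = best_network(3, toy_score)
    print(score, {v: sorted(p) for v, p in sorted(parents.items())})  # 3.0 {0: [], 1: [0], 2: [1]}
```

Both tables have one entry per (variable, subset) or per subset, which is where the exponential, rather than super-exponential, time and space come from.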
Stochastic local search algorithms for learning belief networks: Searching in the space of orderings
 Lecture Notes in Artificial Intelligence 2143
, 2001

Cited by 7 (4 self)
A common approach for learning Bayesian networks (BNs) from data is based on the use of a scoring metric to evaluate the fitness of any given candidate network to the data and a method to explore the search space, which usually is the set of directed acyclic graphs (DAGs). The most efficient search methods used in this context are greedy hill-climbing methods, either deterministic or stochastic. One of these methods that has been applied with some success is hill climbing with random restart. In this article we study a new algorithm of this type to restart a local search when it is trapped at a local optimum. It uses problem-specific knowledge about BNs and the information provided by the database itself (by testing the conditional independencies that hold in the current solution of the search process). We also study a new definition of neighborhood for the space of DAGs by using the classical operators of arc addition and arc deletion together with a new operator for arc reversal. The proposed methods are empirically tested using two different domains: ALARM and INSURANCE. © 2003 Wiley Periodicals, Inc.
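For orientation, the baseline this article builds on, greedy hill climbing over DAGs with the classical arc addition, deletion and reversal operators plus random restarts, can be sketched as follows. This is a generic sketch only: it does not implement the authors' restart strategy (which uses conditional independence tests) or their new reversal operator. `local_score` is a hypothetical decomposable family score to be maximized, and restarts here simply perturb the best DAG found so far with random acyclic arc additions.

```python
import random
from itertools import permutations
from typing import Callable, FrozenSet, Iterator, Set, Tuple

Edge = Tuple[int, int]  # (parent, child)
Score = Callable[[int, FrozenSet[int]], float]


def is_acyclic(n: int, edges: Set[Edge]) -> bool:
    """Kahn's algorithm: a digraph is acyclic iff every node can be removed in topological order."""
    indeg = [0] * n
    for _, c in edges:
        indeg[c] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for p, c in edges:
            if p == u:
                indeg[c] -= 1
                if indeg[c] == 0:
                    stack.append(c)
    return removed == n


def total_score(n: int, edges: Set[Edge], local_score: Score) -> float:
    """Decomposable score: sum over v of local_score(v, parents of v)."""
    return sum(local_score(v, frozenset(p for p, c in edges if c == v)) for v in range(n))


def neighbours(n: int, edges: Set[Edge]) -> Iterator[Set[Edge]]:
    """Yield every DAG reachable by one arc addition, deletion or reversal."""
    for p, c in permutations(range(n), 2):
        if (p, c) in edges:
            yield edges - {(p, c)}  # arc deletion
            flipped = (edges - {(p, c)}) | {(c, p)}  # arc reversal
            if is_acyclic(n, flipped):
                yield flipped
        elif (c, p) not in edges:
            added = edges | {(p, c)}  # arc addition
            if is_acyclic(n, added):
                yield added


def hill_climb(n: int, local_score: Score, start: Set[Edge]) -> Tuple[float, Set[Edge]]:
    """Steepest ascent: stop when no neighbour strictly improves the score."""
    current, best = set(start), total_score(n, start, local_score)
    while True:
        scored = [(total_score(n, e, local_score), e) for e in neighbours(n, current)]
        if not scored:
            return best, current
        cand_score, cand = max(scored, key=lambda t: t[0])
        if cand_score <= best:
            return best, current
        current, best = cand, cand_score


def random_restart_search(n: int, local_score: Score, restarts: int = 5,
                          seed: int = 0) -> Tuple[float, Set[Edge]]:
    """Climb from the empty DAG, then from random perturbations of the incumbent (assumes n >= 2)."""
    rng = random.Random(seed)
    best, edges = hill_climb(n, local_score, set())
    for _ in range(restarts):
        start = set(edges)
        for _ in range(n):  # try up to n random arcs that keep the graph acyclic
            p, c = rng.sample(range(n), 2)
            if (p, c) not in start and (c, p) not in start and is_acyclic(n, start | {(p, c)}):
                start.add((p, c))
        score, found = hill_climb(n, local_score, start)
        if score > best:
            best, edges = score, found
    return best, edges


if __name__ == "__main__":
    # Hypothetical toy score: rewards arcs 0 -> 1 and 1 -> 2, penalizes every parent.
    def toy_score(v: int, ps: FrozenSet[int]) -> float:
        bonus = {(1, 0): 2.0, (2, 1): 2.0}
        return sum(bonus.get((v, p), 0.0) for p in ps) - 0.5 * len(ps)

    best, edges = random_restart_search(3, toy_score)
    print(best, sorted(edges))  # 3.0 [(0, 1), (1, 2)]
```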
Bayesian network learning by compiling to weighted MAXSAT

Cited by 7 (3 self)
The problem of learning discrete Bayesian networks from data is encoded as a weighted MAXSAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents (‘family scores’) are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a ‘soft’ weighted single-literal clause. Two approaches to enforcing acyclicity are considered: either by encoding the ancestor relation or by attaching a total order to each graph and encoding that. The latter approach gives better results. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs. The largest dataset has 10,000 datapoints and 60 variables producing (for the ‘ancestor’ encoding) a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the ‘true’ BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search.
A reconstruction algorithm for the essential graph
, 2008

Cited by 5 (3 self)
A standard graphical representative of a Bayesian network structure is a special chain graph, known as an essential graph. An alternative algebraic approach to the mathematical description of this statistical model uses instead a certain integer-valued vector, known as a standard imset. We give a direct formula for the translation of any chain graph describing a Bayesian network structure into the standard imset. Moreover, we present a two-stage algorithm which makes it possible to reconstruct the essential graph on the basis of the standard imset. The core of this paper is the proof of the correctness of the algorithm.
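For the special case of a DAG (rather than the general chain graphs treated in the paper), the standard imset has a well-known closed form, u_G = δ_N − δ_∅ + Σ_{i ∈ N} (δ_{pa(i)} − δ_{{i} ∪ pa(i)}), where δ_A is the indicator vector of the subset A of the variable set N. The sketch below computes it as a sparse dictionary from subsets to nonzero integer coefficients; `standard_imset` is an illustrative name, and the paper's translation formula for chain graphs and its reconstruction algorithm are not reproduced. A useful sanity check is that Markov-equivalent DAGs give the same imset.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Mapping


def standard_imset(parents: Mapping[Hashable, FrozenSet]) -> Dict[FrozenSet, int]:
    """Standard imset of a DAG given as {node: frozenset of parents}; zero entries omitted."""
    u: Dict[FrozenSet, int] = defaultdict(int)
    u[frozenset(parents)] += 1  # + delta_N
    u[frozenset()] -= 1  # - delta_emptyset
    for i, pa in parents.items():
        u[frozenset(pa)] += 1  # + delta_{pa(i)}
        u[frozenset(pa) | {i}] -= 1  # - delta_{{i} union pa(i)}
    return {s: k for s, k in u.items() if k != 0}


if __name__ == "__main__":
    chain = {"a": frozenset(), "b": frozenset({"a"}), "c": frozenset({"b"})}  # a -> b -> c
    reverse_chain = {"a": frozenset({"b"}), "b": frozenset({"c"}), "c": frozenset()}  # a <- b <- c
    collider = {"a": frozenset(), "b": frozenset({"a", "c"}), "c": frozenset()}  # a -> b <- c
    print(standard_imset(chain) == standard_imset(reverse_chain))  # True
    print(standard_imset(chain) == standard_imset(collider))  # False
```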
Learning Essential Graph Markov Models from Data
 First European Workshop on Probabilistic Graphical Models
, 2002

Cited by 4 (0 self)
In a model selection procedure where many models are to be compared, computational efficiency is critical. For acyclic digraph (ADG) Markov models (aka DAG models or Bayesian networks), each ADG Markov equivalence class can be represented by a unique chain graph, called an essential graph (EG). This parsimonious representation might be used to facilitate selection among ADG models. Because EGs combine features of decomposable graphs and ADGs, a scoring metric can be developed for EGs with categorical (multinomial) data. This metric may permit the characterization of local computations directly for EGs, which in turn would yield a learning procedure that does not require transformation to representative ADGs at each step for scoring purposes, nor is the scoring metric constrained by Markov equivalence.
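The Markov equivalence classes that essential graphs represent can be made concrete with the classical Verma-Pearl characterization: two ADGs on the same variables are Markov equivalent exactly when they have the same skeleton and the same immoralities (configurations a → c ← b with a and b non-adjacent). The sketch below checks that criterion for DAGs given as sets of (parent, child) arcs, assuming both graphs share the same node set (isolated nodes are not represented). It does not build the essential graph itself or implement the paper's scoring metric, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Arc = Tuple[Hashable, Hashable]  # (parent, child)


def skeleton(arcs: Set[Arc]) -> Set[FrozenSet]:
    """Undirected edges obtained by forgetting arc directions."""
    return {frozenset(a) for a in arcs}


def immoralities(arcs: Set[Arc]) -> Set[Tuple[FrozenSet, Hashable]]:
    """All ({a, b}, c) with a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(arcs)
    parents: Dict[Hashable, Set[Hashable]] = {}
    for p, c in arcs:
        parents.setdefault(c, set()).add(p)
    found = set()
    for c, pa in parents.items():
        for a, b in combinations(pa, 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), c))
    return found


def markov_equivalent(g1: Set[Arc], g2: Set[Arc]) -> bool:
    """Verma-Pearl criterion: same skeleton and same immoralities."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)


if __name__ == "__main__":
    chain = {("a", "b"), ("b", "c")}  # a -> b -> c
    fork = {("b", "a"), ("b", "c")}  # a <- b -> c
    collider = {("a", "b"), ("c", "b")}  # a -> b <- c
    print(markov_equivalent(chain, fork))  # True
    print(markov_equivalent(chain, collider))  # False
```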
Sequences of regressions and their independences
, 2012

Cited by 4 (1 self)
Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, prove criteria for Markov equivalence and discuss the notion of simpler statistical covering models. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the simple graphical criteria of regression graphs to be used on graphs for which the corresponding criteria are in general more complex. Under the known conditions that a Markov equivalent directed acyclic graph exists for any given regression graph, we give a polynomial time algorithm to find one such graph.