## Efficient Markov network structure discovery using independence tests (2006)

### Download Links

- [siam.org]
- [www.siam.org]
- [www.jair.org]
- [jair.org]
- [www.cs.iastate.edu]
- DBLP

### Other Repositories/Bibliography

Venue: In Proc. SIAM Data Mining

Citations: 17 (1 self)

### BibTeX

@INPROCEEDINGS{Bromberg06efficientmarkov,
  author    = {Facundo Bromberg and Dimitris Margaritis and Vasant Honavar},
  title     = {Efficient {M}arkov network structure discovery using independence tests},
  booktitle = {Proc. SIAM Data Mining},
  year      = {2006}
}

### Abstract

We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show that GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network of comparable or, in several cases, considerably improved quality. In addition …
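The grow-shrink scheme that GSMN adapts can be sketched as follows. This is a minimal illustration, not the paper's implementation: `make_oracle` is a hypothetical stand-in that answers independence queries from a known toy graph, where the statistical tests on data would go.

```python
# Hedged sketch of the grow-shrink scheme for recovering a variable's
# Markov blanket via an independence-query oracle.

def make_oracle(edges):
    """Toy oracle: in the true undirected graph, X and Y are dependent
    given S iff some path between them avoids every vertex of S."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def indep(x, y, cond):
        # DFS from x, blocking the conditioning set; independent iff y unreached
        seen, stack = {x} | set(cond), [x]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v == y:
                    return False
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return True
    return indep

def grow_shrink_blanket(x, variables, indep):
    """Grow: add any variable dependent on x given the current blanket.
    Shrink: drop variables made independent by the remaining blanket."""
    blanket = []
    changed = True
    while changed:                      # grow until no variable is added
        changed = False
        for y in variables:
            if y != x and y not in blanket and not indep(x, y, blanket):
                blanket.append(y)
                changed = True
    for y in list(blanket):             # shrink away false positives
        rest = [z for z in blanket if z != y]
        if indep(x, y, rest):
            blanket.remove(y)
    return set(blanket)
```

On a chain 0-1-2-3, the blanket of variable 1 comes out as its neighbours {0, 2}, as expected.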

### Citations

8786 | Introduction to Algorithms - Cormen, Leiserson, et al. - 1992 |

7314 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |
Citation Context: ...complexity and accuracy achieved by GSIMN, which is the main result of this work. The GSIMN algorithm extends GSMN by using Pearl’s theorems on the properties of the conditional independence relation [16] to infer additional independencies from a set of independencies resulting from statistical tests and previous inferences. The rest of the paper is organized as follows: Section 2 introduces notation,...

3892 | Stochastic Relaxation, Gibbs Distribution, and the Bayesian Restoration of Images - Geman, Geman - 1984 |
Citation Context: ...iently the joint probability distribution of a domain. They have been used in numerous application domains, ranging from discovering gene expression pathways in bioinformatics [6] to computer vision ([7, 2], and more recently [12]). One problem that naturally arises is the construction of such models from data [9, 3]. A solution ∗ Special thanks to Adrian Silvescu for insightful comments on accuracy mea...

1347 | Categorical Data Analysis - Agresti - 1990 |
Citation Context: ...2.1 Statistical Independence Testing. To determine conditional independence between two variables X and Y given a set S from data we use Pearson’s conditional independence chi-square (χ²) test (see [1] for details of its calculation). The χ² test returns a p-value, denoted as p, which is the probability of the error of assuming that the two variables are dependent when in fact they are not. We con...
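The test this context describes can be sketched for the simplest marginal case of two binary variables. This is an illustrative reconstruction, not the paper's code; the conditional version would compute one such statistic per configuration of the conditioning set S and sum the results.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    (observed vs. expected counts under independence)."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    # chi-square survival function with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def independent(table, alpha=0.05):
    """Declare independence when p > alpha (95% confidence for alpha=0.05,
    the threshold used in the paper's experiments)."""
    return p_value_df1(chi2_2x2(table)) > alpha
```

A uniform table yields a statistic of 0 (independence accepted); a strongly diagonal table is rejected.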

1189 | Spatial interaction and the statistical analysis of lattice systems - Besag - 1974 |
Citation Context: ...each step of the structure search, a probabilistic inference step is necessary to evaluate the score (e.g., maximum likelihood, minimum description length, Lam & Bacchus, 1994, or pseudo-likelihood, Besag, 1974). For Bayesian networks this inference step is tractable and therefore several practical score-based algorithms for structure learning have been developed (Lam & Bacchus, 1994, Heckerman, 1995, Acid ...

1131 | Graphical Models - Lauritzen - 1996 |
Citation Context: ...⊆ V, X is independent of Y given Z if and only if the set of vertices Z separates the set of vertices X from the set of vertices Y in the graph G (this is sometimes called the global Markov property, Lauritzen, 1996). In other words, this means that, after removing all vertices in Z from G (including all edges incident to each of them), there exists no (undirected) path in the remaining graph between any variabl...
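The vertex-separation criterion quoted here reduces to a plain reachability check once the conditioning vertices are deleted; a minimal sketch, with `graph` as an adjacency-set dictionary:

```python
def separated(graph, xs, ys, zs):
    """Global Markov property check: after deleting the vertices in zs
    (and all edges incident to them), is there no undirected path from
    any vertex in xs to any vertex in ys?"""
    blocked = set(zs)
    seen = {x for x in xs if x not in blocked}
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v in blocked or v in seen:
                continue
            seen.add(v)
            stack.append(v)
    return not (seen & set(ys))
```

On the chain 0-1-2, conditioning on the middle vertex separates the endpoints; conditioning on nothing does not.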

941 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995 |
Citation Context: ...ing from discovering gene expression pathways in bioinformatics [6] to computer vision ([7, 2], and more recently [12]). One problem that naturally arises is the construction of such models from data [9, 3]. A solution ∗ Special thanks to Adrian Silvescu for insightful comments on accuracy measures and general advice on the theory of undirected graphical models. to this problem, besides being theoretica...

803 | On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games - Dung - 1995 |
Citation Context: ...pendence-based algorithms as exchanging run time complexity for sample complexity. In Chapter 4 we address this problem by proposing a mechanism to improve the quality of tests based on argumentation (Dung, 1995; Loui, 1987; Prakken, 1997; Prakken and Vreeswijk, 2002). We model the problem of independence-based structure discovery as a knowledge base containing a set of independences related through certain ...

772 | Using Bayesian networks to analyze expression data - Friedman, Linial, et al. |
Citation Context: ...used to represent efficiently the joint probability distribution of a domain. They have been used in numerous application domains, ranging from discovering gene expression pathways in bioinformatics [6] to computer vision ([7, 2], and more recently [12]). One problem that naturally arises is the construction of such models from data [9, 3]. A solution ∗ Special thanks to Adrian Silvescu for insightf...

680 | UCI repository of machine learning databases - Hettich, Merz - 1998 |

670 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968 |
Citation Context: ...an networks, the independence-based approach has been mainly exemplified by the SGS [20], PC [20], and the Grow-Shrink (GS) [15] algorithms, as well as algorithms for restricted classes such as trees [4] and polytrees [17]. Markov networks have been used in the physics and computer vision communities [7, 2] where they have been historically called Markov random fields. Recently there has been interes...

606 | An Introduction to Computational Learning Theory - Kearns, Vazirani - 1994 |
Citation Context: ...the existence of an independence-query oracle that can provide information about conditional independences among the domain variables. This can be viewed as an instance of a statistical query oracle (Kearns & Vazirani, 1994). In practice such an oracle does not exist; however, it can be implemented approximately by a statistical test evaluated on the data set D. For example, for discrete data this can be Pearson’s condi...

564 | Inducing features of random fields - Pietra, Pietra, et al. - 1997 |

378 | Toward optimal feature selection - Koller, Sahami - 1996 |
Citation Context: ...ve higher priority to those variables Y whose p-value with variable X is smaller (line 11). This ordering is a heuristic justified by the intuition of a well-known “folk theorem” (as Koller and Sahami [13] put it) that states that probabilistic influence or association between attributes tends to attenuate over distance in a graphical model. This suggests that pairs of variables X and Y with low uncondi...

322 | Polynomial-time approximation algorithms for the Ising model - Jerrum, Sinclair - 1993 |
Citation Context: ...95, Acid & de Campos, 2003). For Markov networks however, probabilistic inference requires the calculation of a normalizing constant (also known as partition function), a problem known to be NP-hard (Jerrum & Sinclair, 1993, Barahona, 1982). A number of approaches have considered a restricted class of graphical models (e.g. Chow & Liu, 1968, Rebane & Pearl, 1989, Srebro & Karger, 2001). However, Srebro and Karger (2001)...

312 | Bayesian networks - Heckerman, Wellman - 1995 |
Citation Context: ...fields of study (e.g., social sciences) that rely more on qualitative than quantitative models. There exist two broad classes of algorithms for learning the structure of graphical models: score-based [14, 8] and independence-based or constraint-based [20]. Score-based approaches conduct a search in the space of legal structures (of size super-exponential in the number of variables in the domain) in an at...

254 | Operations for learning with graphical models - Buntine - 1994 |
Citation Context: ...ing from discovering gene expression pathways in bioinformatics [6] to computer vision ([7, 2], and more recently [12]). One problem that naturally arises is the construction of such models from data [9, 3]. A solution ∗ Special thanks to Adrian Silvescu for insightful comments on accuracy measures and general advice on the theory of undirected graphical models. to this problem, besides being theoretica...

243 | Bayesian image restoration, with two applications in spatial statistics - Besag, York, et al. - 1991 |
Citation Context: ...9 AI Access Foundation. All rights reserved. Bromberg, Margaritis, & Honavar. Figure 1: Example Markov network. The nodes represent variables in the domain V = {0, 1, 2, 3, 4, 5, 6, 7}. & Geman, 1984, Besag, York, & Mollie, 1991, Isard, 2003, Anguelov, Taskar, Chatalbashev, Koller, Gupta, Heitz, & Ng, 2005). One problem that naturally arises is the construction of such models from data (Heckerman, Geiger, & Chickering, 1995,...

242 | An introduction to MCMC for machine learning - Andrieu, Freitas, et al. - 2003 |

203 | Conditional Independence in Statistical Theory - Dawid - 1979 |

195 | Learning Bayesian Belief Networks: An Approach Based on the MDL Principle - Lam, Bacchus - 1994 |
Citation Context: ...fields of study (e.g., social sciences) that rely more on qualitative than quantitative models. There exist two broad classes of algorithms for learning the structure of graphical models: score-based [14, 8] and independence-based or constraint-based [20]. Score-based approaches conduct a search in the space of legal structures (of size super-exponential in the number of variables in the domain) in an at...

191 | Efficiently inducing features of conditional random fields. UAI - McCallum - 2003 |

144 | An empirical Bayes approach to inferring large-scale gene association networks - Schäfer, Strimmer - 2005 |

132 | Sparse graphical models for exploring gene expression data - Dobra, Hans, et al. |

130 | On the computational complexity of Ising spin glass models - Barahona - 1982 |
Citation Context: ...3). For Markov networks however, probabilistic inference requires the calculation of a normalizing constant (also known as partition function), a problem known to be NP-hard (Jerrum & Sinclair, 1993, Barahona, 1982). A number of approaches have considered a restricted class of graphical models (e.g. Chow & Liu, 1968, Rebane & Pearl, 1989, Srebro & Karger, 2001). However, Srebro and Karger (2001) prove that find...

118 | A model for belief revision - Martins, Shapiro - 1988 |
Citation Context: ...ies by removing a subset of propositions such that the resulting KB becomes consistent; this is called belief revision in the literature (Gärdenfors, 1992; Gärdenfors and Rott, 1995; Shapiro, 1998; Martins, 1992). A known shortcoming (Shapiro, 1998) of belief revision stems from the fact that it removes propositions, which, besides discarding potentially valuable information, has the same potential problem a...

116 | Discriminative learning of Markov random fields for segmentation of 3D scan data - Anguelov, Taskar, et al. - 2005 |
Citation Context: ...d. Bromberg, Margaritis, & Honavar. Figure 1: Example Markov network. The nodes represent variables in the domain V = {0, 1, 2, 3, 4, 5, 6, 7}. & Geman, 1984, Besag, York, & Mollie, 1991, Isard, 2003, Anguelov, Taskar, Chatalbashev, Koller, Gupta, Heitz, & Ng, 2005). One problem that naturally arises is the construction of such models from data (Heckerman, Geiger, & Chickering, 1995, Buntine, 1994). A solution to this problem, besides being theoretically intere...

109 | An introduction to categorical data analysis (2nd Ed.) - Agresti - 2007 |
Citation Context: ...; however, it can be implemented approximately by a statistical test evaluated on the data set D. For example, for discrete data this can be Pearson’s conditional independence chi-square (χ²) test (Agresti, 2002), a mutual information test, etc. For continuous Gaussian data a statistical test that can be used to measure conditional independence is partial correlation (Spirtes et al., 2000). To determine condi...

100 | How to reason defeasibly - Pollock - 1992 |
Citation Context: ...s contradicted by other rules and/or propositions in the KB (a more precise definition is given below). Argumentation is a reasoning model that belongs to the broader class of defeasible logics (Pollock, 1992; Prakken, 1997). Our approach uses the argumentation framework of Amgoud and Cayrol (2002) that considers preferences over arguments, extending Dung’s more fundamental framework (Dung, 1995). Prefere...

96 | Pampas: Real-valued graphical models for computer vision - Isard - 2003 |
Citation Context: ...ty distribution of a domain. They have been used in numerous application domains, ranging from discovering gene expression pathways in bioinformatics [6] to computer vision ([7, 2], and more recently [12]). One problem that naturally arises is the construction of such models from data [9, 3]. A solution ∗ Special thanks to Adrian Silvescu for insightful comments on accuracy measures and general advice...

86 | A reasoning model based on the production of acceptable arguments - Amgoud, Cayrol - 2002 |

80 | The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm - Tsamardinos, Brown, et al. |
Citation Context: ...mardinos, Aliferis, & Statnikov, 2003a), HITON-PC and HITON-MB (Aliferis, Tsamardinos, & Statnikov, 2003), MMPC and MMMB (Tsamardinos, Aliferis, & Statnikov, 2003b), and max-min hill climbing (MMHC) (Tsamardinos, Brown, & Aliferis, 2006), all of which are widely used in the field. Algorithms for restricted classes such as trees (Chow & Liu, 1968) and polytrees (Rebane & Pearl, 1989) also exist. For learning Markov networks previous ...

73 | Defeat among arguments: a system of defeasible inference - Loui - 1987 |
Citation Context: ...d algorithms as exchanging run time complexity for sample complexity. In Chapter 4 we address this problem by proposing a mechanism to improve the quality of tests based on argumentation (Dung, 1995; Loui, 1987; Prakken, 1997; Prakken and Vreeswijk, 2002). We model the problem of independence-based structure discovery as a knowledge base containing a set of independences related through certain axioms (a se...

68 | Graphoids: a Graph-Based Logic for Reasoning about Relevance Relations - Pearl, Paz, et al. - 1987 |
Citation Context: ...eferred to as the test’s confidence threshold. We use the standard value of α = 0.05 in all our experiments, which corresponds to a confidence threshold of 95%. In a faithful domain, it can be shown (Pearl & Paz, 1985) that an edge exists between two variables X ̸= Y ∈ V in the Markov network of that domain if and only if they are dependent conditioned on all remaining variables in the domain, i.e., (X, Y ) is an e...

66 | Bayesian network induction via local neighborhoods - Margaritis, Thrun - 1999 |
Citation Context: ...s work we present two algorithms that belong to the latter class. For Bayesian networks, the independence-based approach has been mainly exemplified by the SGS [20], PC [20], and the Grow-Shrink (GS) [15] algorithms, as well as algorithms for restricted classes such as trees [4] and polytrees [17]. Markov networks have been used in the physics and computer vision communities [7, 2] where they have bee...

63 | Logical Tools for Modelling Legal Argument: A Study of Defeasible Reasoning in Law - Prakken - 1997 |
Citation Context: ...as exchanging run time complexity for sample complexity. In Chapter 4 we address this problem by proposing a mechanism to improve the quality of tests based on argumentation (Dung, 1995; Loui, 1987; Prakken, 1997; Prakken and Vreeswijk, 2002). We model the problem of independence-based structure discovery as a knowledge base containing a set of independences related through certain axioms (a set of relationsh...

56 | Active learning for structure in Bayesian networks - Tong, Koller - 2001 |

53 | Learning Markov networks: Maximum bounded tree-width graphs - Karger, Srebro |
Citation Context: ...re, climatology, ecology and others [19]. Considerable work in the area of structure learning of undirected graphical models has concentrated on the learning of decomposable (also called chordal) MNs [21], as they render parameter learning and inference more tractable. These approaches proceed by constraining the output network to be in the class of decomposable structures. In this work we concentrat...

48 | The recovery of causal polytrees from statistical data - Rebane, Pearl - 1987 |
Citation Context: ...ndependence-based approach has been mainly exemplified by the SGS [20], PC [20], and the Grow-Shrink (GS) [15] algorithms, as well as algorithms for restricted classes such as trees [4] and polytrees [17]. Markov networks have been used in the physics and computer vision communities [7, 2] where they have been historically called Markov random fields. Recently there has been interest in their use for ...

47 | Learning factor graphs in polynomial time and sample complexity - Abbeel, Koller, et al. |

44 | HITON: a novel Markov blanket algorithm for optimal variable selection - Aliferis, Tsamardinos, et al. - 2003 |
Citation Context: ...tep in learning the Bayesian network structure such as Grow-Shrink (GS) algorithm (Margaritis & Thrun, 2000), IAMB and its variants (Tsamardinos, Aliferis, & Statnikov, 2003a), HITON-PC and HITON-MB (Aliferis, Tsamardinos, & Statnikov, 2003), MMPC and MMMB (Tsamardinos, Aliferis, & Statnikov, 2003b), and max-min hill climbing (MMHC) (Tsamardinos, Brown, & Aliferis, 2006), all of which are widely used in the field. Algorithms for restric...

41 | Stochastic independence, causal independence and shieldability - Spohn - 1980 |

36 | Some methods for strengthening the common χ² tests - Cochran - 1954 |
Citation Context: ...table with one data point per cell we would need a data set of exponential size, i.e., N = 2^n. Exacerbating this problem, more than that is typically necessary for a reliable test: as recommended by [5], if more than 20% of the contingency-table cells have fewer than 5 counts, the test is deemed unreliable. Both GSMN and GSIMN algorithms (presented later in the paper) attempt to minimize the conditioning s...
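The Cochran-style reliability heuristic quoted here can be written directly; a sketch, with the 20% fraction and 5-count thresholds exposed as parameters:

```python
def is_reliable(counts, min_count=5, max_fraction=0.20):
    """Heuristic reliability check for a chi-square test: the test is
    deemed unreliable when more than `max_fraction` of the contingency
    table's cells hold fewer than `min_count` counts."""
    cells = [c for row in counts for c in row]
    small = sum(1 for c in cells if c < min_count)
    return small / len(cells) <= max_fraction
```

A table with one sparse cell out of four (25% of cells) fails the check, which is why both algorithms try to keep conditioning sets, and hence table sizes, small.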

33 | Algorithms for large scale Markov blanket discovery - Tsamardinos, Aliferis, et al. |
Citation Context: ...et al., 2000), and algorithms that learn the Markov blanket as a step in learning the Bayesian network structure such as Grow-Shrink (GS) algorithm (Margaritis & Thrun, 2000), IAMB and its variants (Tsamardinos, Aliferis, & Statnikov, 2003a), HITON-PC and HITON-MB (Aliferis, Tsamardinos, & Statnikov, 2003), MMPC and MMMB (Tsamardinos, Aliferis, & Statnikov, 2003b), and max-min hill climbing (MMHC) (Tsamardinos, Brown, & Aliferis, 2006)...

30 | Time and sample efficient discovery of Markov blankets and direct causal relations - Tsamardinos, Aliferis, et al. - 2003 |
Citation Context: ...et al., 2000), and algorithms that learn the Markov blanket as a step in learning the Bayesian network structure such as Grow-Shrink (GS) algorithm (Margaritis & Thrun, 2000), IAMB and its variants (Tsamardinos, Aliferis, & Statnikov, 2003a), HITON-PC and HITON-MB (Aliferis, Tsamardinos, & Statnikov, 2003), MMPC and MMMB (Tsamardinos, Aliferis, & Statnikov, 2003b), and max-min hill climbing (MMHC) (Tsamardinos, Brown, & Aliferis, 2006)...

28 | Propagation of uncertainty by probabilistic logic sampling in Bayes’ networks - Henrion - 1988 |
Citation Context: ...s using 5 real-world domains: Hailfinder, Insurance, Alarm, Mildew, and Water. For each domain we sampled a varying number of data points from its corresponding Bayesian network using logic sampling (Henrion, 1988), and used it as input to the GSMN ∗ (with and without propagation) and GSIMN algorithms. We then compared the network output from each of these algorithms to the original moralized network using the...

26 | Nonlinear Markov networks for continuous variables - Hofmann, Tresp - 1997 |
Citation Context: ...in the domain), without any further restrictions other than the results of independence tests conducted on the data. An example of learning (non-decomposable) MNs is presented by Hofmann and Tresp in [11], which is a score-based approach for learning structure in continuous domains with non-linear relationships among the domain attributes. There are no cases in the literature of independence-based str...

26 | A robust procedure for Gaussian graphical model search from microarray data with p larger than n - Castelo, Roverato |
Citation Context: ...in linear dependences among the variables with Gaussian noise (Whittaker, 1990, Edwards, 2000). More recent approaches are included in the works of Dobra, Hans, Jones, Nevins, Yao, and West (2004), (Castelo & Roverato, 2006), Peña (2008), and Schäfer and Strimmer (2005), that focus on applications of Gaussian graphical models in Bioinformatics. While we do not make the assumption of continuous Gaussian variables in this...

26 | Random generation of Bayesian networks - Ide, Cozman - 2002 |
Citation Context: ...e accuracies and the line curve shows their difference, i.e., a positive value corresponds to an improvement in the accuracy of AITb-D over the accuracy of SIT. the resulting graphs) using BNGenerator (Ide and Cozman, 2002), a publicly available Java package. For n = 6 we generated ten networks with τ = 3 and ten networks with τ = 5. For n = 8 we generated ten networks for τ = 3 and another ten for τ = 7. For each data...

20 | An introduction to spatial point processes and Markov random fields - Isham - 1981 |
Citation Context: ...f constructing a probability distribution that preserves the dependency structure of an arbitrary graph G. For undirected models this problem has been addressed by the theory of Markov random fields (Isham, 1981; Lauritzen, 1982), which provides a method for constructing the Gibbs distribution for an arbitrary undirected graph G: 1. Identify the (maximal) cliques of G. A clique is a maximal subgraph of G who...
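The Gibbs construction this context describes (the joint as a normalized product of clique potentials) can be illustrated over binary variables. This is a sketch under illustrative names, not code from the paper or its references:

```python
from itertools import product

def gibbs_distribution(potentials, variables):
    """Build the Gibbs distribution for binary variables: the joint is the
    product of one non-negative potential per clique, divided by the
    partition function Z. `potentials` is a list of (clique_vars, fn)
    pairs; `fn` takes the clique's variable values in order."""
    def unnormalized(assign):
        p = 1.0
        for clique, fn in potentials:
            p *= fn(*(assign[v] for v in clique))
        return p

    # Z sums the unnormalized mass over all 2^n joint assignments
    z = sum(unnormalized(dict(zip(variables, vals)))
            for vals in product([0, 1], repeat=len(variables)))
    return lambda assign: unnormalized(assign) / z
```

With a single pairwise potential that favours agreement (weight 2 when the two variables match, 1 otherwise), the four joint probabilities are 2/6, 1/6, 1/6, 2/6 and sum to 1.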