## Searching for Bayesian Network Structures in the Space of Restricted Acyclic Partially Directed Graphs (2003)

Venue: | Journal of Artificial Intelligence Research |

Citations: | 15 - 2 self |

### BibTeX

@ARTICLE{Acid03searchingfor,

author = {Silvia Acid and Luis M. de Campos},

title = {Searching for Bayesian Network Structures in the Space of Restricted Acyclic Partially Directed Graphs},

journal = {Journal of Artificial Intelligence Research},

year = {2003},

volume = {18},

pages = {445--490}

}

### Abstract

Although many algorithms have been designed to construct Bayesian network structures using different approaches and principles, they all employ only two methods: those based on independence criteria, and those based on a scoring function and a search procedure (although some methods combine the two). Within the score+search paradigm, the dominant approach uses local search methods in the space of directed acyclic graphs (DAGs), where the usual choices for defining the elementary modifications (local changes) that can be applied are arc addition, arc deletion, and arc reversal. In this paper, we propose a new local search method that uses a different search space, and which takes account of the concept of equivalence between network structures: restricted acyclic partially directed graphs (RPDAGs). In this way, the number of different configurations of the search space is reduced, thus improving efficiency. Moreover, although the final result must necessarily be a local optimum given the nature of the search method, the topology of the new search space, which avoids making early decisions about the directions of the arcs, may help to find better local optima than those obtained by searching in the DAG space.
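The DAG-space neighbourhood that the abstract contrasts with RPDAGs can be illustrated with a minimal sketch. The function names (`is_acyclic`, `dag_neighbors`, `markov_equivalent`) are hypothetical and do not come from the paper, and the code shows only the conventional arc addition, deletion, and reversal operators, not the RPDAG operators the authors propose. The equivalence test assumes the standard characterization that two DAGs are equivalent when they share the same skeleton and the same v-structures.

```python
from itertools import permutations


def is_acyclic(nodes, arcs):
    """Return True if the directed graph has no cycle (repeatedly peel off source nodes)."""
    remaining = set(nodes)
    arcs = set(arcs)
    while remaining:
        # A source is a remaining node with no incoming arc.
        sources = {n for n in remaining if not any(v == n for (_, v) in arcs)}
        if not sources:
            return False  # every remaining node has a parent, so a cycle exists
        remaining -= sources
        arcs = {(u, v) for (u, v) in arcs if u not in sources}
    return True


def dag_neighbors(nodes, arcs):
    """List (operation, arc, new_arc_set) for every legal local change to a DAG."""
    arcs = frozenset(arcs)
    result = []
    for u, v in permutations(nodes, 2):
        if (u, v) in arcs:
            # Deleting an arc can never create a cycle.
            result.append(("delete", (u, v), arcs - {(u, v)}))
            flipped = (arcs - {(u, v)}) | {(v, u)}
            if is_acyclic(nodes, flipped):
                result.append(("reverse", (u, v), flipped))
        elif (v, u) not in arcs:
            added = arcs | {(u, v)}
            if is_acyclic(nodes, added):
                result.append(("add", (u, v), added))
    return result


def v_structures(arcs):
    """Head-to-head patterns x -> z <- y where x and y are not adjacent."""
    arcs = set(arcs)
    skel = {frozenset(a) for a in arcs}
    return {
        (frozenset((x, y)), z)
        for (x, z) in arcs
        for (y, w) in arcs
        if w == z and x != y and frozenset((x, y)) not in skel
    }


def markov_equivalent(arcs_a, arcs_b):
    """Assumed criterion: same skeleton and same v-structures."""
    skel_a = {frozenset(a) for a in arcs_a}
    skel_b = {frozenset(b) for b in arcs_b}
    return skel_a == skel_b and v_structures(arcs_a) == v_structures(arcs_b)


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    chain = [("a", "b"), ("b", "c")]
    for op, arc, _ in dag_neighbors(nodes, chain):
        print(op, arc)
    # The chain and its full reversal are distinct DAGs in the same equivalence class.
    print(markov_equivalent(chain, [("b", "a"), ("c", "b")]))
```

A DAG-space search treats the chain and its reversal as two separate states even though a score-equivalent function cannot distinguish them, which is the redundancy that searching over equivalence-aware representations aims to reduce.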

### Citations

7054 |
Probabilistic reasoning in intelligent systems
- Pearl
- 1988
Citation Context ...evaluation of the proposed search method on several test problems, including the well-known Alarm Monitoring System, are also presented. 1. Introduction Nowadays, the usefulness of Bayesian networks (=-=Pearl, 1988-=-) in representing knowledge with uncertainty and efficient reasoning is widely accepted. A Bayesian network consists of a qualitative part, a directed acyclic graph (DAG), and a quantitative one, a co... |

2307 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ...number of edges in the learned networks and the Hamming distances, we have collected two additional performance measures: BIC.- The value of the BIC (Bayesian Information Criterion) scoring function (=-=Schwarz, 1978-=-) for the learned network. This value measures the quality of the network using maximum likelihood and a penalty term. Note that BIC is also score-equivalent and decomposable. KL.- The Kullback-Leible... |

1155 | Information Theory and Statistics
- Kullback
- 1959
Citation Context ...we have also used the following algorithms: • pc (Spirtes et al., 1993), an algorithm based on independence tests. We used an independence test based on the measure of conditional mutual information (=-=Kullback, 1968-=-), with a fixed confidence level equal to 0.99. • The K2 search method (Cooper & Herskovits, 1992), in combination with the BDeu scoring function (k2). Note that k2 needs an ordering of the variables ... |

1075 | A Bayesian Method for the Induction
- Cooper, Herskovits
- 1992
Citation Context ...he scoring functions are based on different principles, such as entropy (Herskovits & Cooper, 1990; Chow & Liu, 1968; de Campos, 1998; Rebane & Pearl, 1987), Bayesian approaches (Buntine, 1994, 1996; =-=Cooper & Herskovits, 1992-=-; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2... |

905 |
An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...n all, Bayesian networks provide a very intuitive graphical tool for representing available knowledge. Another attraction of Bayesian networks is their ability to efficiently perform reasoning tasks (=-=Jensen, 1996-=-; Pearl, 1988). The independences represented in the DAG are the key to this ability, reducing changes in the knowledge state to local computations. In addition, important savings in storage requireme... |

903 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...+search approach. Although algorithms in this category have commonly used local search methods (Buntine, 1991; Cooper & Herskovits, 1992; Chickering, Geiger & Heckerman, 1995; de Campos et al., 2003; =-=Heckerman et al., 1995-=-), due to the exponentially large size of the search space, there is a growing interest in other heuristic search methods, i.e. simulated annealing (Chickering et al., 1995), 1. Some authors also use ... |

637 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ... data. Each algorithm is characterized by the specific scoring function and search procedure used. The scoring functions are based on different principles, such as entropy (Herskovits & Cooper, 1990; =-=Chow & Liu, 1968-=-; de Campos, 1998; Rebane & Pearl, 1987), Bayesian approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Hecker... |

601 | Tabu search
- Glover, Laguna
- 1993
Citation Context ...e aim to test the behavior of the search in the RPDAG space when used in combination with a search heuristic which is more powerful than a simple greedy search. The heuristic selected is Tabu Search (=-=Glover, 1989-=-; Bouckaert, 1995), which tries to escape from a local maximum by selecting a solution that minimally decreases the value of the scoring function; immediate re-selection of the local maximum just visi... |

265 | Model selection and accounting for model uncertainty in graphical models using Occam’s window
- Madigan, Raftery
- 1994
Citation Context ...n approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; =-=Madigan & Raftery, 1994-=-; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; Friedman & Goldszmidt, 1996; Lam & Bacchus, 1994; Suzuki, 1993, 1996; Tian, 2000). There are also hybrid ... |

248 | Operations for learning with graphical models
- Buntine
- 1994
Citation Context ...rch procedure used. The scoring functions are based on different principles, such as entropy (Herskovits & Cooper, 1990; Chow & Liu, 1968; de Campos, 1998; Rebane & Pearl, 1987), Bayesian approaches (=-=Buntine, 1994-=-, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ra... |

239 |
The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks
- Beinlich, Suermondt, et al.
- 1989
Citation Context ...riginal submission of this paper. The Alarm network displays the relevant variables and relationships for the Alarm Monitoring System (=-=Beinlich et al., 1989-=-), a diagnostic application for patient monitoring. This network, which contains 37 variables and 46 arcs, has been considered as a benchmark for evaluating Bayesian network learning algorithms. The i... |

234 | Learning Bayesian networks with local structure
- Friedman, Goldszmidt
- 1998
Citation Context ... Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; =-=Friedman & Goldszmidt, 1996-=-; Lam & Bacchus, 1994; Suzuki, 1993, 1996; Tian, 2000). There are also hybrid algorithms that use a combination of constraint-based and scoring-based methods: In several works (Singh & Valtorta, 1993, ... |

217 |
Equivalence and synthesis of causal models
- Verma, Pearl
- 1990
Citation Context ...pos & Huete, 2000b; de Campos & Puerta, 2001b; Friedman & Koller, 2000; Larrañaga, Kuijpers & Murga, 1996). In this paper, however, we are more interested in the space of equivalence classes of DAGs (=-=Pearl & Verma, 1990-=-), i.e. classes of DAGs with each representing a different set of probability distributions. There is also a number of learning algorithms that carry out the search in this space (Andersson, Madigan &... |

202 | Being bayesian about network structure
- Friedman, Koller
- 2000
Citation Context ...sed on different principles, such as entropy (Herskovits & Cooper, 1990; Chow & Liu, 1968; de Campos, 1998; Rebane & Pearl, 1987), Bayesian approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; =-=Friedman & Koller, 2000-=-; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Desc... |

188 | Learning Bayesian Belief Networks : An Approach Based on
- Lam, Bacchus
- 1994
Citation Context ...ckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; Friedman & Goldszmidt, 1996; =-=Lam & Bacchus, 1994-=-; Suzuki, 1993, 1996; Tian, 2000). There are also hybrid algorithms that use a combination of constraint-based and scoring-based methods: In several works (Singh & Valtorta, 1993, 1995; Spirtes & Meek,... |

183 | Theory refinement on Bayesian networks
- Buntine
- 1991
Citation Context ...rch process is limited by the results of some independence tests. In this paper, we focus on the scoring+search approach. Although algorithms in this category have commonly used local search methods (=-=Buntine, 1991-=-; Cooper & Herskovits, 1992; Chickering, Geiger & Heckerman, 1995; de Campos et al., 2003; Heckerman et al., 1995), due to the exponentially large size of the search space, there is a growing interest... |

180 | Learning of Bayesian network structure from massive datasets: The “sparse candidate” algorithm - Friedman, Nachman, et al. - 1999 |

172 | A guide to the literature on learning probabilistic networks from data - Buntine - 1996 |

158 | Adaptive probabilistic networks with hidden variables
- Binder, Koller, et al.
- 1997
Citation Context ...ases in the Alarm database, for k = 3000, 5000 and 10000). [Figure 18: The Alarm network] Insurance (=-=Binder et al., 1997-=-) is a network for evaluating car insurance risks. The Insurance network contains 27 variables and 52 arcs. In our experiments, we have used five databases containing 10000 cases, generated from the I... |

129 | Learning equivalence classes of Bayesian-network structure
- Chickering
- 2002
Citation Context ...of DAGs with each representing a different set of probability distributions. There is also a number of learning algorithms that carry out the search in this space (Andersson, Madigan & Perlman, 1997; =-=Chickering, 1996-=-; Dash & Druzdzel, 1999; Madigan, Anderson, Perlman & Volinsky, 1996; Spirtes & Meek, 1995). This feature reduces the size of the search space, although recent results (Gillispie & Perlman, 2001) conf... |

98 | Machine learning library in C++ - Kohavi, Sommerfield - 1996 |

91 | A characterization of Markov equivalence classes for acyclic digraphs
- Andersson, Madigan, et al.
- 1997
Citation Context ...lence class, and a link for every reversible arc. This kind of representation has been given several names: pattern (Spirtes & Meek, 1995), completed PDAG (CPDAG) (Chickering, 1996), essential graph (=-=Andersson et al., 1997-=-; Dash & Druzdzel, 1999). As a consequence of theorem 1, a completed PDAG possesses an arc x → y if and only if a triplet of nodes (x,y,z) forms a v-structure or the arc x → y is required to be direct... |

91 | A transformational characterization of equivalent Bayesian network structures
- Chickering
- 1995
Citation Context ...rl, 1987), Bayesian approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & =-=Chickering, 1995-=-; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; Friedman & Goldszmidt, 1996; Lam & Bacchus, 1994; Suzuki, 1993, 1996; Tian, 2000... |

82 |
Causal inference and causal explanation with background knowledge
- Meek
- 1995
Citation Context ...s (de Campos, 1998; de Campos & Huete, 1997; Geiger, Paz & Pearl, 1990, 1993; Huete & de Campos, 1993), whereas other are designed for general DAGs (de Campos & Huete, 2000a; Cheng, Bell & Liu, 1997; =-=Meek, 1995-=-; Spirtes, Glymour & Scheines, 1993; Verma & Pearl, 1990; Wermuth & Lauritzen, 1983). The algorithms based on a scoring function attempt to find a graph that maximizes the selected score; the scoring ... |

77 | An algorithm for the construction of Bayesian network structures from data
- Singh, Valtorta
- 1993
Citation Context ...edman & Goldszmidt, 1996; Lam & Bacchus, 1994; Suzuki, 1993, 1996; Tian, 2000). There are also hybrid algorithms that use a combination of constraint-based and scoring-based methods: In several works (=-=Singh & Valtorta, 1993-=-, 1995; Spirtes & Meek, 1995; Dash & Druzdzel, 1999; de Campos, Fernández-Luna & Puerta, 2003) the independence-based and scoring-based algorithms are maintained as separate processes, which are combi... |

73 |
Learning Bayesian networks: Search methods and experimental results
- Chickering, Geiger, et al.
- 1995
Citation Context ...995; de Campos et al., 2003; Heckerman et al., 1995), due to the exponentially large size of the search space, there is a growing interest in other heuristic search methods, i.e. simulated annealing (=-=Chickering et al., 1995-=-), tabu search (Bouckaert, 1995; Muntenau & Cau, 2000), branch and bound (Tian, 2000)... [Footnote 1: Some authors also use the term scoring metric.] |

71 | Structure learning of Bayesian networks by genetic algorithms - Larrañaga, Poza - 1994 |

60 |
Bayesian networks for knowledge discovery
- Heckerman
- 1996
Citation Context ..., 1968; de Campos, 1998; Rebane & Pearl, 1987), Bayesian approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; =-=Heckerman, 1996-=-; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; Friedman & Goldszmidt, 1996; Lam & Bacchus... |

59 | KUTATO: An Entropy-Driven System for Construction of Probabilistic Expert Systems from Databases - Herskovits, Cooper - 1990 |

58 |
Bayesian belief networks: from construction to evidence
- Bouckaert
- 1995
Citation Context ...tic search methods, i.e. simulated annealing (Chickering et al., 1995), tabu search (=-=Bouckaert, 1995-=-; Muntenau & Cau, 2000), branch and bound (Tian, 2000), genetic algorithms and evolutionary programming (Larrañaga, Poza, Yurramendi, Murga & Kuijpers, 1996; Myers, Laskey & Levitt, 1999; Wong, Lam & ... [Footnote 1: Some authors also use the term scoring metric.] |

57 | Learning Bayesian networks from incomplete data
- Ramoni, Sebastiani
- 1997
Citation Context ...94, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; =-=Ramoni & Sebastiani, 1997-=-; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; Friedman & Goldszmidt, 1996; Lam & Bacchus, 1994; Suzuki, 1993, 1996; Tian, 2000). There are also hybrid algorithms that use a combi... |

54 | Learning Bayesian network structures by searching for the best ordering with genetic algorithms - Larrañaga, Kuijpers, et al. - 1996 |

47 |
The recovery of causal poly-trees from statistical data
- Rebane, Pearl
- 1989
Citation Context ...zed by the specific scoring function and search procedure used. The scoring functions are based on different principles, such as entropy (Herskovits & Cooper, 1990; Chow & Liu, 1968; de Campos, 1998; =-=Rebane & Pearl, 1987-=-), Bayesian approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickeri... |

43 | An algorithm for Bayesian belief network construction from data
- Cheng, Bell, et al.
- 1997
Citation Context ...ordering of the variables as the input. We used an ordering consistent with the topology of the corresponding networks. • Another algorithm, BN Power Constructor (bnpc), that uses independence tests (=-=Cheng et al., 1997-=-; Cheng, Bell & Liu, 1998). The two independence-based algorithms, pc and bnpc, operate on the space of equivalence classes, whereas k2 explores the space of DAGs which are compatible with a given ord... |

39 |
Hailfinder: a Bayesian system for forecasting severe weather
- Abramson, Brown, et al.
- 1996
Citation Context ...ance risks. The Insurance network contains 27 variables and 52 arcs. In our experiments, we have used five databases containing 10000 cases, generated from the Insurance Bayesian network. Hailfinder (=-=Abramson et al., 1996-=-) is a normative system that forecasts severe summer hail in northeastern Colorado. The Hailfinder network contains 56 variables and 66 arcs. In this case, we have also used five databases with 10000 ... |

38 | Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs - Madigan, Andersson, et al. - 1996 |

37 | Improved learning of Bayesian networks
- Kocka, Castelo
- 2001
Citation Context ...d (Tian, 2000), genetic algorithms and evolutionary programming (Larrañaga, Poza, Yurramendi, Murga & Kuijpers, 1996; Myers, Laskey & Levitt, 1999; Wong, Lam & Leung, 1999), Markov chain Monte Carlo (=-=Kocka & Castelo, 2001-=-; Myers et al., 1999), variable neighborhood search (de Campos & Puerta, 2001a; Puerta, 2001), ant colony optimization (de Campos, Fernández-Luna, Gámez & Puerta, 2002; Puerta, 2001), greedy randomize... |

35 | Learning Bayesian Networks from Data: An Efficient Approach Based on Information Theory - Cheng, Bell, et al. - 1998 |

29 | A hybrid anytime algorithm for the construction of causal models from sparse data
- Dash, Druzdzel
- 1999
Citation Context ...i, 1993, 1996; Tian, 2000). There are also hybrid algorithms that use a combination of constraint-based and scoring-based methods: In several works (Singh & Valtorta, 1993, 1995; Spirtes & Meek, 1995; =-=Dash & Druzdzel, 1999-=-; de Campos, Fernández-Luna & Puerta, 2003) the independence-based and scoring-based algorithms are maintained as separate processes, which are combined in some way, whereas the hybridization proposed... |

28 | Probabilistic network construction using the minimum description length principle
- Bouckaert
- 1993
Citation Context ...man & Peér, 1999; Geiger & Heckerman, 1995; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (=-=Bouckaert, 1993-=-; Friedman & Goldszmidt, 1996; Lam & Bacchus, 1994; Suzuki, 1993, 1996; Tian, 2000). There are also hybrid algorithms that use a combination of constraint-based and scoring-based methods: In several wo... |

28 | Learning causal trees from dependence information - Geiger, Paz, et al. - 1990 |

27 | A new approach for learning belief networks using independence criteria - Campos, Huete - 2000 |

27 | Computer-based probabilistic networks construction - Herskovits - 1991 |

24 | A characterization of the Dirichlet distribution with application to learning Bayesian networks
- Geiger, Heckerman
- 1995
Citation Context ...& Cooper, 1990; Chow & Liu, 1968; de Campos, 1998; Rebane & Pearl, 1987), Bayesian approaches (Buntine, 1994, 1996; Cooper & Herskovits, 1992; Friedman & Koller, 2000; Friedman, Nachman & Peér, 1999; =-=Geiger & Heckerman, 1995-=-; Heckerman, 1996; Heckerman, Geiger & Chickering, 1995; Madigan & Raftery, 1994; Ramoni & Sebastiani, 1997; Steck, 2000), or the Minimum Description Length (Bouckaert, 1993; Friedman & Goldszmidt, 19... |

22 | Learning simple causal structures - Geiger, Paz, et al. - 1993 |

22 | Learning Bayesian Networks from Incomplete Data Using Evolutionary Algorithms
- Myers, Laskey, et al.
- 1999
Citation Context ... algorithms and evolutionary programming (Larrañaga, Poza, Yurramendi, Murga & Kuijpers, 1996; Myers, Laskey & Levitt, 1999; Wong, Lam & Leung, 1999), Markov chain Monte Carlo (Kocka & Castelo, 2001; =-=Myers et al., 1999-=-), variable neighborhood search (de Campos & Puerta, 2001a; Puerta, 2001), ant colony optimization (de Campos, Fernández-Luna, Gámez & Puerta, 2002; Puerta, 2001), greedy randomized adaptive search pr... |

19 | Ant colony optimization for learning Bayesian networks - de Campos, Fernández-Luna, et al. - 2002 |

18 | Independency relationships and learning algorithms for singly connected networks - Campos - 1998 |

16 | A Simple Algorithm to Construct a Consistent Extension of a Partially Oriented Graph
- Dor, Tarsi
- 1992
Citation Context ...tion of G ′ is then recovered from its consistent extension H ′ . The process of checking the existence of a consistent extension and generating it is carried out with a procedure called PDAG-to-DAG (=-=Dor & Tarsi, 1992-=-), which runs in time O(n·e) in the worst case, where e denotes the number of edges in the PDAG. Another procedure, called DAG-to-PDAG, is invoked in order to obtain the completed PDAG representation ... |

13 | Enumerating Markov equivalence classes of acyclic digraph models
- Gillispie, Perlman
- 2001
Citation Context ...Perlman, 1997; Chickering, 1996; Dash & Druzdzel, 1999; Madigan, Anderson, Perlman & Volinsky, 1996; Spirtes & Meek, 1995). This feature reduces the size of the search space, although recent results (=-=Gillispie & Perlman, 2001-=-) confirm that this reduction is not as important in terms of the DAG space as previously hoped (the ratio of the number of DAGs to the number of equivalence classes is lower than four). The price we ... |