## Improved search for structure learning of large Bayesian networks

Citations: 1 (0 self)

### BibTeX

```bibtex
@techreport{Herscovici_improvedsearch,
  author      = {Avi Herscovici and Oliver Brock},
  title       = {Improved search for structure learning of large {B}ayesian networks},
  institution = {},
  year        = {}
}
```


### Abstract

The problem of Bayesian network structure learning is defined as an optimization problem over the space of all possible network structures. For low-dimensional data, optimal structure learning approaches exist. For high-dimensional data, structure learning remains a significant challenge. Most commonly, approaches to high-dimensional structure learning employ a reduced search space and apply hill-climbing methods to find high-scoring network structures. But even the reduced search space contains many local optima, so local search methods are unable to find near-optimal network structures. Instead of focusing on search space reduction, as most previous work in this area does, we propose to replace greedy search schemes with more effective search methods. We show that for high-dimensional data the proposed search method finds significantly better structures than other leading approaches to structure learning.
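The abstract casts structure learning as maximizing a scoring function over candidate network structures, with hill climbing as the standard local search. As a minimal, generic sketch of that framing (the function names, the add/delete move set, and the toy score in the usage note are illustrative assumptions, not the paper's BDeu-based method):

```python
import itertools

def is_acyclic(n, edges):
    """Kahn's algorithm: True iff the directed graph on nodes 0..n-1 has no cycle."""
    indeg = {v: 0 for v in range(n)}
    for _, v in edges:
        indeg[v] += 1
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == n

def hill_climb(n, score, max_iters=100):
    """Greedy structure search: from the empty graph, repeatedly apply the
    single edge addition or deletion that most improves score(edges),
    stopping at a local optimum."""
    edges, best = set(), score(set())
    for _ in range(max_iters):
        moves = []
        for u, v in itertools.permutations(range(n), 2):
            if (u, v) in edges:
                moves.append(edges - {(u, v)})   # delete an existing edge
            elif is_acyclic(n, edges | {(u, v)}):
                moves.append(edges | {(u, v)})   # add an edge, keeping a DAG
        improved = False
        for cand in moves:
            s = score(cand)
            if s > best:
                best, edges, improved = s, cand, True
        if not improved:
            break  # local optimum: no single move improves the score
    return edges, best
```

With a toy score that rewards recovering a known edge set, e.g. `hill_climb(3, lambda e: -len(e ^ {(0, 1), (1, 2)}))`, the search climbs to that structure. The many local optima the abstract refers to are exactly the structures where the `break` fires short of the global optimum.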

### Citations

7556 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988

Citation Context: ...ut gene regulation by understanding statistical dependencies in gene expression data [10]. An effective method for extracting such causal relationships is to learn the structure of a Bayesian network [22] from the data. Structure learning for Bayesian networks is commonly cast as an optimization problem. Within the space of all possible network structures, we search for the structure that explains the...

962 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995

Citation Context: ...ing for Bayesian networks is commonly cast as an optimization problem. Within the space of all possible network structures, we search for the structure that explains the data best. A scoring function [17, 14] is used to evaluate the quality of the network, given the data. For data sets of low dimensionality, the structure learning problem can be solved optimally [16, 15, 24]. In higher dimensions, however...

933 | Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization - Spellman - 1998

Citation Context: ...ness of search in high-dimensional spaces. Our experimental results show that structure learning using model-based search outperforms other leading structure learning approaches. On the Gene data set [25] with 801-dimensional data, for example, our method produces on average a network that is 50.7% more accurate than the structure obtained with another leading structure learning method (compared to th...

823 | Using Bayesian networks to analyze expression data - Friedman - 2000

Citation Context: ...these dependencies, it is possible to postulate a causal relationship. For example, we can formulate hypotheses about gene regulation by understanding statistical dependencies in gene expression data [10]. An effective method for extracting such causal relationships is to learn the structure of a Bayesian network [22] from the data. Structure learning for Bayesian networks is commonly cast as an optim...

673 | Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks - Cowell, Dawid, et al. - 2007

Citation Context: ...e learning algorithms. For this comparison we used six data sets. The first four (data 1 through data 4 in Figure 2) were generated using the Child10 network (10 connected copies of the Child network [7], as described in [28]). The Child10 network contains 220 nodes connected by 257 edges. The fifth data set was generated from the Alarm10 network (10 connected copies of the Alarm network [2]) with 37...

342 | The beginning of the Monte Carlo method - Metropolis - 1987

Citation Context: ...Note that this score includes both the BDeu score of s as well as the estimated size of its region. Sampling: In order to sample in the allocated regions, we start a Metropolis Hastings Monte Carlo [19] (MHMC) run from the sample representing the region. MHMC performs a random walk which will accept moves that reduce the score with a probability based on the Boltzmann equation. The Monte Carlo run i...
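The MHMC acceptance rule described in this context — always take improving moves, take worsening ones with a Boltzmann probability — can be sketched as follows. The function name, the minimization convention, and the explicit temperature argument are assumptions for illustration; the paper's proposal distribution and run length are not specified in this excerpt.

```python
import math
import random

def metropolis_accept(current_score, proposed_score, temperature, rng=random):
    """Metropolis acceptance rule (minimization convention): always accept a
    move that lowers the score; accept a worsening move with Boltzmann
    probability exp(-delta / temperature)."""
    delta = proposed_score - current_score
    if delta <= 0:
        return True  # improving (or equal) moves are always accepted
    return rng.random() < math.exp(-delta / temperature)
```

A random walk then proposes a neighboring structure, calls `metropolis_accept`, and keeps or discards the move: high temperatures make the walk nearly random (exploration), low temperatures make it nearly greedy (exploitation).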

258 | No Free Lunch Theorems for Search - Wolpert, Macready - 1995

248 | The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks - Beinlich, Suermondt, et al. - 1989

Citation Context: ...network [7], as described in [28]). The Child10 network contains 220 nodes connected by 257 edges. The fifth data set was generated from the Alarm10 network (10 connected copies of the Alarm network [2]) with 370 variables and 570 edges. The sixth data set was generated from the Gene network (as constructed in [11]), which contains 801 variables and 972 edges. Every data set is made up of 1000 insta...

220 | Being Bayesian about network structure - Friedman - 2000

Citation Context: ...k, a number of different search strategies were employed to find good orderings; these strategies include genetic algorithms [18], an optimized hill climbing method [26], and Markov chain Monte Carlo [9]. The search space can also be reduced by limiting the connectivity of the graph. The sparse candidate algorithm [11] only considers connections between variables with high mutual information. This al...

203 | Learning Bayesian belief networks: An approach based on the MDL principle - Lam, Bacchus - 1994

Citation Context: ...ing for Bayesian networks is commonly cast as an optimization problem. Within the space of all possible network structures, we search for the structure that explains the data best. A scoring function [17, 14] is used to evaluate the quality of the network, given the data. For data sets of low dimensionality, the structure learning problem can be solved optimally [16, 15, 24]. In higher dimensions, however...

193 | Learning Bayesian network structure from massive datasets: The ‘sparse candidate’ algorithm - Friedman, Nachman, et al. - 1999

Citation Context: ...gorithms [18], an optimized hill climbing method [26], and Markov chain Monte Carlo [9]. The search space can also be reduced by limiting the connectivity of the graph. The sparse candidate algorithm [11] only considers connections between variables with high mutual information. This algorithm was successfully applied to the Gene data set with 801 variables [25]. This idea was later extended to includ...
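The sparse candidate idea in this context — restricting each variable's potential parents to the few other variables with highest mutual information — can be sketched as below. Function names and the plug-in MI estimator are assumptions; the full algorithm also re-estimates candidate sets iteratively, which this sketch omits.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate (in nats) of mutual information between two
    discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed joint outcomes: p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def candidate_parents(data, k):
    """For each variable (a column of `data`), keep only the k other
    variables with highest mutual information as candidate parents."""
    cols = list(zip(*data))
    d = len(cols)
    return {
        i: [j for _, j in sorted(
                ((mutual_information(cols[i], cols[j]), j)
                 for j in range(d) if j != i),
                reverse=True)[:k]]
        for i in range(d)
    }
```

Restricting each of 801 gene variables to a handful of candidate parents is what makes local search over the reduced space feasible: every scoring step considers only k candidates per variable instead of all 800.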

176 | Optimal structure identification with greedy search - Chickering

Citation Context: ...ocus on work that reduces the search space to facilitate optimization by local search. We also discuss methods that have been applied to high-dimensional data sets. The equivalence class search space [6] represents structures as partially directed graphs (PDAGs). This search space results in a relatively smooth quality landscape and can thus be searched well with greedy methods. Nevertheless, the PDAG...

137 | MIMIC: Finding optima by estimating probability densities - Bonet, Isbell, et al. - 1997

Citation Context: ...of the search space. By biasing the starting point selection towards promising regions, STAGE can outperform other global optimization algorithms in a variety of problem domains. The MIMIC algorithm [8] uses information obtained during global sampling to determine a probability distribution over the search space. This distribution estimates the probability that the optimization criterion evaluates t...

96 | A transformational characterization of equivalent Bayesian network structures - Chickering - 1995

Citation Context: ...ic between two PDAGs that measures how many local moves are necessary to convert one PDAG into the other. We convert the DAGs returned by the structure learning algorithms into their equivalent PDAGs [5]. The comparison of PDAGs is useful, since Bayesian networks can be statistically equivalent yet structurally different, formally known as Markov equivalent. We now compare the structure learning perf...

84 | The max-min hill-climbing Bayesian network structure learning algorithm - Tsamardinos, Brown, et al. - 2006

Citation Context: ...utual information. This algorithm was successfully applied to the Gene data set with 801 variables [25]. This idea was later extended to include a sound method for learning parent and children sets [28]. The resulting method, Max-Min Hill Climbing (MMHC), compares favorably with the sparse candidate algorithm when applied to large data sets [28]. Both of these methods rely on local search (hill clim...

80 | Structure learning of Bayesian networks by genetic algorithms: A performance analysis of control parameters - Larrañaga, Poza, et al. - 1996

Citation Context: ...t likely structure that observes the ordering, given the data. In prior work, a number of different search strategies were employed to find good orderings; these strategies include genetic algorithms [18], an optimized hill climbing method [26], and Markov chain Monte Carlo [9]. The search space can also be reduced by limiting the connectivity of the graph. The sparse candidate algorithm [11] only con...

62 | Learning evaluation functions for global optimization and Boolean satisfiability - Boyan, Moore - 1998

Citation Context: ...assumptions, these algorithms will perform poorly [29]. In practice, however, several algorithms have shown that this active learning approach to search can achieve good results. The STAGE algorithm [3] uses the information obtained during local search to guide future local searches. STAGE learns a function that predicts the outcome of local searches started in a particular region of the search spac...

61 | Advances in exact Bayesian structure discovery in Bayesian networks - Koivisto - 2006

Citation Context: ...the data best. A scoring function [17, 14] is used to evaluate the quality of the network, given the data. For data sets of low dimensionality, the structure learning problem can be solved optimally [16, 15, 24]. In higher dimensions, however, it becomes increasingly difficult, as the number of possible network structures grows super-exponentially in the dimensionality of the data [23]. Structure learning fo...

57 | Active learning for structure in Bayesian networks - Tong, Koller - 2001

Citation Context: ..., it is interesting to mention another application of active learning to structure learning. Tong and Koller use active learning to obtain accurate structures from as little training data as possible [27]. 2.2 Search The literature on search and optimization is vast and we cannot hope to adequately review it here. Instead, we focus on those search methods that have been applied in structure learning a...

50 | Ordering-based search: A simple and effective algorithm for learning Bayesian networks - Teyssier, Koller - 2005

Citation Context: ...ering, given the data. In prior work, a number of different search strategies were employed to find good orderings; these strategies include genetic algorithms [18], an optimized hill climbing method [26], and Markov chain Monte Carlo [9]. The search space can also be reduced by limiting the connectivity of the graph. The sparse candidate algorithm [11] only considers connections between variables wit...

49 | A simple approach for finding the globally optimal Bayesian network structure - Silander, Myllymäki - 2006

Citation Context: ...the data best. A scoring function [17, 14] is used to evaluate the quality of the network, given the data. For data sets of low dimensionality, the structure learning problem can be solved optimally [16, 15, 24]. In higher dimensions, however, it becomes increasingly difficult, as the number of possible network structures grows super-exponentially in the dimensionality of the data [23]. Structure learning fo...

42 | Counting labeled acyclic digraphs - Robinson - 1973

Citation Context: ...lved optimally [16, 15, 24]. In higher dimensions, however, it becomes increasingly difficult, as the number of possible network structures grows super-exponentially in the dimensionality of the data [23]. Structure learning for high-dimensional data sets thus remains a substantial challenge with far-reaching implications for practical applications. To make structure learning tractable in high-dimensio...

31 | Tractable learning of large Bayes net structures from sparse data - Goldenberg, Moore - 2004

Citation Context: ...main susceptible to local optima. Another structure learning algorithm focuses on situations in which only very sparse data is available. The algorithm relies on frequent sets to learn large networks [13]. Although unrelated to the work presented here, it is interesting to mention another application of active learning to structure learning. Tong and Koller use active learning to obtain accurate struc...

21 | Adaptive importance sampling for estimation in structured domains - Ortiz, Kaelbling - 2000

Citation Context: ...n this distribution and only retaining samples of increasing quality, the estimate of the distribution is successively refined until the algorithm converges. An adaptive importance sampling algorithm [21] actively learns the optimal importance sampling distribution from the previously taken samples. This algorithm has been applied to action evaluation in influence diagrams. These three aforementioned...

17 | On Local Optima in Learning Bayesian Networks - Nielsen, Kočka, et al. - 2003

Citation Context: ...landscape and can thus be searched well with greedy methods. Nevertheless, the PDAG search space still contains many local optima. Some of them can be overcome by including randomness into the search [20]. An alternative search space considers orderings of nodes, which in turn limit the connectivity of nodes in the network. The optimization problem now consists of determining the best ordering of node...

14 | Causal Explorer: A causal probabilistic network learning toolkit for biomedical discovery - Aliferis, Tsamardinos, et al. - 2003

Citation Context: ...ning algorithms identified MMHC and SC as the leading structure learning algorithms for high-dimensional data [28]. The implementations of algorithms other than MBS were obtained from Causal Explorer [1]. Note that HC operates in the full search space (and thus cannot be applied to the high-dimensional Gene data), whereas MBS and MMHC operate on the reduced max-min parents and children (MMPC) search...

13 | Improving protein structure prediction with model-based search - Brunette, Brock - 2005

Citation Context: ...the true dependencies present in the data. We propose to advance Bayesian network structure learning by employing an improved search method. This search method, which we call model-based search (MBS) [4], combines the computational efficiency of local search with the advantageous properties of global search. If we view local search as exploitation and global search as exploration, model-based search...