## Sequential Update of Bayesian Network Structure (1997)

Venue: Proc. 13th Conference on Uncertainty in Artificial Intelligence (UAI'97)

Citations: 46 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Friedman97sequentialupdate,
  author    = {Nir Friedman and Moises Goldszmidt},
  title     = {Sequential Update of Bayesian Network Structure},
  booktitle = {Proc. 13th Conference on Uncertainty in Artificial Intelligence (UAI'97)},
  year      = {1997},
  pages     = {165--174},
  publisher = {Morgan Kaufmann}
}
```

### Abstract

There is an obvious need for improving the performance and accuracy of a Bayesian network as new data is observed. Because of errors in model construction and changes in the dynamics of the domains, we cannot afford to ignore the information in new data. While sequential update of parameters for a fixed structure can be accomplished using standard techniques, sequential update of network structure is still an open problem. In this paper, we investigate sequential update of Bayesian networks where both parameters and structure are expected to change. We introduce a new approach that allows for the flexible manipulation of the tradeoff between the quality of the learned networks and the amount of information that is maintained about past observations. We formally describe our approach, including the necessary modifications to the scoring functions for learning Bayesian networks, evaluate its effectiveness through an empirical study, and extend it to the case of missing data.

### Citations

2699 | Estimating the dimension of a model
- Schwarz
- 1978
Citation Context: ... = S_BDe(X_i; pa(X_i)) / Σ_{x_i, pa(x_i)} N(x_i; pa(x_i)). This modification is motivated by the asymptotic equivalence between the Bayesian score and the MDL score. A general result by Schwarz [14] shows that S_BDe(G | D) = -S_MDL(G | D) + O(1). Thus, by Lemma 3.1, the average Bayesian score is also asymptotically correct. It is not clear to us, at this stage, whether this average score h...
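The asymptotic relation invoked in this context can be written out explicitly. The following is a sketch of Schwarz's (BIC) approximation; `Dim(G)` for the number of free parameters, `N` for the sample size, and the sign conventions are choices made here for illustration, not notation taken from the paper:

```latex
% Schwarz's asymptotic approximation of the log marginal likelihood:
\log P(D \mid G) \;=\; \log P(D \mid G, \hat{\theta}_G)
  \;-\; \frac{\log N}{2}\,\mathrm{Dim}(G) \;+\; O(1)
% The right-hand side is the negated MDL description length, so with
% these conventions
S_{\mathrm{BDe}}(G \mid D) \;=\; -\,S_{\mathrm{MDL}}(G \mid D) \;+\; O(1)
```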

1133 | A Bayesian method for the induction of probabilistic networks from data
- Cooper, Herskovits
- 1992
Citation Context: ...ks with respect to the training data, and then to search for the best network (according to this score). The two main scoring functions commonly used to learn Bayesian networks are the Bayesian score [3, 8], and the one based on the principle of minimal description length (MDL) [9, 5]. These scores are asymptotically equivalent as the sample size increases. Furthermore they are both asymptotically corre...

950 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1994
Citation Context: ...ks with respect to the training data, and then to search for the best network (according to this score). The two main scoring functions commonly used to learn Bayesian networks are the Bayesian score [3, 8], and the one based on the principle of minimal description length (MDL) [9, 5]. These scores are asymptotically equivalent as the sample size increases. Furthermore they are both asymptotically corre...

801 | A view of the EM algorithm that justifies incremental, sparse, and other variants
- Neal, Hinton
- 1999
Citation Context: ...e how to extend these methods to deal with incomplete data in sequential update. To this end, we propose a combination of two generalizations of the expectation maximization algorithm: incremental EM [12] and model-selection EM [4]. The rest of this paper is organized as follows: In Section 2, we briefly review the current practice of learning Bayesian networks. In Section 3, we describe our approach ...
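To make the incremental-EM idea concrete, here is a minimal, self-contained sketch in the spirit of Neal and Hinton's variant. This is not the paper's algorithm: the two-component Gaussian mixture, the initialization, and the update schedule are all illustrative. Each step removes one item's contribution from the expected sufficient statistics, redoes the E-step for that item only, and immediately re-runs the M-step.

```python
import math
import random

def incremental_em(data, mu=(0.0, 1.0), passes=20):
    """Incremental EM for a toy 2-component Gaussian mixture (unit variance)."""
    k = 2
    mu = list(mu)
    pi = [0.5, 0.5]
    # Per-item responsibilities (expected sufficient statistics).
    resp = [[0.5, 0.5] for _ in data]
    # Aggregate statistics: soft counts and soft sums per component.
    n = [sum(r[j] for r in resp) for j in range(k)]
    s = [sum(r[j] * x for r, x in zip(resp, data)) for j in range(k)]
    for _ in range(passes):
        for i, x in enumerate(data):
            # Remove item i's old contribution from the aggregates.
            for j in range(k):
                n[j] -= resp[i][j]
                s[j] -= resp[i][j] * x
            # Partial E-step: recompute responsibilities for item i only.
            w = [pi[j] * math.exp(-0.5 * (x - mu[j]) ** 2) for j in range(k)]
            z = sum(w)
            resp[i] = [wj / z for wj in w]
            # Restore item i's contribution, then do an immediate M-step.
            for j in range(k):
                n[j] += resp[i][j]
                s[j] += resp[i][j] * x
            total = n[0] + n[1]
            pi = [n[j] / total for j in range(k)]
            mu = [s[j] / n[j] for j in range(k)]
    return mu, pi

# Toy usage: two clusters centered near -2 and +2.
random.seed(0)
data = [random.gauss(-2, 1) for _ in range(200)] + \
       [random.gauss(2, 1) for _ in range(200)]
mu, pi = incremental_em(data)
```

Because each item's old statistics are subtracted before its new ones are added, the aggregates always equal the sum of the current per-item responsibilities, which is what lets the M-step run after every single observation.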

327 | Bayesian Networks
- Heckerman, Wellman
- 1995
Citation Context: ...a. 1 Introduction Recently, there has been a great deal of effort in developing methods for learning Bayesian networks from data for density estimation, data analysis, and pattern classification (see [7] for a tutorial and an overview). This body of work, which includes both theoretical and experimental results, has concentrated mostly on batch learning methods. In this setting, the total corpus of d...

244 | Learning Bayesian networks with local structure
- Friedman, Goldszmidt
- 1996
Citation Context: ...according to this score). The two main scoring functions commonly used to learn Bayesian networks are the Bayesian score [3, 8], and the one based on the principle of minimal description length (MDL) [9, 5]. These scores are asymptotically equivalent as the sample size increases. Furthermore they are both asymptotically correct: with probability equal to one the learned distribution converges to the und...

227 | The EM algorithm for graphical association models with missing data
- Lauritzen
- 1995
Citation Context: ... Equation 1. Moreover, in order to evaluate the optimal choice of parameters for a given candidate network structure, we must perform a nonlinear optimization using either Expectation Maximization (EM) [11] or gradient descent [13]. In this paper we focus on the EM procedure. The standard use of EM is for batch learning. In addition, it is restricted to induce the parameters under the assumption of a fi...

206 | Sequential updating of conditional probabilities on directed graphical structures
- Spiegelhalter, Lauritzen
- 1990
Citation Context: ...e make the assumption that the structure of the network is fixed and we use conjugate priors, we can efficiently represent the posterior and update it after each iteration using a closed-form formula [15]. This approach, however, is infeasible when we also attempt to update the structure of the network. The BDe score is based on assumptions that allow us to compactly represent a prior using a single n...
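The closed-form, fixed-structure update with conjugate priors mentioned here can be sketched for a single conditional distribution. The Dirichlet-multinomial pair below is the standard conjugate setup; the function names, cardinalities, and batch counts are illustrative, not taken from the paper:

```python
# Sequential update of one CPT entry P(X | pa(X) = u) under a Dirichlet
# prior: the posterior is again Dirichlet, so only the hyperparameter
# vector needs to be stored between batches.

def dirichlet_update(alpha, counts):
    """Posterior hyperparameters after observing new counts."""
    return [a + c for a, c in zip(alpha, counts)]

def predictive(alpha):
    """Posterior predictive P(X = x): normalized hyperparameters."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Prior Dirichlet(1, 1, 1) over a ternary X given one parent configuration.
alpha = [1.0, 1.0, 1.0]
# Observe two batches sequentially; each update is closed-form.
for batch_counts in ([4, 1, 0], [2, 2, 1]):
    alpha = dirichlet_update(alpha, batch_counts)
# The posterior is Dirichlet(7, 4, 2).
```

As the surrounding context notes, this compactness is exactly what breaks down once the structure itself is allowed to change, since each candidate structure would need its own hyperparameter vectors.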

199 | Theory refinement on Bayesian networks
- Buntine
- 1991
Citation Context: ...ous work on the sequential update of Bayesian networks has been mostly restricted to updating the parameters assuming a fixed structure [15]. The two notable exceptions are the approaches by Buntine [2] and by Lam and Bacchus [10]. Buntine's method assumes that a total order on the variables is given, and it maintains sufficient statistics for the possible parents of each node using lattice structur...

199 | Learning Bayesian belief networks. An approach based on the MDL principle
- Lam, Bacchus
- 1994
Citation Context: ...according to this score). The two main scoring functions commonly used to learn Bayesian networks are the Bayesian score [3, 8], and the one based on the principle of minimal description length (MDL) [9, 5]. These scores are asymptotically equivalent as the sample size increases. Furthermore they are both asymptotically correct: with probability equal to one the learned distribution converges to the und...

131 | Learning Belief Networks in the presence of Missing Values and Hidden Variables
- Friedman
- 1997
Citation Context: ...s to deal with incomplete data in sequential update. To this end, we propose a combination of two generalizations of the expectation maximization algorithm: incremental EM [12] and model-selection EM [4]. The rest of this paper is organized as follows: In Section 2, we briefly review the current practice of learning Bayesian networks. In Section 3, we describe our approach for incremental update and ...

79 | Local learning in probabilistic networks with hidden variables
- Russell, Binder, et al.
- 1995
Citation Context: ...tistics. This procedure uses the neighbor frontier set we described above. The datasets used in the experiments were generated from two networks: the alarm network of [1] and the insurance network of [13]. The alarm network contains 37 variables, and the insurance network contains 26 variables. From each network we sampled 5 training sets, each consisting of 10,000 instances. The results reported in ...

25 | The ALARM monitoring system
- Beinlich, Suermondt, et al.
- 1989
Citation Context: ... linearly with the number of instances collected, and will become infeasible when the network is expected to perform for long periods of time. A good example of such a network is the "alarm" network [1], which is part of a system for monitoring of intensive care patients. In this example, the domain consists of 37 variables that can have 2^53.95 distinct instantiations. Clearly, we cannot store count...
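A figure like "2^53.95 distinct instantiations" is just the base-2 logarithm of the product of the variables' cardinalities. The cardinalities below are made up purely to illustrate the arithmetic; the alarm network's actual cardinalities are not listed in this excerpt:

```python
import math

def log2_state_space(cardinalities):
    """log2 of the number of joint instantiations (product of cardinalities)."""
    return sum(math.log2(c) for c in cardinalities)

# 37 hypothetical discrete variables with 2, 3, or 4 states each.
cards = [2] * 10 + [3] * 20 + [4] * 7
exponent = log2_state_space(cards)  # the "2^exponent" figure
```

Even modest per-variable cardinalities push the joint state space past 10^16, which is why the context argues that storing a count for every instantiation is infeasible.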

19 | Using new data to refine a Bayesian network
- Lam, Bacchus
- 1994
Citation Context: ...s space efficient since we only need to store the new instances that have been observed since we last performed the update of the MAP. An approach similar in spirit was proposed by Lam and Bacchus [10] in the context of the MDL score. Unfortunately, by using the MAP model as the prior for the next iteration of learning, we are losing information, and are strongly biasing the learning process towar...

4 | Asymptotic model selection for directed graphs with hidden variables
- Geiger, Heckerman, et al.
- 1996
Citation Context: ...le size increases. Furthermore they are both asymptotically correct: with probability equal to one the learned distribution converges to the underlying distribution as the number of samples increases [7, 6]. In this paper we use the MDL score described in [5], which we denote as S_MDL, and the "BDe" variant of the Bayesian score introduced by Heckerman et al. [8], which we denote as S_BDe. Details about thes...