## BAYESIAN NETWORK STRUCTURAL LEARNING AND INCOMPLETE DATA

### BibTeX

```bibtex
@MISC{François_bayesiannetwork,
  author = {Olivier François},
  title  = {BAYESIAN NETWORK STRUCTURAL LEARNING AND INCOMPLETE DATA},
  year   = {}
}
```

### Abstract

The Bayesian network formalism is becoming increasingly popular in many areas such as decision aid, diagnosis and complex systems control, in particular thanks to its inference capabilities, even when data are incomplete. Moreover, estimating the parameters of a fixed-structure Bayesian network is easy. However, very few methods are capable of using incomplete cases as a base to determine the structure of a Bayesian network. In this paper, we take up the structural EM algorithm principle [9, 10] to propose an algorithm which extends the Maximum Weight Spanning Tree (MWST) algorithm to deal with incomplete data. We also propose to use this extension in order to (1) speed up the structural EM algorithm and (2) extend the Tree Augmented Naive Bayes classifier to classification tasks with incomplete data.

### Citations

8919 | Maximum likelihood from incomplete data via the EM algorithm (with discussion)
- Dempster, Laird, et al.
- 1977

Citation Context: ...Xi’s are measured) as in the previous approach. Many methods try to rely more on all the observed data. Among them are sequential updating [30], Gibbs sampling [12], and expectation maximisation (EM) [7, 18] algorithms, which use the missing data MAR properties. More recently, bound and collapse [26] and robust Bayesian estimator [27] algorithms try to resolve this task whatever the nature of missing data...
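The EM principle cited here alternates an expectation step (fill in expected sufficient statistics under the current parameters) and a maximisation step (re-estimate parameters from those statistics). A minimal sketch for a hypothetical two-node network X → Y with some values of X missing; the encoding of missing values as -1 and all names and parameters are assumptions of this illustration, not taken from the paper:

```python
import numpy as np

def em_cpt(x, y, n_iter=50):
    """EM estimates for a two-node network X -> Y with binary variables.
    Entries of x equal to -1 are treated as missing; y is fully observed.
    Returns (P(X=1), [P(Y=1|X=0), P(Y=1|X=1)])."""
    theta_x, theta_y = 0.5, np.array([0.4, 0.6])  # arbitrary initialisation
    for _ in range(n_iter):
        # E-step: posterior P(X=1 | Y=y) under the current parameters,
        # used only where x is missing; observed x is kept as-is.
        like1 = theta_x * np.where(y == 1, theta_y[1], 1 - theta_y[1])
        like0 = (1 - theta_x) * np.where(y == 1, theta_y[0], 1 - theta_y[0])
        resp = np.where(x == -1, like1 / (like1 + like0), x)
        # M-step: maximum-likelihood estimates from expected counts
        theta_x = resp.mean()
        theta_y = np.array([((1 - resp) * y).sum() / (1 - resp).sum(),
                            (resp * y).sum() / resp.sum()])
    return theta_x, theta_y
```

Run on synthetic data with values missing at random, the estimates typically land close to the generating parameters, since each E-step uses the full posterior over the missing entries rather than discarding incomplete cases.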

4017 | Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images
- Geman, Geman
- 1984

Citation Context: ...re measured, not only in Dco (where all Xi’s are measured) as in the previous approach. Many methods try to rely more on all the observed data. Among them are sequential updating [30], Gibbs sampling [12], and expectation maximisation (EM) [7, 18] algorithms, which use the missing data MAR properties. More recently, bound and collapse [26] and robust Bayesian estimator [27] algorithms try to resolve th...

968 | Introduction to Bayesian Networks
- Jensen
- 1996

Citation Context: ...to deal with incomplete data. 1. INTRODUCTION Bayesian networks introduced by [17] are a formalism of probabilistic reasoning used increasingly in decision aid, diagnosis and complex systems control [15, 25, 24]. Let X = {X1, . . . , Xn} be a set of discrete random variables. A Bayesian network B = ⟨G, Θ⟩ is defined by • a directed acyclic graph (DAG) G = ⟨N, U⟩ where N represents the set of nodes (one nod...

948 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995

Citation Context: ...most of the time, completely observed [24, 8]. We are here more specifically interested in score-based methods, primarily the GS algorithm [1] and MWST, proposed by [4] and applied to Bayesian networks in [14]. GS is a greedy search carried out in DAG spaces where the interest of each structure located near the current structure is assessed by means of a BIC/MDL type measurement (equation 2) or a Bayesia...

682 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968

Citation Context: ...wledge, others use real data which are, most of the time, completely observed [24, 8]. We are here more specifically interested in score-based methods, primarily the GS algorithm [1] and MWST, proposed by [4] and applied to Bayesian networks in [14]. GS is a greedy search carried out in DAG spaces where the interest of each structure located near the current structure is assessed by means of a BIC/MDL typ...
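The MWST construction of Chow and Liu referenced here selects, among all trees, the one maximising total pairwise mutual information; with complete data this reduces to a maximum-weight spanning tree over empirical MI weights. A small sketch (Kruskal's algorithm with a union-find; function names are illustrative, not from the paper):

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information between two discrete samples."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    """Maximum-weight spanning tree over pairwise mutual information,
    built with Kruskal's algorithm and a union-find structure.
    `data` is (n_samples, n_vars); returns a list of undirected edges."""
    n_vars = data.shape[1]
    edges = sorted(((mutual_information(data[:, i], data[:, j]), i, j)
                    for i, j in combinations(range(n_vars), 2)),
                   reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

The paper's MWST-EM extension replaces the empirical mutual information above with expected sufficient statistics computed by inference, which is precisely what complete data makes unnecessary here.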

357 | Inference and missing data
- Rubin
- 1976

Citation Context: ...ations: Dm = {X_i^l | M_il = 1}, 1 ≤ i ≤ n, 1 ≤ l ≤ N; Do = {X_i^l | M_il = 0}, 1 ≤ i ≤ n, 1 ≤ l ≤ N; Dco = {[X_1^l . . . X_n^l] | [M_1l . . . M_nl] = [0 . . . 0]}, 1 ≤ l ≤ N. Dealing with missing data depends on their nature. Rubin [29] identified several types of missing data: • MCAR (Missing Completely At Random): P(M|D) = P(M), the probability for data to be missing does not depend on D, • MAR (Missing At Random): P(M|D) = P(M|D...
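Rubin's distinction between MCAR and MAR can be seen in a small simulation: under MCAR the available-case mean stays unbiased, while under MAR (missingness driven by an observed covariate) it becomes biased unless the analysis conditions on that covariate. All variables and rates below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
age = rng.integers(20, 80, n)
income = 100.0 * age + rng.normal(0.0, 500.0, n)

# MCAR: P(M|D) = P(M) -- a constant drop probability, independent of the data
mcar_mask = rng.random(n) < 0.2

# MAR: P(M|D) = P(M|Do) -- drop probability depends on the observed age
mar_mask = rng.random(n) < np.where(age > 50, 0.4, 0.05)

full = income.mean()
print(f"full mean      : {full:8.1f}")
print(f"MCAR available : {income[~mcar_mask].mean():8.1f}")  # close to full
print(f"MAR available  : {income[~mar_mask].mean():8.1f}")   # biased low
```

Because the MAR mask preferentially removes high-age (hence high-income) rows, the naive available-case mean underestimates the truth, which is why EM-style methods that exploit the MAR property are needed.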

227 | The EM algorithm for graphical association models with missing data
- Lauritzen
- 1995

Citation Context: ...Xi’s are measured) as in the previous approach. Many methods try to rely more on all the observed data. Among them are sequential updating [30], Gibbs sampling [12], and expectation maximisation (EM) [7, 18] algorithms, which use the missing data MAR properties. More recently, bound and collapse [26] and robust Bayesian estimator [27] algorithms try to resolve this task whatever the nature of missing data...

224 | The Bayesian structural EM algorithm
- Friedman
- 1998

Citation Context: ...E_{G∗,Θ∗}[Nijk] = N · P(Xi = xk, Pi = paj | G∗, Θ∗), obtained by inference in the network {G∗, Θ∗} if {Xi, Pi} are not completely measured, or else only by mere counting. With the same reasoning, [10] proposes the adaptation of the BDe score to incomplete data. 4. MWST-EM, A STRUCTURAL EM IN THE SPACE OF TREES 4.1. General principle Following [14]’s recommendations, [8] have shown that, in complet...

206 | Sequential updating of conditional probabilities on directed graphical structures
- Spiegelhalter, Lauritzen
- 1990

Citation Context: ...where Xi and Pa(Xi) are measured, not only in Dco (where all Xi’s are measured) as in the previous approach. Many methods try to rely more on all the observed data. Among them are sequential updating [30], Gibbs sampling [12], and expectation maximisation (EM) [7, 18] algorithms, which use the missing data MAR properties. More recently, bound and collapse [26] and robust Bayesian estimator [27] algorit...

184 | The Bayes Net Toolbox for Matlab
- Murphy

Citation Context: ...for AMS-EM (AMS-EM+T). We present the experimental protocol, the results and the first interpretations of the results below. 5.1. Protocol We used Matlab, and more specifically the Bayes Net Toolbox [22]. We are developing and distributing a structure learning package (cf. [19]) based on this toolbox with the function codes implemented in the tests. [Figure 1: reference toy structures, Toy 1–5 and Asia]...

183 | Efficient approximations for the marginal likelihood of incomplete data given a Bayesian network
- Chickering, Heckerman
- 1997

Citation Context: ...rongly connected structures. Moreover, it is impossible to calculate marginal likelihood when data are incomplete, so that it is necessary to rely on an efficient approximation like those reviewed by [2]. In complete data cases, the most frequently used measurements are the BIC/MDL score and the Bayesian... [Algorithm 3: detailed EM for structural learning]...

131 | Learning Belief Networks in the presence of Missing Values and Hidden Variables
- Friedman
- 1997

Citation Context: ...rk is easy. However, very few methods are capable of using incomplete cases as a base to determine the structure of a Bayesian network. In this paper, we take up the structural EM algorithm principle [9, 10] to propose an algorithm which extends the Maximum Weight Spanning Tree algorithm to deal with incomplete data. We also propose to use this extension in order to (1) speed up the structural EM algorit...

77 | Learning Bayesian networks: Search methods and experimental results
- Chickering, Geiger, et al.
- 1995

Citation Context: ...rely on human expert knowledge, others use real data which are, most of the time, completely observed [24, 8]. We are here more specifically interested in score-based methods, primarily the GS algorithm [1] and MWST, proposed by [4] and applied to Bayesian networks in [14]. GS is a greedy search carried out in DAG spaces where the interest of each structure located near the current structure is assessed ...

63 | Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction
- Cohen, Cozman, et al.
- 2004

Citation Context: ...t a new variable is introduced in order to take into account the weight of each tree in the mixture. This variable isn’t measured, so EM is used to determine the corresponding parameters. Cohen et al. [5] deal with TANB classifiers and the EM principle for partially unlabeled data. In their work, only the variable corresponding to the class can be partially missing, whereas any variable can be partially mi...

60 | Learning augmented Bayesian classifiers: a comparison of distribution-based and classification-based approaches
- Keogh, Pazzani
- 1999

Citation Context: ...iedman. This variant of the structural EM algorithm will be called AMS-EM+T. 4.5. Extension to classification problems For classification tasks (where data are incomplete), many studies like those of [16, 20] use a structure based on an augmented naïve Bayesian network, where observations (i.e. all the variables except class) are linked to the very best tree (TANB, Tree Augmented Naive Bayes). [11] showed ...

54 | Robust learning with missing data
- Ramoni, Sebastiani

Citation Context: ...pdating [30], Gibbs sampling [12], and expectation maximisation (EM) [7, 18] algorithms, which use the missing data MAR properties. More recently, bound and collapse [26] and robust Bayesian estimator [27] algorithms try to resolve this task whatever the nature of missing data. Algorithm 1 explains in detail how EM works. EM has been proposed by [7] and adapted by [18] to the learning of the parameters...

50 | Counting unlabeled acyclic digraphs
- Robinson
- 1977

Citation Context: ...ing and testing a few ideas for improvement based on the extension of the Maximum Weight Spanning Tree algorithm to deal with incomplete data. 2. PRELIMINARY REMARKS 2.1. Structural learning Robinson [28] showed that r(n), the number of possible structures for a Bayesian network having n nodes, is given by the recurrence formula of equation 1: $r(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i}\, 2^{i(n-i)}\, r(n-i) = n^{2^{O(n)}}$ (1) ...
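Robinson's recurrence can be evaluated directly to see the super-exponential growth it describes; a short memoised sketch (the first values 1, 3, 25, 543, 29281 are the known counts of labeled DAGs):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def r(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    r(n) = sum_{i=1..n} (-1)^(i+1) * C(n,i) * 2^(i(n-i)) * r(n-i)."""
    if n == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * r(n - i)
               for i in range(1, n + 1))

print([r(k) for k in range(1, 6)])  # [1, 3, 25, 543, 29281]
```

The explosive growth of r(n) is exactly why exhaustive search over DAG space is infeasible and heuristic or score-based methods are used instead.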

36 | Finding optimal Bayesian networks
- Chickering, Meek
- 2002

Citation Context: ...related to classification. The MWST-EM and AMS-EM methods are respective adaptations of MWST and GS to incomplete data. Both structural search algorithms operate in DAG space. Chickering and Meek [3] recently proposed an optimal search algorithm (GES) which works in the space of Markov equivalence classes. Logically enough, the next step in our research is to adapt GES to incomplete datasets. 7. ACKNOWLEDGEMEN...

32 | An entropy-based learning algorithm of Bayesian conditional trees
- Geiger
- 1992

Citation Context: ...of [16, 20] use a structure based on an augmented naïve Bayesian network, where observations (i.e. all the variables except class) are linked to the very best tree (TANB, Tree Augmented Naive Bayes). [11] showed it was the tree obtained by running MWST on the observations. It is therefore possible to extend this specific structure to classification problems when data are incomplete by running our MWST...

23 | Learning Bayesian networks from incomplete data with stochastic search algorithms
- Myers, Laskey, et al.
- 1999

Citation Context: ...We can also cite the Hybrid Independence Test proposed in [6] that can use EM to estimate the essential sufficient statistics that are then used for an independence test in a constraint-based method. [23] proposes a structural learning method based on genetic algorithms and MCMC. We will now explain the structural EM algorithm principle in detail, and then we will put forward some ideas for improvemen...

23 | Parameter estimation in Bayesian networks from incomplete databases. Intelligent Data Analysis
- Ramoni, Sebastiani
- 1998

Citation Context: ...d data. Among them are sequential updating [30], Gibbs sampling [12], and expectation maximisation (EM) [7, 18] algorithms, which use the missing data MAR properties. More recently, bound and collapse [26] and robust Bayesian estimator [27] algorithms try to resolve this task whatever the nature of missing data. Algorithm 1 explains in detail how EM works. EM has been proposed by [7] and adapted by [18...

18 | Learning with Mixtures of Trees
- Meila-Predoviciu
- 1999

Citation Context: ...re possible to extend this specific structure to classification problems when data are incomplete by running our MWST-EM algorithm, and this algorithm will be called TANB-EM. 4.6. Related works Meila [21] applies the MWST algorithm and the EM principle, but in another framework: learning mixtures of trees. In this work, the data is complete, but a new variable is introduced in order to take into account the w...

16 | BNT structure learning package: Documentation and experiments
- Francois, Leray
- 2004

Citation Context: ...nd the first interpretations of the results below. 5.1. Protocol We used Matlab, and more specifically the Bayes Net Toolbox [22]. We are developing and distributing a structure learning package (cf. [19]) based on this toolbox with the function codes implemented in the tests. [Figure 1: reference toy structures] We tested the three following al...

14 | Robust independence testing for constraint-based learning of causal structure
- Dash, Druzdzel
- 2003

Citation Context: ...ral learning with incomplete data use the EM principle: Alternative Model Selection EM (AMS-EM) [9] and Bayesian Structural EM (BS-EM) [10]. We can also cite the Hybrid Independence Test proposed in [6] that can use EM to estimate the essential sufficient statistics that are then used for an independence test in a constraint-based method. [23] proposes a structural learning method based on genetic a...

14 | Convince; a conversational inference consolidation engine
- Kim, Pearl
- 1987

Citation Context: ...) speed up the structural EM algorithm or (2) in classification tasks extend the Tree Augmented Naive classifier in order to deal with incomplete data. 1. INTRODUCTION Bayesian networks introduced by [17] are a formalism of probabilistic reasoning used increasingly in decision aid, diagnosis and complex systems control [15, 25, 24]. Let X = {X1, . . . , Xn} be a set of discrete random variables. A Bay...

8 | Evaluation d'algorithmes d'apprentissage de structure pour les réseaux bayésiens
- Francois, Leray
- 2004

Citation Context: ...uristic methods have been proposed to determine the structure of a Bayesian network. Some of them rely on human expert knowledge, others use real data which are, most of the time, completely observed [24, 8]. We are here more specifically interested in score-based methods, primarily the GS algorithm [1] and MWST, proposed by [4] and applied to Bayesian networks in [14]. GS is a greedy search carried out in DA...

7 | Structural extension to logistic regression
- Greiner, Zhou
- 2002

Citation Context: ...EM principle for partially unlabeled data. In their work, only the variable corresponding to the class can be partially missing, whereas any variable can be partially missing in our TANB-EM extension. [13] propose maximising conditional likelihood for BN parameter learning. They apply their method to MCAR incomplete data by using available case analysis in order to find the best TANB classifier. 5. EXP...

6 | Réseaux bayésiens
- Naïm, Wuillemin, et al.
- 2002

Citation Context: ...to deal with incomplete data. 1. INTRODUCTION Bayesian networks introduced by [17] are a formalism of probabilistic reasoning used increasingly in decision aid, diagnosis and complex systems control [15, 25, 24]. Let X = {X1, . . . , Xn} be a set of discrete random variables. A Bayesian network B = ⟨G, Θ⟩ is defined by • a directed acyclic graph (DAG) G = ⟨N, U⟩ where N represents the set of nodes (one nod...

2 | Réseaux bayésiens pour la classification - méthodologie et illustration dans le cadre du diagnostic médical. Revue d'Intelligence Artificielle
- Leray, François
- 2004

Citation Context: ...iedman. This variant of the structural EM algorithm will be called AMS-EM+T. 4.5. Extension to classification problems For classification tasks (where data are incomplete), many studies like those of [16, 20] use a structure based on an augmented naïve Bayesian network, where observations (i.e. all the variables except class) are linked to the very best tree (TANB, Tree Augmented Naive Bayes). [11] showed ...