## Learning Bayesian belief networks with neural network estimators (1997)

Venue: In Neural Information Processing Systems 9

Citations: 6 - 2 self

### BibTeX

@INPROCEEDINGS{Monti97learningbayesian,
    author = {Stefano Monti and Gregory F. Cooper},
    title = {Learning Bayesian belief networks with neural network estimators},
    booktitle = {In Neural Information Processing Systems 9},
    year = {1997},
    pages = {579--584},
    publisher = {MIT Press}
}

### Abstract

In this paper we propose a method for learning Bayesian belief networks from data. The method uses artificial neural networks as probability estimators, thus avoiding the need for making prior assumptions on the nature of the probability distributions governing the relationships among the participating variables. This new method has the potential for being applied to domains containing both discrete and continuous variables arbitrarily distributed. We compare the learning performance of this new method with the performance of the method proposed by Cooper and Herskovits in [10]. The experimental results show that, although the learning scheme based on the use of ANN estimators is slower, the learning accuracy of the two methods is comparable.

† To appear in Advances in Neural Information Processing Systems, 1996.

1 Introduction

Bayesian belief networks (BBN), often referred to as probabilistic networks, are a powerful formalism for representing and reasoning under uncertainty. This...
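The idea of scoring a candidate parent set with a learned probability estimator can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single-layer softmax model in place of a full ANN, plain batch gradient descent instead of conjugate-gradient training, and retraining once per block of cases. All function names and parameters (`fit_softmax`, `prequential_log_score`, `block`, `steps`, `lr`) are hypothetical. The score is accumulated prequentially, meaning each case is predicted by a model trained only on earlier cases.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def one_hot(values, n_values):
    """Encode integer codes 0..n_values-1 as one-hot rows."""
    out = np.zeros((len(values), n_values))
    out[np.arange(len(values)), values] = 1.0
    return out


def fit_softmax(X, y, n_values, steps=200, lr=0.5):
    """Fit P(y | X) with a single-layer softmax model (assumed stand-in for an ANN)."""
    W = np.zeros((X.shape[1], n_values))
    b = np.zeros(n_values)
    if len(y) == 0:
        return W, b  # no data yet: the model predicts a uniform distribution
    Y = one_hot(y, n_values)
    for _ in range(steps):
        P = softmax(X @ W + b)
        G = (P - Y) / len(y)  # gradient of the mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b


def prequential_log_score(X, y, n_values, block=10):
    """Sum of log P(y_l | x_l), each case predicted from the blocks before it."""
    total = 0.0
    for start in range(0, len(y), block):
        W, b = fit_softmax(X[:start], y[:start], n_values)
        P = softmax(X[start:start + block] @ W + b)
        y_block = y[start:start + block]
        total += np.log(P[np.arange(len(y_block)), y_block]).sum()
    return total


# Toy data (made up for illustration): a binary child that copies its parent.
parent = np.arange(200) % 2
child = parent.copy()
score_with_parent = prequential_log_score(one_hot(parent, 2), child, 2)
score_without_parent = prequential_log_score(np.zeros((200, 0)), child, 2)
print(score_with_parent > score_without_parent)
```

On this toy data the parent set containing the true parent receives the higher log-score, which is the kind of comparison a structure search would rely on. A real experiment would need a proper network, held-out evaluation, and a training schedule chosen for cost.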

### Citations

7441 | Probabilistic Reasoning in Intelligent Systems
- Pearl
- 1988
Citation Context ...n belief networks (BBN), often referred to as probabilistic networks, are a powerful formalism for representing and reasoning under uncertainty. This representation has a solid theoretical foundation [19], and its practical value is suggested by the rapidly growing number of areas to which it is being applied. BBNs concisely represent the joint probability distribution over a set of random variables, ...

5297 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...making prior assumptions on the nature of the probability distribution governing the relationships among the participating variables. The use of ANNs as probability distribution estimators is not new [3, 18, 20], and its application to the task of learning Bayesian belief networks from data has been recently explored in [15]. However, in [15] the ANN estimators were used in the parametrization of the belief ...

1132 | A Bayesian method for the induction of probabilistic networks from data
- Cooper, Herskovits
- 1992
Citation Context ...ntaining both discrete and continuous variables arbitrarily distributed. We compare the learning performance of this new method with the performance of the method proposed by Cooper and Herskovits in [10]. The experimental results show that, although the learning scheme based on the use of ANN estimators is slower, the learning accuracy of the two methods is comparable. † To appear in Advances in Neur...

949 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...hods is based on the definition of a scoring metric measuring the fitness of a network structure to the data, and on the search for high-scoring network structures based on the defined scoring metric [6, 10, 14]. We focus on these methods, and in particular on the definition of Bayesian scoring metrics. In a Bayesian framework, ideally classification and prediction would be performed by taking a weighted ave...

529 | Causation, Prediction and Search
- Spirtes, Glymour, et al.
- 2000
Citation Context ...h to account for hidden variables and for the presence of data points with missing values. Different approaches have been successfully applied to the task of learning probabilistic networks from data [6, 9, 10, 16, 21, 22, 23, 24]. In all these approaches, simplifying assumptions are made to circumvent practical problems in the implementation of the theory. One common assumption that is made is that all variables are discrete,...

336 | A scaled conjugate gradient algorithm for fast supervised learning
- Moller
- 1993
Citation Context ...oint probability distribution of ALARM. The learning performance of ANN-K2 is also compared with the performance of K2. To train the ANNs, we used the conjugate-gradient search algorithm described in [17]. Architecture of the ANN estimators. Since all the variables in the ALARM network are discrete, the ANN estimators to be included in the scoring metric are defined based on the softmax model [5] that ...

282 | Neural network classifiers estimate Bayesian a posteriori probabilities
- Richard, Lippmann
- 1991
Citation Context ...making prior assumptions on the nature of the probability distribution governing the relationships among the participating variables. The use of ANNs as probability distribution estimators is not new [3, 18, 20], and its application to the task of learning Bayesian belief networks from data has been recently explored in [15]. However, in [15] the ANN estimators were used in the parametrization of the belief ...

245 | The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks
- Beinlich, Suermondt, et al.
- 1989
Citation Context ...us section. Methodology. All the experiments are performed on the belief network ALARM, a multiply-connected network originally developed to model anesthesiology problems that may occur during surgery [2]. It contains 37 nodes/variables and 46 arcs. The variables are all discrete, and take between 2 and 4 distinct values. The database used in the experiments was generated from ALARM, and it is the sam...

242 | Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition
- Bridle
- 1990
Citation Context ...d in [17]. Architecture of the ANN estimators. Since all the variables in the ALARM network are discrete, the ANN estimators to be included in the scoring metric are defined based on the softmax model [5] that we now describe. Given a variable $x_i$, with $n_i$ values and set of parents $\pi_i$, the conditional probability distribution $P(x_i \mid \pi_i)$ is approximated by a neural network with $n_i$ output uni...

206 | Sequential updating of conditional probabilities on directed graphical structures
- Spiegelhalter, Lauritzen
- 1990
Citation Context ...h to account for hidden variables and for the presence of data points with missing values. Different approaches have been successfully applied to the task of learning probabilistic networks from data [6, 9, 10, 16, 21, 22, 23, 24]. In all these approaches, simplifying assumptions are made to circumvent practical problems in the implementation of the theory. One common assumption that is made is that all variables are discrete,...

199 | Learning Bayesian belief networks. An approach based on the MDL principle
- Lam, Bacchus
- 1994
Citation Context ...h to account for hidden variables and for the presence of data points with missing values. Different approaches have been successfully applied to the task of learning probabilistic networks from data [6, 9, 10, 16, 21, 22, 23, 24]. In all these approaches, simplifying assumptions are made to circumvent practical problems in the implementation of the theory. One common assumption that is made is that all variables are discrete,...

198 | Theory refinement on Bayesian networks
- Buntine
- 1991

179 | A guide to the literature on learning probabilistic networks from data
- Buntine
- 1996
Citation Context ...obability distribution over the network structures is uniform and can be ignored in comparing network structures. 2 For a comprehensive guide to the literature on learning probabilistic networks, see [7]. The Bayesian scoring metrics developed so far either assume discrete variables [6, 10, 14], or continuous variables normally distributed [13]. In the next section, we propose a possible generalizati...

116 | Learning Gaussian networks
- Geiger, Heckerman
- 1994
Citation Context ...to the literature on learning probabilistic networks, see [7]. The Bayesian scoring metrics developed so far either assume discrete variables [6, 10, 14], or continuous variables normally distributed [13]. In the next section, we propose a possible generalization which allows for the inclusion of both discrete and continuous variables with arbitrary probability distributions. 3 An ANN-based scoring me...

77 | Learning Bayesian networks: Search methods and experimental results
- Chickering, Geiger, et al.
- 1995
Citation Context ...e corresponding $s(x_i, \pi_i, D)$. Once a scoring metric is defined, a search for a high-scoring network structure can be carried out. This search task (in several forms) has been shown to be NP-hard [4, 8]. Various heuristics have been proposed to find network structures with a high score. One such heuristic is known as K2 [10], and it implements a greedy search over the space of network structures. Th...

70 | A Bayesian method for constructing Bayesian belief networks from databases
- Cooper, Herskovits

69 | Present position and potential developments: Some personal views, statistical theory, the prequential approach
- Dawid
- 1984
Citation Context ... i can be neglected if we assume a uniform prior over the network structures). Notice that the computation of Equation 4 corresponds to the application of the prequential method discussed by Dawid in [11]. The problem now lies with how to estimate each term $P(x_i \mid \pi_i, D_l, B_S)$. This can be done by means of neural network estimators. Several schemes are available for training a neural network to...

30 | Building probabilistic networks: where do the numbers come from
- Druzdzel, Gaag
- 2000
Citation Context ...ly suitable for being used in tasks such as diagnosis, planning, control, and explanation. Construction of probabilistic networks with domain experts often remains a difficult and time consuming task [12]. Knowledge acquisition from experts is difficult because the experts have problems in making their knowledge explicit. Furthermore, it is time consuming because the information needs to be collected ...

25 | An evaluation of an algorithm for inductive learning of Bayesian belief networks using simulated data sets
- Aliferis, Cooper
- 1994
Citation Context ...on, we have compared the performance of the new algorithm with the performance of K2, a well established learning algorithm for discrete domains, for which extensive empirical evaluation is available [1, 10]. With regard to the learning accuracy of the new method, the results are encouraging, being comparable to state-of-the-art results for the chosen domain. The next step is the application of this meth...

22 | Discovering structure in continuous variables using Bayesian networks
- Hofmann, Tresp
- 1996
Citation Context ... variables. The use of ANNs as probability distribution estimators is not new [3, 18, 20], and its application to the task of learning Bayesian belief networks from data has been recently explored in [15]. However, in [15] the ANN estimators were used in the parametrization of the belief network structure only, and cross validation was the method of choice for comparing different network structures. I...

20 | Estimation of conditional densities: A comparison of neural network approaches
- Neuneier, Hergert, et al.
- 1994
Citation Context ...making prior assumptions on the nature of the probability distribution governing the relationships among the participating variables. The use of ANNs as probability distribution estimators is not new [3, 18, 20], and its application to the task of learning Bayesian belief networks from data has been recently explored in [15]. However, in [15] the ANN estimators were used in the parametrization of the belief ...

2 | Properties of learning algorithms for Bayesian belief networks
- Bouckaert
- 1994
Citation Context ...e corresponding $s(x_i, \pi_i, D)$. Once a scoring metric is defined, a search for a high-scoring network structure can be carried out. This search task (in several forms) has been shown to be NP-hard [4, 8]. Various heuristics have been proposed to find network structures with a high score. One such heuristic is known as K2 [10], and it implements a greedy search over the space of network structures. Th...

1 | A construction of Bayesian networks from databases based on an MDL principle
- Suzuki
- 1993

1 | An algorithm for deciding if a set of observed independencies has a causal explanation
- Verma, Pearl
- 1992