## The Performance of Bayesian Network Classifiers Constructed Using Different Techniques (2003)

Venue: In Working notes of the ECML/PKDD-03 workshop on

Citations: 8 (0 self)

### BibTeX

@INPROCEEDINGS{Madden03theperformance,
  author    = {Michael G. Madden},
  title     = {The Performance of Bayesian Network Classifiers Constructed Using Different Techniques},
  booktitle = {In Working notes of the ECML/PKDD-03 workshop on},
  year      = {2003},
  pages     = {59--70}
}

### Abstract

This paper presents empirical results for classification using Bayesian networks constructed using the K2 Bayesian metric, and compares these results with those of other researchers who have used Bayesian networks constructed using the MDL score and using conditional independence tests. There are significant disparities in these results, which is somewhat paradoxical as it has been shown that the MDL score is asymptotically equivalent to the Bayesian metric, and that structure search based on maximising a score is equivalent to structure search based on conditional independence tests. To resolve this paradox, we analyse the differences in methods used by different researchers to identify the source of the disparities. We conclude that differences in performance are attributable to differences in parameter estimation and structure search heuristics, rather than to differences in the scores/tests used.

### Citations

7493 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988

Citation Context: ...cedure of Cheng & Greiner comes with guarantees: given a dataset that is large enough and has a DAG-isomorphic probability distribution, their CBL1 algorithm is guaranteed to generate the perfect map [17] of the underlying dependency model. Node ordering can of course affect heuristic search, particularly for K2 and CBL1 as they restrict candidate parents for a node to appear before it in the ordering...

3085 | UCI repository of machine learning databases
- Blake, Merz
- 1998

Citation Context: ...(c) General BN. (Figure 1: Illustration of Naive Bayes, TAN and General BN Structures.) 3 Experiments. For this work, 18 datasets were selected from the UCI repository of machine learning datasets [1], as listed in Table 1. Datasets 1-15 are included in the analyses of Friedman et al. [10] and datasets 13-18 are included in the analyses of Cheng & Greiner [3]. A training set with 2/3 or 4/5 of the...

1140 | A Bayesian Method for the Induction of Probabilistic Networks from Data
- Cooper, Herskovits
- 1992

Citation Context: ...an equivalent procedure based on maximising a logarithmic score can be specified. Furthermore, as reported in Section 3 below, we have found that Bayesian networks constructed using the K2 algorithm [7] perform well in classification on benchmark datasets, in contrast with the results of Friedman et al., even though the MDL score is asymptotically equivalent to K2's Bayesian score [10, 12]. As well...
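The K2 metric referenced here is the Cooper & Herskovits Bayesian score: for a node with r possible values, parent-configuration counts N_ij, and joint counts N_ijk, the per-node score is ∏_j [(r-1)!/(N_ij + r - 1)!] ∏_k N_ijk!. A minimal sketch of its log form in Python follows; the function and variable names are mine, not from the paper:

```python
import math
from collections import Counter

def log_k2_score(data, child, parents, arity):
    """Log of the K2 (Cooper-Herskovits) score for one node and a
    candidate parent set, assuming uniform Dirichlet priors.

    data    : list of dicts mapping variable name -> discrete value
    child   : name of the node being scored
    parents : list of parent variable names (may be empty)
    arity   : dict mapping variable name -> number of possible values
    """
    r = arity[child]
    # N_ij: count of each parent configuration; N_ijk: joint with child value
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # log[(r-1)!] - log[(N_ij + r - 1)!] per parent configuration
    score = sum(math.lgamma(r) - math.lgamma(nij + r) for nij in n_ij.values())
    # plus log[N_ijk!] per (configuration, child value) cell
    score += sum(math.lgamma(nijk + 1) for nijk in n_ijk.values())
    return score
```

K2 itself greedily adds the parent that most increases this score for each node in turn, stopping when no addition helps; the sketch above scores only a single candidate parent set.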

903 | A Tutorial on Learning With Bayesian Networks
- Heckerman
- 1995

Citation Context: ...e K2 algorithm [7] perform well in classification on benchmark datasets, in contrast with the results of Friedman et al., even though the MDL score is asymptotically equivalent to K2's Bayesian score [10, 12]. As well as evaluating the classification performance of the general Bayesian network (GBN), Friedman et al. [10] and Cheng & Greiner [3] evaluate the performance of Naive Bayes (NB) and Tree-Augment...

638 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997

Citation Context: ...erences in parameter estimation and structure search heuristics, rather than to differences in the scores/tests used. 1 Introduction. In their well-known paper on Bayesian classifiers, Friedman et al. [10] showed that general Bayesian networks constructed using the minimal description length (MDL) score tend to perform badly in benchmark classification tasks, in several cases performing worse than Naiv...

457 | Supervised and Unsupervised Discretization of Continuous Features
- Dougherty, Kohavi, et al.
- 1995

Citation Context: ...preprocessed in the same way as those authors did: where datasets had continuous variables, they were discretized using the discretization utility in MLC++ [14] with its default entropy-based setting [9], and any cases with missing values were removed from datasets, except for the Mushroom dataset where they were assigned the value "unknown" (as done by Cheng & Greiner). Table 1: Data sets, training...

173 | Optimal structure identification with greedy search
- Chickering
- 2002

Citation Context: ...been proposed in the last decade for inductive learning of Bayesian networks. Recent developments include the Three-Phase Dependency Analysis algorithm [5] and the Greedy Equivalence Search algorithm [6]. Cheng et al. [5] provide a good summary and comparison of earlier algorithms. This section briefly summarises three approaches that have been used as the basis for Bayesian network classifiers, and...

83 | Comparing Bayesian network classifiers
- Cheng, Greiner
- 1999

Citation Context: ...ructed using the minimal description length (MDL) score tend to perform badly in benchmark classification tasks, in several cases performing worse than Naive Bayes. On the other hand, Cheng & Greiner [3, 4] have presented results demonstrating that Bayesian networks constructed using conditional independence (CI) tests perform well at classification tasks. While it is tempting to conclude that the CI ap...

79 | Building Classifiers Using Bayesian Networks - Friedman, Goldszmidt - 1996

66 | Learning belief networks from data: An information theory based approach
- Cheng, Bell, et al.
- 1997

Citation Context: ...ng of Bayesian Networks. Several algorithms have been proposed in the last decade for inductive learning of Bayesian networks. Recent developments include the Three-Phase Dependency Analysis algorithm [5] and the Greedy Equivalence Search algorithm [6]. Cheng et al. [5] provide a good summary and comparison of earlier algorithms. This section briefly summarises three approaches that have been used as...

60 | Learning Bayesian Belief Network Classifiers: Algorithms and System
- Cheng, Greiner
- 2001

Citation Context: ...lihood of B given D. To calculate LL(B | D), let P̂_D(·) be the empirical probability measure defined by frequencies of events in D. Then:

LL(B | D) = N ∑_i ∑_{X_i, Π_i} P̂_D(X_i, Π_i) log(P̂_D(X_i | Π_i))   (4)

The search heuristic used by Friedman et al. is to start with the empty network and successively apply local operations that greedily reduce the MDL score until a local minimum is found. T...
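The log-likelihood in equation (4) reduces to a sum over counts: since P̂_D is defined by frequencies, LL = ∑_i ∑_{j,k} N_ijk log(N_ijk / N_ij), where N_ij counts each parent configuration and N_ijk each (configuration, child value) pair. A hedged sketch of that count form in Python (identifiers are mine, not from the paper):

```python
import math
from collections import Counter

def log_likelihood(data, structure):
    """Decomposable log-likelihood LL(B | D) of a network structure.

    data      : list of dicts mapping variable name -> discrete value
    structure : dict mapping each variable name -> list of its parents
    Computes LL = sum_i sum_{j,k} N_ijk * log(N_ijk / N_ij), the
    count form of equation (4); zero-count cells contribute nothing.
    """
    ll = 0.0
    for child, parents in structure.items():
        # N_ij and N_ijk from the data, per node
        n_ij = Counter(tuple(row[p] for p in parents) for row in data)
        n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
        for (cfg, _), nijk in n_ijk.items():
            ll += nijk * math.log(nijk / n_ij[cfg])
    return ll
```

Because this quantity only increases as parents are added, MDL and Bayesian scores pair it with a complexity penalty; the greedy search described above moves between structures by comparing such penalised scores.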

51 | An algorithm for Bayesian belief network construction from data
- Cheng, Bell, et al.
- 1997

Citation Context: ...ependence Approach. Cheng & Greiner [3] construct a Bayesian network structure by identifying the conditional independence relationships among the nodes in the network. They use the CBL1 algorithm [2], a precursor to the TPDA algorithm [5]. The basis for this algorithm is testing whether two nodes x_i and x_j are conditionally independent given a set of nodes c. This is determined by testing whether...
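The CI test behind CBL1 and TPDA is based on conditional mutual information: two nodes are taken as conditionally independent given a conditioning set when the empirical I(x_i; x_j | c) falls below a small threshold. A sketch of that quantity under that assumption (function and variable names are mine):

```python
import math
from collections import Counter

def conditional_mutual_info(data, x, y, cond):
    """Empirical conditional mutual information I(X; Y | C) in nats.

    data : list of dicts mapping variable name -> discrete value
    cond : list of conditioning variable names (may be empty)
    A CI test declares X and Y conditionally independent given C
    when this value falls below a chosen small threshold.
    """
    n = len(data)
    cxy = Counter((tuple(r[v] for v in cond), r[x], r[y]) for r in data)
    cx = Counter((tuple(r[v] for v in cond), r[x]) for r in data)
    cy = Counter((tuple(r[v] for v in cond), r[y]) for r in data)
    cc = Counter(tuple(r[v] for v in cond) for r in data)
    mi = 0.0
    for (c, xv, yv), nxy in cxy.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) p(y|c)) ], in count form
        mi += (nxy / n) * math.log(nxy * cc[c] / (cx[(c, xv)] * cy[(c, yv)]))
    return mi
```

With an empty conditioning set this is ordinary mutual information, which is also how such algorithms typically rank candidate edges before conditioning on larger cut-sets.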

47 | Graphical Models: Selecting Causal and Statistical Models - Meek - 1997

40 | Data Mining Using MLC++: A
- Kohavi, Sommerfield, et al.
- 1996

Citation Context: ...ner ("CG"). In this work, all datasets were preprocessed in the same way as those authors did: where datasets had continuous variables, they were discretized using the discretization utility in MLC++ [14] with its default entropy-based setting [9], and any cases with missing values were removed from datasets, except for the Mushroom dataset where they were assigned the value "unknown" (as done by Chen...

11 | Learning the structure of augmented Bayesian classifiers
- Keogh, Pazzani

Citation Context: ...ken, Friedman et al. and Cheng & Greiner propose and analyse alternative structures based on Bayesian networks that outperform the algorithms discussed in this paper. More recently, Keogh and Pazzani [13] propose an algorithm for constructing TAN-type classifiers using classification accuracy rather than maximum-likelihood scores. Nonetheless, the results presented here challenge the claim, quite often...

5 |
Conditions under which conditional independence and scoring methods lead to identical selection of Bayesian network models
- Cowell
- 2001
(Show Context)
Citation Context ... TAN classifiers built using the different scores produce remarkably similar results, provided that equivalent forms of parameter estimation are used. This supports the theoretical analysis of Cowell =-=[8]-=-, showing that structure search based on CI tests and structure search based on maximising scoring metrics are equivalent. 4.4 Structure Search The methods for structure search used by the different a... |