## Comparing Bayesian Network Classifiers (1999)

### Download Links

- [www.cs.ualberta.ca]
- DBLP

### Other Repositories/Bibliography

Citations: 84 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Cheng99comparingbayesian,
  author    = {Jie Cheng and Russell Greiner},
  title     = {Comparing Bayesian Network Classifiers},
  booktitle = {},
  year      = {1999},
  pages     = {101--108},
  publisher = {Morgan Kaufmann Publishers}
}
```

### Abstract

In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers -- Naïve-Bayes, tree-augmented Naïve-Bayes (TAN), BN-augmented Naïve-Bayes (BAN) and general BNs (GBNs), where the latter two are learned using two variants of a conditional-independence (CI) based BN-learning algorithm. Experimental results show that the obtained classifiers, learned using the CI-based algorithms, are competitive with (or superior to) the best known classifiers, based on both Bayesian networks and other formalisms, and that the computational time for learning and using these classifiers is relatively small. Moreover, these results also suggest a way to learn yet more effective classifiers; we demonstrate empirically that this new algorithm does work as expected. Collectively, these results argue that BN classifiers deserve more attention in the machine learning and data mining communities. 1 INTRODUCTION Many tasks -- including fault diagnosis, pattern recognition and forecasting -- c...

### Citations

7407 | Probabilistic reasoning in intelligent systems: Networks of plausible inference - Pearl - 1988
Citation Context: ...research topic in machine learning and data mining. In the past two decades, many algorithms have been developed for learning decision-tree and neural-network classifiers. While Bayesian networks (BNs) (Pearl 1988) are powerful tools for knowledge representation and inference under conditions of uncertainty, they were not considered as classifiers until the discovery that Naïve-Bayes, a very simple kind of BN...

4120 | Pattern classification and scene analysis - Duda, Hart - 1973
Citation Context: ...unately, for unrestricted BN learning, no such connection can be found between the scoring-based and CI-based methods. 2.3 SIMPLE BN CLASSIFIERS 2.3.1 Naïve-Bayes A Naïve-Bayes BN, as discussed in (Duda and Hart, 1973), is a simple structure that has the classification node as the parent node of all other nodes (see Figure 1). No other connections are allowed in a Naïve-Bayes structure. [Figure 1: a Naïve-Bayes structure, with class node c as parent of attributes x1–x4] ...
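
The structure described here — the classification node as the sole parent of every attribute, with no other edges — yields a classifier that is a few lines of code. A minimal sketch with an invented toy dataset (the data and names are illustrative, not from the paper):

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: each row is (class_label, attribute_values).
data = [
    ("spam", ("offer", "yes", "short")),
    ("spam", ("offer", "yes", "long")),
    ("ham",  ("meeting", "no", "long")),
    ("ham",  ("meeting", "yes", "short")),
]

# Estimate P(c) and each P(x_i | c) directly from counts.
class_counts = Counter(c for c, _ in data)
cond_counts = defaultdict(Counter)  # (attr_index, class) -> Counter of values
for c, xs in data:
    for i, v in enumerate(xs):
        cond_counts[(i, c)][v] += 1

def predict(xs):
    """Return argmax over classes of P(c) * prod_i P(x_i | c)."""
    best_c, best_score = None, -1.0
    n = len(data)
    for c, cc in class_counts.items():
        score = cc / n
        for i, v in enumerate(xs):
            score *= cond_counts[(i, c)][v] / cc
        if score > best_score:
            best_c, best_score = c, score
    return best_c

print(predict(("offer", "yes", "long")))  # -> "spam"
```

Because the class is the only parent, every conditional table is one-dimensional per class, which is what makes this structure so cheap to learn and evaluate.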

1127 | A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9:309–347 - Cooper, Herskovits - 1992
Citation Context: ...that structure. As it is trivial to learn the parameters for a given structure that are optimal for a given corpus of complete data -- simply use the empirical conditional frequencies from the data (Cooper and Herskovits 1992) -- we will focus on learning the BN structure. There are two ways to view a BN, each suggesting a particular approach to learning. First, a BN is a structure that encodes the joint distribution of t...
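
The parameter-learning step quoted here — filling each conditional probability table entry with the empirical conditional frequency N(child, parent config) / N(parent config) — can be sketched as follows (the variables and data are invented for illustration):

```python
# Hypothetical complete data over discrete variables A, B, C (rows are dicts).
data = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 0},
]

def cpt_entry(data, child, child_val, parents):
    """P(child = child_val | parents) as an empirical conditional frequency:
    N(child_val, parent_config) / N(parent_config)."""
    matching = [r for r in data if all(r[p] == v for p, v in parents.items())]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if r[child] == child_val)
    return hits / len(matching)

# E.g., for a structure with edge A -> B:
print(cpt_entry(data, "B", 1, {"A": 1}))  # -> 1.0  (both A=1 rows have B=1)
```

With complete data these counts are the maximum-likelihood parameters for the given structure, which is why the paper can focus entirely on structure learning.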

1116 | Wrappers for feature subset selection - Kohavi, John - 1997

770 | UCI Repository of Machine Learning Databases. Available at: http://www.ics.uci.edu/∼mlearn/MLRepository.html - Murphy, Aha - 1992
Citation Context: ...hm (based on conditional-independence tests) to learn GBNs and BANs. We empirically compared these classifiers with TAN and Naïve-Bayes using eight datasets from the UCI Machine Learning Repository (Murphy and Aha, 1995). Our results motivate a new type of classifier (a wrapper of the GBN and the BAN), which is also empirically evaluated. 3 LEARNING BAYESIAN NETWORK CLASSIFIERS This section presents algorithms for l...

679 | Approximating Discrete Probability Distributions with Dependence Trees - Chow, Liu - 1968
Citation Context: ...mutual information tests (N is the number of attributes in the dataset) and is linear in the number of cases. The efficiency is achieved by directly extending the Chow-Liu tree construction algorithm (Chow and Liu 1968) to a three-phase BN learning algorithm: drafting, which is essentially the Chow-Liu algorithm, thickening, which adds edges to the draft, and thinning, which verifies the necessity of each edge. Giv...
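
The drafting phase described here is essentially the Chow-Liu construction: weight each pair of attributes by their empirical mutual information and keep a maximum-weight spanning tree. A minimal sketch of that step only (thickening and thinning are omitted; the function names are ours, not the paper's):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in nats, from two value lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), nxy in pxy.items():
        mi += (nxy / n) * math.log((nxy * n) / (px[x] * py[y]))
    return mi

def chow_liu_tree(columns):
    """Maximum-weight spanning tree over attributes, weighted by pairwise MI.
    columns: dict mapping attribute name -> list of observed values."""
    names = list(columns)
    edges = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True)
    # Kruskal's algorithm with a simple union-find.
    parent = {v: v for v in names}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    tree = []
    for _w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree
```

On data where A and B are perfectly correlated and C is independent of both, the tree keeps the high-MI edge (A, B) and attaches C through a zero-weight edge, which is the behavior the thinning phase would later re-examine.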

632 | Bayesian network classifiers - Friedman, Geiger, et al. - 1997
Citation Context: ...are particularly interested in the following questions. 1. Since "using MDL (or other nonspecialized scoring functions) for learning unrestricted Bayesian networks may result in poor classifier..." (Friedman et al. 1997), a natural question is "Will non-scoring methods (i.e., conditional-independence (CI) test based methods, such as mutual information test and chi-squared test based methods) learn good classifiers?" 2...
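
A chi-squared independence test of the kind mentioned in this question is computed from a contingency table of two discrete variables. A minimal sketch (the 3.84 threshold quoted in the comment is the standard 5% critical value for one degree of freedom; the function name is ours):

```python
from collections import Counter

def chi_squared_statistic(xs, ys):
    """Pearson chi-squared statistic for independence of two discrete
    variables, given as parallel lists of observed values."""
    n = len(xs)
    ox, oy, oxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for x in ox:
        for y in oy:
            expected = ox[x] * oy[y] / n   # counts expected under independence
            observed = oxy[(x, y)]
            stat += (observed - expected) ** 2 / expected
    return stat

# With df = (|X|-1)(|Y|-1) = 1, a statistic above ~3.84 rejects
# independence at the 5% level.
print(chi_squared_statistic([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 4.0
```

CI-based learners use such tests (or mutual-information tests) to decide which edges a network needs, instead of scoring whole candidate structures.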

358 | An analysis of Bayesian classifiers - Langley, Iba, et al. - 1992
Citation Context: ...e not considered as classifiers until the discovery that Naïve-Bayes, a very simple kind of BN that assumes the attributes are independent given the classification node, is surprisingly effective (Langley et al. 1992). This paper further explores this role of BNs. Section 2 provides the framework of our research, describing standard approaches to learning simple Bayesian networks, then motivating our exploration ...

322 | Learning Bayesian Networks: The - Heckerman, Geiger, et al. - 1995
Citation Context: ...st BN is the one that best fits the data, and leads to the scoring-based learning algorithms that seek a structure that maximizes the Bayesian, MDL or Kullback-Leibler (KL) entropy scoring function (Heckerman 1995; Cooper and Herskovits 1992). Second, the BN structure encodes a group of conditional independence relationships among the nodes, according to the concept of d-separation (Pearl 1988). This suggests ...

228 | Induction of selective Bayesian classifiers - Langley, Sage - 1994

111 | Semi-naive Bayesian classifier - Kononenko - 1991
Citation Context: ...ifier, etc. Pazzani's algorithm (Pazzani 1995) performs feature joining as well as feature selection to improve the Naïve-Bayesian classifier. Relaxing Independence Assumption Kononenko's algorithm (Kononenko 1991) partitions the attributes into disjoint groups and assumes independence only between attributes of different groups. Friedman et al. (1997) studied TAN, which allows tree-like structures to be used ...

98 | MLC++: A machine learning library in C - Kohavi, John, et al. - 1994
Citation Context: ...loss in discretization and to be able to compare the learning accuracy with other algorithms fairly. When we needed to discretize the continuous features, we used the discretization utility of MLC++ (Kohavi et al. 1994) on the default setting. The datasets we used are summarized in Table 1. (CV5 stands for five-fold cross validation.) Table 1: Datasets used in the experiments. Instances Dataset Attributes. Classes ...

84 | A Bayesian approach to causal discovery - Heckerman, Meek, et al. - 1997

73 | Search for dependencies in Bayesian classifiers - Pazzani - 1996
Citation Context: ...estimates, to find a subset of attributes. Their algorithm can wrap around any classifier, including decision-tree classifiers, the Naïve-Bayesian classifier, etc. Pazzani's algorithm (Pazzani 1995) performs feature joining as well as feature selection to improve the Naïve-Bayesian classifier. Relaxing Independence Assumption Kononenko's algorithm (Kononenko 1991) partitions the attributes int...

67 | Learning Belief Networks from Data: An Information Theory Based Approach - Cheng, Bell, et al. - 1997
Citation Context: ...nships among the attributes and use these relationships as constraints to construct a BN. These algorithms are referred to as CI-based algorithms or constraint-based algorithms (Spirtes and Glymour 1996; Cheng et al. 1997a). Heckerman et al. (1997) compare these two general learning approaches, and show that the scoring-based methods often have certain advantages over the CI-based methods, in terms of modeling a distribution. Ho...

52 | An algorithm for Bayesian belief network construction from data - Cheng, Bell, et al. - 1997
Citation Context: ...nships among the attributes and use these relationships as constraints to construct a BN. These algorithms are referred to as CI-based algorithms or constraint-based algorithms (Spirtes and Glymour 1996; Cheng et al. 1997a). Heckerman et al. (1997) compare these two general learning approaches, and show that the scoring-based methods often have certain advantages over the CI-based methods, in terms of modeling a distribution. Ho...

52 | Efficient learning of selective Bayesian network classifiers - Singh, Provan - 1995
Citation Context: ...algorithm. Given the theoretical analysis (Friedman et al. 1997) and the empirical comparison (results using scoring-based methods on some of the data sets we use are reported in Friedman et al. 1997; Singh and Provan 1996), we believe that methods based on CI tests (such as mutual information tests) are more suitable for BN classifier learning than the more-standard scoring-based methods. Note, in addition, that such m...

50 | Learning Bayesian nets that perform well - Greiner, Grove, et al. - 1997 |

42 | Constructor: A system for the induction of probabilistic models - Fung, Crawford - 1990
Citation Context: ...hold selection based on the prediction accuracy; wrapping the GBN and BAN together and returning the winner. (Another algorithm for automatic threshold selection on BN construction is presented in [Fung and Crawford 1990].) We therefore propose a wrapper algorithm that incorporates these two ideas. 1. Partition the input training set into an internal training set and an internal holdout set. 2. Call GBN-learner using diffe...
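
The wrapper procedure outlined in this context — hold out part of the training data, train each candidate classifier (e.g. GBNs at several CI-test thresholds plus a BAN), and keep the holdout winner — might be organized as below. All names, the callable interfaces, and the 70/30 split are illustrative assumptions, not the paper's specification:

```python
import random

def wrapper_select(train_set, learners, eval_accuracy, split=0.7, seed=0):
    """Hold out part of the training data, score each candidate learner on
    the holdout set, and return the winner retrained on all the data.

    learners: list of callables, learner(rows) -> classifier.
    eval_accuracy: callable, eval_accuracy(classifier, rows) -> float.
    """
    rows = list(train_set)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * split)
    internal_train, holdout = rows[:cut], rows[cut:]
    # Pick the learner whose classifier does best on the internal holdout set.
    best = max(learners,
               key=lambda learn: eval_accuracy(learn(internal_train), holdout))
    # Retrain the winning learner on the full training set before returning.
    return best(rows)
```

The holdout comparison is what lets the wrapper pick both among CI-test thresholds and between the GBN and BAN families with a single mechanism.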