## Building Classifiers Using Bayesian Networks (1996)

Venue: Proceedings of the Thirteenth National Conference on Artificial Intelligence

Citations: 79 (2 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Friedman96buildingclassifiers,
  author    = {Nir Friedman},
  title     = {Building Classifiers Using Bayesian Networks},
  booktitle = {Proceedings of the Thirteenth National Conference on Artificial Intelligence},
  year      = {1996},
  pages     = {1277--1284},
  publisher = {AAAI Press}
}
```

### Abstract

Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we examine and evaluate approaches for inducing classifiers from data, based on recent results in the theory of learning Bayesian networks. Bayesian networks are factored representations of probability distributions that generalize the naive Bayes classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that are characteristic of naive Bayes. We experimentally tested these approaches using benchmark problems from the U.C. Irvine repository, and compared them against C4.5, naive Bayes, and wrapper-based feature selection methods.
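For readers unfamiliar with the baseline, the naive Bayes classifier the abstract refers to factors the posterior as P(C | X1,…,Xn) ∝ P(C) ∏ P(Xi | C). A minimal sketch over discrete features, with Laplace smoothing and a made-up toy dataset (a generic illustration, not the paper's code):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical naive Bayes: features assumed independent given the class."""

    def fit(self, X, y, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing strength
        self.classes = sorted(set(y))
        self.class_totals = Counter(y)
        self.priors = {c: self.class_totals[c] / len(y) for c in self.classes}
        # counts[c][j][v] = training rows with class c and feature j equal to v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)          # observed values per feature
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.priors[c])
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + self.alpha
                den = self.class_totals[c] + self.alpha * len(self.values[j])
                lp += math.log(num / den)
            return lp
        return max(self.classes, key=log_posterior)

# Toy usage (hypothetical weather data):
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
nb = NaiveBayes().fit(X, y)
print(nb.predict(("rain", "mild")))  # -> yes
```

Despite ignoring all interactions between features, this model is the baseline that TAN extends by allowing each attribute one additional attribute parent.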

### Citations

9061 | Introduction to Algorithms - Cormen, Leiserson, et al. - 2001 |

7493 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |

Citation Context: ...istic assumptions about independence. In order to effectively tackle this problem we need an appropriate language and effective machinery to represent and manipulate independences. Bayesian networks (Pearl 1988) provide both. Bayesian networks are directed acyclic graphs that allow for efficient and effective representation of the joint probability distributions over a set of random variables. Each vertex i...

5438 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

Citation Context: ...ckering 1995). In our experiments we also tried a smoothed version of naive Bayes. This did not lead to significant improvement over the unsmoothed naive Bayes. Finally, we also compared TAN to C4.5 (Quinlan 1993), a state-of-the-art decision-tree learning system, and to the selective naive Bayesian classifier (Langley & Sage 1994; John, Kohavi, & Pfleger 1995). The latter approach searches for the subset of a...

953 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995 |

853 | A study of cross-validation and bootstrap for accuracy estimation and model selection - Kohavi - 1995 |

Citation Context: ...his accuracy using the MLC++ system (Kohavi et al. 1994). Accuracy was evaluated using the holdout method for the larger datasets, and using 5-fold cross validation (using the methods described in (Kohavi 1995)) for the smaller ones. Since we do not deal, at the current time, with missing data we had removed instances with missing values from the datasets. Currently we also do not handle continuous attribu...
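The evaluation protocol quoted here (5-fold cross-validation for the smaller datasets) can be sketched generically. The classifier interface below is a hypothetical stand-in, not MLC++; any model exposing `predict` fits:

```python
import random

def cross_val_accuracy(X, y, train_fn, k=5, seed=0):
    """Estimate accuracy by k-fold cross-validation.

    train_fn(X_train, y_train) must return a model with a .predict(row) method.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k roughly equal folds
    accs = []
    for held_out in folds:
        held = set(held_out)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        model = train_fn(X_tr, y_tr)
        correct = sum(model.predict(X[i]) == y[i] for i in held_out)
        accs.append(correct / len(held_out))
    return sum(accs) / k                        # mean accuracy over folds

# Usage with a trivial majority-class "model" (illustration only):
class Majority:
    def __init__(self, y):
        self.label = max(set(y), key=y.count)
    def predict(self, row):
        return self.label

X = [[i] for i in range(20)]
y = ["a"] * 12 + ["b"] * 8
acc = cross_val_accuracy(X, y, lambda X_tr, y_tr: Majority(y_tr))
```

Each instance is held out exactly once, which is what makes the estimate usable on small datasets where a single holdout split would be too noisy.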

787 | UCI repository of machine learning databases - Murphy, Aha - 1994 |

Citation Context: ...ate fellowship and NSF Grant IRI-9503109. A Experimental Methodology and Results: We ran our experiments on the 22 datasets listed in Table 1. All of the datasets are from the U.C. Irvine repository (Murphy & Aha 1995), with the exception of "mofn-3-710" and "corral". These two artificial datasets were used for the evaluation of feature subset selection methods by (John, Kohavi, & Pfleger 1995). All these dataset...

703 | Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning - Fayyad, Irani - 1993 |

Citation Context: ...values from the datasets. Currently we also do not handle continuous attributes. Instead, in each invocation of the learning routine, the dataset was pre-discretized using a variant of the method of (Fayyad & Irani 1993), using only the training data, in the manner described in (Dougherty, Kohavi, & Sahami 1995). These preprocessing stages were carried out by the MLC++ system. We note that experiments with the vario...

685 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968 |

Citation Context: ...w and Liu show that there is a simple procedure that constructs the maximal log-probability tree. Let n be the number of random variables and N be the number of training instances. Then Theorem 4.1: (Chow & Liu 1968) There is a procedure of time complexity O(n² · N) that constructs the tree structure B_T that maximizes LL(B_T|D). The procedure of Chow and Liu can be summarized as follows. 1. Compute the m...
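The Chow–Liu procedure quoted above can be sketched in two steps: compute empirical mutual information for every pair of variables (the O(n² · N) part), then take a maximum-weight spanning tree over that complete graph. A minimal sketch using Prim's algorithm (a generic illustration; function and variable names are my own, not the paper's):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def chow_liu_tree(data):
    """data: list of rows, each a tuple of n discrete values.

    Returns undirected tree edges (i, j) maximizing total mutual information,
    found by Prim's algorithm on the complete MI-weighted graph.
    """
    n = len(data[0])
    cols = list(zip(*data))
    mi = {(i, j): mutual_information(cols[i], cols[j])
          for i in range(n) for j in range(i + 1, n)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Heaviest edge crossing the cut between tree and non-tree vertices.
        i, j = max(((a, b) for (a, b) in mi
                    if (a in in_tree) != (b in in_tree)),
                   key=lambda e: mi[e])
        edges.append((i, j))
        in_tree.update((i, j))
    return edges

# Variables 0 and 1 always agree, variable 2 is unrelated, so the
# learned tree must contain the edge (0, 1):
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
edges = chow_liu_tree(data)
```

Maximizing the sum of pairwise mutual information over tree structures is exactly what maximizes the log-likelihood LL(B_T|D) in Theorem 4.1.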

639 | Irrelevant Features and the Subset Selection Problem - John, Kohavi, et al. - 1994 |

457 | Supervised and Unsupervised Discretization of Continuous Features - Dougherty, Kohavi, et al. - 1995 |

362 | An Analysis of Bayesian Classifiers - Langley, Wayne, et al. - 1992 |

330 | A tutorial on learning Bayesian networks - Heckerman - 1995 |

Citation Context: ...sed as a classifier maximizes the prediction rate? Learning Bayesian networks from data is a rapidly growing field of research that has seen a great deal of activity in recent years; see for example (Heckerman 1995; Heckerman, Geiger, & Chickering 1995; Lam & Bacchus 1994). This is a form of unsupervised learning in the sense that the learner is not guided by a set of informative examples. The objective is to indu...

233 | Induction of selective Bayesian classifiers - Langley, Sage - 1994 |

Citation Context: ...nt improvement over the unsmoothed naive Bayes. Finally, we also compared TAN to C4.5 (Quinlan 1993), a state-of-the-art decision-tree learning system, and to the selective naive Bayesian classifier (Langley & Sage 1994; John, Kohavi, & Pfleger 1995). The latter approach searches for the subset of attributes over which naive Bayes has the best performance. The results displayed in Figure 5 and Table 1 show that TAN ...

199 | Learning Bayesian belief networks: An approach based on the MDL principle - Lam, Bacchus - 1994 |

Citation Context: ...rning Bayesian networks from data is a rapidly growing field of research that has seen a great deal of activity in recent years; see for example (Heckerman 1995; Heckerman, Geiger, & Chickering 1995; Lam & Bacchus 1994). This is a form of unsupervised learning in the sense that the learner is not guided by a set of informative examples. The objective is to induce a network (or a set of networks) that "best describes"...

100 | MLC++: A machine learning library in C++ - Kohavi, John, et al. - 1994 |

Citation Context: ...on the percentage of successful predictions on the test sets of each dataset. We estimate the prediction accuracy for each classifier as well as the variance of this accuracy using the MLC++ system (Kohavi et al. 1994). Accuracy was evaluated using the holdout method for the larger datasets, and using 5-fold cross validation (using the methods described in (Kohavi 1995)) for the smaller ones. Since we do not deal,...

74 | Searching for dependencies in Bayesian classifiers - Pazzani - 1995 |

70 | Discretization of continuous attributes while learning Bayesian networks - Friedman, Goldszmidt - 1996 |

Citation Context: ...the MDL method is to find a compact encoding of the training set D. We do not reproduce the derivation of the MDL scoring function here, but merely state it. The interested reader should consult (Friedman & Goldszmidt 1996; Lam & Bacchus 1994). The MDL score of a network B given D, written MDL(B|D), is MDL(B|D) = (log N / 2) |B| - LL(B|D) (4), where |B| is the number of parameters in the network. The first term simply...
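The MDL score in equation (4) penalizes model size, (log N / 2) |B|, against fit to the data, LL(B|D). A minimal sketch for a discrete network with fixed parent sets, using maximum-likelihood parameters (a generic illustration under my own conventions, not the paper's code):

```python
import math
from collections import Counter

def mdl_score(data, parents, arity):
    """MDL(B|D) = (log N / 2) * |B| - LL(B|D); lower is better.

    data:    list of tuples over n discrete variables
    parents: parents[i] = tuple of parent indices of variable i
    arity:   arity[i] = number of values variable i can take
    """
    N = len(data)
    # |B|: independent parameters = sum over i of (r_i - 1) * prod(parent arities)
    num_params = sum((arity[i] - 1) * math.prod(arity[p] for p in parents[i])
                     for i in range(len(arity)))
    # LL(B|D) with maximum-likelihood parameters: sum of N(x, pa) * log(N(x, pa) / N(pa))
    ll = 0.0
    for i, ps in parents.items():
        joint = Counter((row[i],) + tuple(row[p] for p in ps) for row in data)
        cond = Counter(tuple(row[p] for p in ps) for row in data)
        ll += sum(c * math.log(c / cond[key[1:]]) for key, c in joint.items())
    return 0.5 * math.log(N) * num_params - ll

# On a toy dataset where X1 copies X0, adding the edge 0 -> 1 improves
# (lowers) the MDL score despite costing extra parameters:
data = [(0, 0), (0, 0), (1, 1), (1, 1)]
arity = (2, 2)
empty = mdl_score(data, {0: (), 1: ()}, arity)
tree = mdl_score(data, {0: (), 1: (0,)}, arity)
print(tree < empty)  # -> True
```

This shows the trade-off the excerpt describes: an edge is worth its encoding cost only when it buys enough log-likelihood.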

32 | An entropy-based learning algorithm of Bayesian conditional trees - Geiger - 1992 |

Citation Context: ...hird step has complexity of O(n² log n). Since we usually have that N > log n, we get the resulting complexity. This result can be adapted to learn the maximum likelihood TAN structure. Theorem 4.2: (Geiger 1992) There is a procedure of time complexity O(n² · N) that constructs the TAN structure B_T that maximizes LL(B_T|D). The procedure is very similar to the procedure described above when applied t...
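The adaptation in Theorem 4.2 amounts to running the Chow–Liu construction with class-conditional mutual information I(Xi; Xj | C) as the edge weight; in the resulting TAN model the class is additionally a parent of every attribute. A sketch of the structure-learning step (a generic illustration; names are my own):

```python
import math
from collections import Counter

def conditional_mi(xs, ys, cs):
    """Empirical I(X; Y | C) for discrete sequences of equal length."""
    n = len(cs)
    pc = Counter(cs)
    pxc, pyc = Counter(zip(xs, cs)), Counter(zip(ys, cs))
    pxyc = Counter(zip(xs, ys, cs))
    return sum((k / n) * math.log((k / n) * (pc[c] / n)
                                  / ((pxc[x, c] / n) * (pyc[y, c] / n)))
               for (x, y, c), k in pxyc.items())

def tan_edges(X, y):
    """Tree part of a TAN structure: maximum-weight spanning tree over the
    attributes, weighted by I(Xi; Xj | C) and built with Prim's algorithm.
    (In the full classifier the class C is also a parent of every attribute.)
    """
    n = len(X[0])
    cols = list(zip(*X))
    w = {(i, j): conditional_mi(cols[i], cols[j], y)
         for i in range(n) for j in range(i + 1, n)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        e = max((e for e in w if (e[0] in in_tree) != (e[1] in in_tree)),
                key=lambda e: w[e])
        edges.append(e)
        in_tree.update(e)
    return edges

# Attributes 0 and 1 agree within every class, attribute 2 does not,
# so the augmenting tree must link 0 and 1:
X = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
y = [0, 0, 0, 0, 1, 1]
edges = tan_edges(X, y)
```

Because the tree is found in closed form rather than by heuristic search, this keeps the "no search involved" property the abstract claims for TAN.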
