## Learning Bayesian Belief Network Classifiers: Algorithms and System (2001)

Venue: Proceedings of 14th Biennial conference of the ...

Citations: 60 (3 self)

### BibTeX

@INPROCEEDINGS{Cheng01learningbayesian,
  author    = {Jie Cheng and Russell Greiner},
  title     = {Learning Bayesian Belief Network Classifiers: Algorithms and System},
  booktitle = {Proceedings of 14th Biennial conference of the},
  year      = {2001},
  pages     = {141--151}
}

### Abstract

This paper investigates the methods for learning predictive classifiers based on Bayesian belief networks (BN) -- primarily unrestricted Bayesian networks and Bayesian multinets. We present our algorithms for learning these classifiers, and discuss how these methods address the overfitting problem and provide a natural method for feature subset selection. Using a set of standard classification problems, we empirically evaluate the performance of various BN-based classifiers. The results show that the proposed BN and Bayes multi-net classifiers are competitive with (or superior to) the best known classifiers, based on both BN and other formalisms; and that the computational time for learning and using these classifiers is relatively small. These results argue that BN based classifiers deserve more attention in the data mining community.

1 Introduction. Many tasks -- including fault diagnosis, pattern recognition and forecasting -- can be viewed as classification, as each r...

### Citations

7488 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988

Citation Context: ...earch topic in machine learning and data mining. In the past two decades, many algorithms have been developed for learning decision-tree and neural-network classifiers. While Bayesian networks (BNs) (Pearl 1988) are powerful tools for knowledge representation and inference under conditions of uncertainty, they were not considered as classifiers until the discovery that Naïve-Bayes, a very simple kind of BNs ...

4172 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

Citation Context: ...Bayes, Tree augmented Naïve-Bayes (TANs), Bayesian network augmented Naïve-Bayes (BANs), Bayesian multi-nets and general Bayesian networks (GBNs). 2.3.1 Naïve-Bayes A Naïve-Bayes BN, as discussed in (Duda and Hart, 1973), is a simple structure that has the class node as the parent node of all other nodes (see Figure 1). No other connections are allowed in a Naïve-Bayes structure. [Figure 1: A simple Naï...]
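The Naïve-Bayes structure described in this context (the class node as sole parent of every feature, no other edges) can be sketched as follows. This is a minimal illustrative implementation, not code from the paper; all names are ours, and Laplace (+1) smoothing is an added assumption.

```python
# Minimal Naïve-Bayes over discrete features: since c is the only parent of
# every x_i, the joint factorizes as P(c, x) = P(c) * prod_i P(x_i | c).
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # feature_counts[i][(c, v)] = number of rows with class c and x_i = v
        self.feature_counts = defaultdict(Counter)
        for row, c in zip(X, y):
            for i, v in enumerate(row):
                self.feature_counts[i][(c, v)] += 1
        return self

    def predict(self, row):
        def score(c):
            p = self.class_counts[c] / self.n
            for i, v in enumerate(row):
                # empirical P(x_i = v | c), with +1 smoothing (our assumption)
                p *= (self.feature_counts[i][(c, v)] + 1) / (self.class_counts[c] + 2)
            return p
        return max(self.class_counts, key=score)
```

For example, `NaiveBayes().fit([[0, 1], [1, 1], [1, 0], [0, 0]], ['a', 'a', 'b', 'b']).predict([0, 1])` returns `'a'`.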

1139 | A Bayesian Method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992

Citation Context: ...r that structure. As it is trivial to learn the parameters for a given structure that are optimal for a given corpus of complete data – simply use the empirical conditional frequencies from the data (Cooper and Herskovits 1992) – we will focus on learning the BN structure. There are two ways to view a BN, each suggesting a particular approach to learning. First, a BN is a structure that encodes the joint distribution of th...
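The parameter-learning step mentioned in this context is simple enough to show directly: for a fixed structure and complete data, the maximum-likelihood parameters are the empirical conditional frequencies P(node = v | parents = pv) = N(v, pv) / N(pv). The sketch below is illustrative; the variable names are ours.

```python
# Learn one conditional probability table (CPT) from complete data by
# counting: P(node = v | parents = pv) = N(pv, v) / N(pv).
from collections import Counter

def learn_cpt(data, node, parents):
    """data: list of dicts {variable: value}; returns {(pv, v): probability}."""
    joint = Counter()   # counts of (parent assignment, node value)
    parent = Counter()  # counts of parent assignment alone
    for row in data:
        pv = tuple(row[p] for p in parents)
        joint[(pv, row[node])] += 1
        parent[pv] += 1
    return {(pv, v): n / parent[pv] for (pv, v), n in joint.items()}
```

For example, with four rows where x = 1 in two of the three c = 0 rows, `learn_cpt(data, 'x', ['c'])` gives P(x = 1 | c = 0) = 2/3.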

1133 | Wrappers for Feature Subset Selection - Kohavi, John - 1997

787 | UCI repository of machine learning databases, machine-readable data repository - Murphy, Aha - 1996

Citation Context: ...voiding overfitting and feature subset selection. Section 4 presents and analyzes the experimental results, over a set of standard learning problems obtained from the UCI Machine Learning Repository (Murphy and Aha, 1995). In Section 5, we give a brief introduction to our BN classifier learning system. Finally, we conclude our work in Section 6. 2 FRAMEWORK 2.1 BAYESIAN NETWORKS A Bayesian network B = < N, A, Θ > is a...

684 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968

Citation Context: ...s to form a tree. (Note that in Figure 2, features x1, x2, x3, x4 form a tree; c is the class node.) Learning such structures can be easily achieved by using a variation of the Chow-Liu algorithm (Chow and Liu 1968). The performance of TAN classifiers is studied in Friedman et al. (1997) and Cheng and Greiner (1999). 2.3.3 BN Augmented Naïve-Bayes (BAN) [Figure 2: A simple TAN structure] BAN classif...
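The Chow-Liu step referenced in this context can be sketched as below: estimate mutual information between every feature pair from data, then keep the edges of a maximum-weight spanning tree. This is an illustrative core only; the TAN variant additionally conditions the mutual information on the class and adds c as a parent of every feature. Function names are ours.

```python
# Chow-Liu tree sketch: pairwise mutual information + maximum spanning tree.
import math
from collections import Counter

def mutual_info(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum only over observed pairs, so the log argument is never zero.
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def chow_liu_edges(columns):
    """columns: list of equal-length value lists, one per feature."""
    k = len(columns)
    edges = sorted(((mutual_info(columns[i], columns[j]), i, j)
                    for i in range(k) for j in range(i + 1, k)), reverse=True)
    # Kruskal's algorithm: greedily add the heaviest edge joining two trees.
    parent = list(range(k))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for _w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

With three features where the first two are perfectly correlated, the returned tree keeps the (0, 1) edge and attaches feature 2 with one more edge.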

637 | Bayesian network classifiers - Friedman, Geiger, et al. - 1997

Citation Context: ...ïve-Bayesian classifiers, following two general approaches: selecting feature subset (Langley and Sage 1994; Kohavi and John 1997; Pazzani 1995) and relaxing independence assumptions (Kononenko 1991; Friedman et al. 1997). Section 2.3.2 to Section 2.3.4 introduce BN models that extend Naïve-Bayes by allowing dependencies among the features. 2.3.2 Tree Augmented Naïve-Bayes (TAN) TAN classifiers extend Naïve-Bayes by ...

608 | The computational complexity of probabilistic inference using Bayes belief networks - Cooper - 1990

Citation Context: ... a model and BN inference to classify instances. In Section 4, we will demonstrate that learning BN models can be very efficient. As for Bayesian network inference, although it is NP-hard in general (Cooper, 1990), it reduces to simple multiplication when all the values of the dataset attributes are known. 2.2 LEARNING BAYESIAN NETWORKS The two major tasks in learning a BN are: learning the graphical structur...
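The point in this context (inference reduces to multiplication when every attribute is observed) can be shown on a tiny hand-made network. The CPT values and names below are illustrative, not from the paper.

```python
# With all attributes observed, classification needs no general BN inference:
# for each class value, multiply the prior by the relevant CPT entries,
# then normalize. A tiny Naïve-Bayes-shaped example network (made up):
p_c = {'yes': 0.6, 'no': 0.4}                      # prior on the class node
p_x_given_c = {                                    # one CPT per feature
    'x1': {('yes', 1): 0.8, ('yes', 0): 0.2, ('no', 1): 0.3, ('no', 0): 0.7},
    'x2': {('yes', 1): 0.5, ('yes', 0): 0.5, ('no', 1): 0.9, ('no', 0): 0.1},
}

def classify(obs):
    """obs: dict feature -> observed value; returns posterior P(c | obs)."""
    scores = {c: p_c[c] for c in p_c}
    for feat, v in obs.items():
        for c in scores:
            scores[c] *= p_x_given_c[feat][(c, v)]   # plain multiplication
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}
```

For the observation x1 = 1, x2 = 1, the unnormalized scores are 0.6 * 0.8 * 0.5 = 0.24 for 'yes' and 0.4 * 0.3 * 0.9 = 0.108 for 'no', so 'yes' wins.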

362 | An Analysis of Bayesian Classifiers - Langley, Wayne, et al. - 1992

Citation Context: ...y, they were not considered as classifiers until the discovery that Naïve-Bayes, a very simple kind of BNs that assumes the attributes are independent given the class node, are surprisingly effective (Langley et al. 1992). This paper further explores this role of BNs. Section 2 provides the framework of our research, introducing Bayesian networks and describing standard approaches to learning simple Bayesian networks...

330 | A tutorial on learning Bayesian networks - Heckerman - 1995

Citation Context: ...st BN is the one that best fits the data, and leads to the scoring-based learning algorithms, that seek a structure that maximizes the Bayesian, MDL or Kullback-Leibler (KL) entropy scoring function (Heckerman 1995; Cooper and Herskovits 1992). Second, the BN structure encodes a group of conditional independence relationships among the nodes, according to the concept of d-separation (Pearl 1988). This suggests ...

233 | Induction of selective Bayesian classifiers - Langley, Sage - 1994

Citation Context: ...s are not strongly correlated (Langley et al. 1992). In recent years, a lot of effort has focussed on improving Naïve-Bayesian classifiers, following two general approaches: selecting feature subset (Langley and Sage 1994; Kohavi and John 1997; Pazzani 1995) and relaxing independence assumptions (Kononenko 1991; Friedman et al. 1997). Section 2.3.2 to Section 2.3.4 introduce BN models that extend Naïve-Bayes by allowi...

204 | Probabilistic reasoning in expert systems: theory and algorithms - Neapolitan - 1990

Citation Context: ...stand the network structures and modify them to obtain better predictive models. By adding decision nodes and utility nodes, BN models can also be extended to decision networks for decision analysis (Neapolitan, 1990). Applying Bayesian network techniques to classification involves two sub-tasks: BN learning (training) to get a model and BN inference to classify instances. In Section 4, we will demonstrate that l...

111 | Semi-naive Bayesian classifier - Kononenko - 1991

Citation Context: ... on improving Naïve-Bayesian classifiers, following two general approaches: selecting feature subset (Langley and Sage 1994; Kohavi and John 1997; Pazzani 1995) and relaxing independence assumptions (Kononenko 1991; Friedman et al. 1997). Section 2.3.2 to Section 2.3.4 introduce BN models that extend Naïve-Bayes by allowing dependencies among the features. 2.3.2 Tree Augmented Naïve-Bayes (TAN) TAN classifiers ...

100 | MLC++: A machine learning library in C++ - Kohavi, John, et al. - 1994

Citation Context: ...loss in discretization and to be able to compare the learning accuracy with other algorithms fairly. When we needed to discretize the continuous features, we used the discretization utility of MLC++ (Kohavi et al. 1994) on the default setting. The datasets we used are summarized in Table 1. Brief descriptions of the five datasets are given below. [Table 1: Datasets used in the experiments. Dataset Attributes. Classe...]

94 | Knowledge representation and inference in similarity networks and Bayesian multinets - Geiger, Heckerman - 1996

Citation Context: ...endence (CI) test. Both papers also investigate the performance of BAN classifiers. [Figure 3: A simple BAN structure] 2.3.4 Bayesian Multi-net Bayesian Multi-net was first introduced in (Geiger and Heckerman, 1996) and then studied in (Friedman et al., 1997) as a type of classifiers. A Bayesian multi-net is composed of the prior probability distribution of the class node and a set of local networks each corres...

85 | A Bayesian approach to causal discovery - Heckerman, Meek, et al. - 1999

83 | Comparing Bayesian Network Classifiers - Cheng, Greiner - 1999

Citation Context: ...des as an ordinary node (see Figure 5), it is not necessarily a parent of all the feature nodes. The learning methods and the performance of GBN for classification are studied in (Friedman et al. 1997; Cheng and Greiner 1999). By comparing GBN and Bayesian multi-net we can see that GBN assumes that there is a single underlying joint probability distribution of the dataset; while multi-net assumes that there are different...

74 | Searching for dependencies in Bayesian classifiers - Pazzani - 1995

Citation Context: ... 1992). In recent years, a lot of effort has focussed on improving Naïve-Bayesian classifiers, following two general approaches: selecting feature subset (Langley and Sage 1994; Kohavi and John 1997; Pazzani 1995) and relaxing independence assumptions (Kononenko 1991; Friedman et al. 1997). Section 2.3.2 to Section 2.3.4 introduce BN models that extend Naïve-Bayes by allowing dependencies among the features. ...

66 | Learning belief networks from data: An information theory based approach - Cheng, Bell, et al. - 1997

Citation Context: ...ships among the attributes and use these relationships as constraints to construct a BN. These algorithms are referred to as CI-based algorithms or constraint-based algorithms (Spirtes and Glymour 1996; Cheng et al. 1997a). Heckerman et al. (1997) compare these two general learning approaches, and show that the scoring-based methods often have certain advantages over the CI-based methods, in terms of modeling a distribution. Ho...

51 | An algorithm for Bayesian belief network construction from data - Cheng, Bell, et al. - 1997

Citation Context: ...ships among the attributes and use these relationships as constraints to construct a BN. These algorithms are referred to as CI-based algorithms or constraint-based algorithms (Spirtes and Glymour 1996; Cheng et al. 1997a). Heckerman et al. (1997) compare these two general learning approaches, and show that the scoring-based methods often have certain advantages over the CI-based methods, in terms of modeling a distribution. Ho...

48 | Learning Bayesian Nets that perform well - Greiner, Grove, et al. - 1996 |