## Finite Mixture Model of Bounded Semi-Naive Bayesian Networks for Classification (2003)

Venue: | Joint 13th International Conference on Artificial Neural Networks (ICANN 2003) and 10th International Conference on Neural Information Processing (ICONIP 2003), Long paper, Lecture Notes in Computer Science |

Citations: | 3 - 1 self |

### BibTeX

@INPROCEEDINGS{Huang03finitemixture,
  author    = {Kaizhu Huang and Irwin King and Michael R. Lyu},
  title     = {Finite Mixture Model of Bounded Semi-Naive Bayesian Networks for Classification},
  booktitle = {Joint 13th International Conference on Artificial Neural Networks (ICANN 2003) and 10th International Conference on Neural Information Processing (ICONIP 2003), Lecture Notes in Computer Science},
  year      = {2003},
  pages     = {115--122},
  publisher = {Springer}
}


### Abstract

The Naive Bayesian (NB) network classifier, a probabilistic model with a strong assumption of conditional independence among features, shows surprisingly competitive prediction performance even when compared with some state-of-the-art classifiers. With a looser assumption of conditional independence, the Semi-Naive Bayesian (SNB) network classifier is superior to NB classifiers when features are combined. However, the problem for SNB is that its structure is still strongly constrained, which may generate inaccurate distributions for some datasets. A natural progression to improve SNB is to extend it using the mixture approach. However, in obtaining the final structure, traditional SNBs use heuristic approaches to learn the structure from data locally, whereas the Expectation-Maximization (EM) method used in the mixture approach obtains the structure iteratively. This makes it difficult to integrate the local heuristic into the maximization step, since the procedure may not converge. In this paper we first develop a Bounded Semi-Naive Bayesian network (B-SNB) model, which restricts the number of variables that can be joined into a combined feature. As opposed to the local property of traditional SNB models, our model enjoys a global nature and maintains a polynomial time cost. Overcoming the difficulty of integrating SNBs into the mixture model, we then propose an algorithm to extend it into a finite mixture structure, named Mixture of Bounded Semi-Naive Bayesian networks (MBSNB). We give theoretical derivations, an outline of the algorithm, an analysis of the algorithm and a set of experiments to demonstrate the usefulness of MBSNB in some classification tasks. The novel finite MBSNB network shows good speed-up, ability to converge and ...

### Citations

8953 | The Nature of Statistical Learning Theory
- Vapnik
- 2000
Citation Context ...o find a mapping function F : ℜ^n → Ω to satisfy F(x_i) = C_i. To handle this problem, many methods have been proposed. Among them are Statistical Neural Networks [23], Support Vector Machines [2] [36] and Decision trees [29]. Naive Bayesian Network (NB) [8] [20] shows a good performance in dealing with this problem even when compared with the state-of-the-art classifiers such as C4.5. With an inde... |

8074 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ... Bayesian network (MBSNB) model, then we give the optimization problem of the MBSNB model. Finally we conduct theoretical induction to provide the optimization algorithm for this problem under the EM [18] framework. Definition 2: Mixture of Bounded Semi-Naive Bayesian network model is defined as a distribution of the form: Q(x) = \sum_{k=1}^{r} λ_k S^k(x) (12), where λ_k ≥ 0, k = 1, . . . , r, and \sum_{k=1}^{r} λ_k = 1, ... |
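The mixture distribution quoted in the snippet, Q(x) = Σ_{k=1}^{r} λ_k S^k(x), can be sketched directly. The two toy component distributions below are illustrative stand-ins, not the paper's learned B-SNB components:

```python
# Sketch of the mixture distribution Q(x) = sum_k lambda_k * S_k(x)
# from Definition 2 in the snippet. Components here are toy pmfs over
# a single binary variable, not actual B-SNB networks.

def mixture_prob(x, weights, components):
    """Evaluate Q(x) given mixture weights lambda_k and component pmfs S_k."""
    # Constraints from the definition: lambda_k >= 0 and they sum to 1.
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    return sum(w * s(x) for w, s in zip(weights, components))

# Two toy component distributions over a binary variable.
s1 = {0: 0.9, 1: 0.1}.get
s2 = {0: 0.2, 1: 0.8}.get

q0 = mixture_prob(0, [0.5, 0.5], [s1, s2])  # 0.5*0.9 + 0.5*0.2 = 0.55
```

In the paper's setting the S^k would be the bounded semi-naive components learned in the M-step, with the λ_k re-estimated from posterior responsibilities.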

7042 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ... search an independence or dependence relationship among the attributes rather than impose a strong assumption on the attributes. This is the main idea of the so-called unrestricted Bayesian Network (BN) [28]. Unfortunately, empirical results have demonstrated that searching an unrestricted BN structure does not show a better result than NB. This is partly because unrestricted BN structures are prone... |

4920 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ...n F : ℜ^n → Ω to satisfy F(x_i) = C_i. To handle this problem, many methods have been proposed. Among them are Statistical Neural Networks [23], Support Vector Machines [2] [36] and Decision trees [29]. Naive Bayesian Network (NB) [8] [20] shows a good performance in dealing with this problem even when compared with the state-of-the-art classifiers such as C4.5. With an independency assumption amon... |

2277 | A tutorial on support vector machines for pattern recognition
- Burges
- 1998
Citation Context ...is to find a mapping function F : ℜ^n → Ω to satisfy F(x_i) = C_i. To handle this problem, many methods have been proposed. Among them are Statistical Neural Networks [23], Support Vector Machines [2] [36] and Decision trees [29]. Naive Bayesian Network (NB) [8] [20] shows a good performance in dealing with this problem even when compared with the state-of-the-art classifiers such as C4.5. With an... |

1075 | A Bayesian Method for the Induction of Probabilistic Networks from Data
- Cooper, Herskovits
- 1992
Citation Context ... networks classifiers [11], Limited Bayesian network classifiers [30] and Adjusted probability Naive Bayesian networks classifiers [37]; and for inducing unrestricted Bayesian network classifiers, K2 [5] is a popular algorithm. Since our focus is on the restricted BNs, in the following we will first give a short review of the restricted ones and then shift the focus to the mixture issues. NB’s suc... |

752 | A study of cross-validation and bootstrap for accuracy estimation and model selection
- Kohavi
- 1995
Citation Context ...ed datasets in this paper. Detailed information about these datasets can be seen in [26]. To examine the performance of our approaches in this paper, we take the 5-fold Cross Validation (CV) method [15] to perform testing for some small or medium size datasets. TABLE II, DESCRIPTION OF DATA SETS USED IN THE EXPERIMENTS (Dataset / ♯Variables / ♯Class / ♯Train / ♯Test): Xor 6 2 2000 CV-5; Vote 15 2 435 CV-5; Tic-ta... |
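The CV-5 protocol in the snippet partitions each small or medium dataset into five folds. A minimal index-splitting sketch (no external libraries; the dataset size is taken from the table fragment):

```python
# Minimal 5-fold cross-validation split, as in the CV-5 protocol the
# snippet describes. Returns index lists only; shuffling and the actual
# train/test loop are omitted for brevity.

def k_fold_indices(n_samples, k=5):
    """Partition range(n_samples) into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        # Spread any remainder over the first (n_samples % k) folds.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = k_fold_indices(435, k=5)  # 435 = the Vote dataset size from the table
```

Each fold in turn serves as the test set while the remaining four form the training set.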

738 | UCI repository of machine learning databases
- Murphy, Aha
- 1992
Citation Context ... Experimental Setup. 1) Datasets: To evaluate the performance of our B-SNB and MBSNB models, we conduct a series of experiments on 7 databases, among which 6 come from the UCI Machine Learning Repository [26] and the other dataset, called Xor, is generated synthetically. In the Xor dataset, the class variable is determined by the first two binary attributes and the other four binary... |

637 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ...e a significant benefit on naturally occurring databases [25]. Friedman et al. [11] developed the so-called Tree Augmented Naive Bayesian (TAN) network classifier, which integrated the Chow-Liu tree (CLT) [4] techniques with NB. The Chow-Liu tree is a kind of tree structure in which each node (attribute) is assumed to have only one node (attribute) as its parent. In such a configuration, a globally optimal tre... |
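The Chow-Liu construction referenced here estimates pairwise mutual information from data and keeps a maximum-weight spanning tree over the attributes. A compact sketch on toy data (this is plain Chow-Liu, not the class-conditional TAN variant of [11]):

```python
# Sketch of the Chow-Liu procedure: pairwise mutual information from
# counts, then a maximum-weight spanning tree (Kruskal-style union-find).
# Toy data only; smoothing and conditioning on the class are omitted.
from collections import Counter
from itertools import combinations
from math import log

def mutual_info(col_i, col_j):
    """Empirical mutual information between two discrete columns."""
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    return sum((c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

def chow_liu_tree(columns):
    """Return edges (i, j) of a maximum-MI spanning tree over the columns."""
    n_vars = len(columns)
    edges = sorted(((mutual_info(columns[i], columns[j]), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:  # greedily add heaviest edges that avoid cycles
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Three binary attributes; attributes 0 and 1 are perfectly correlated.
data = [[0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
cols = list(zip(*data))
tree = chow_liu_tree(cols)  # a spanning tree with n-1 = 2 edges
```

Because attributes 0 and 1 are perfectly correlated, the edge (0, 1) carries the largest mutual information and is always retained.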

587 | Bayesian network classifiers
- Friedman, Geiger, et al.
Citation Context ...fiers are Semi-Naive Bayesian networks classifiers [17] [25], Selective Naive Bayesian network classifiers [21], Recursive Bayesian classifier [19], Tree Augmented Naive Bayesian networks classifiers [11], Limited Bayesian network classifiers [30] and Adjusted probability Naive Bayesian networks classifiers [37]; and for inducing unrestricted Bayesian network classifiers, K2 [5] is a popular algorithm... |

466 | Mixture Models: Inference and Applications to Clustering
- McLachlan, Basford
- 1988
Citation Context ...nate its components: SNB structures. Mixture approaches have achieved great successes in expanding their restricted components’ expressive power and bringing in a better performance. The Gaussian Mixture Model [22] is such an example. To the best of our knowledge, compared with the popularity of searching unrestricted BNs to relax the constraints of SNB, no one has done mixture upgrading work on the SNB structure. Thi... |

408 | Supervised and unsupervised discretization of continuous features
- Dougherty, Kohavi, et al.
- 1995
Citation Context ... our algorithms can only handle discrete attributes. We discretized these numeric attributes into five equal intervals. Although this approach performs slightly less accurately than a more informed one [7], it is sufficient to evaluate the performance of the main approaches in this paper. • Zero counts are obtained when a given class and attribute value never occur together in the training dataset. This may cause s... |
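The discretization described in the snippet is the simple unsupervised equal-width scheme: split each numeric attribute's range into five intervals. A minimal sketch (contrast with the supervised methods surveyed in [7]):

```python
# Equal-width discretization into five intervals, as the snippet
# describes. Unsupervised: bin edges ignore the class label entirely.

def discretize_equal_width(values, n_bins=5):
    """Map each value to a bin index in 0..n_bins-1 by equal-width cuts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    # Clamp so the maximum value falls in the last bin rather than n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

bins = discretize_equal_width([0.0, 1.0, 2.5, 4.9, 5.0])  # -> [0, 1, 2, 4, 4]
```

Values on a bin boundary fall into the higher bin, except the maximum, which is clamped into the last bin.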

334 | An analysis of Bayesian classifiers
- Langley, Iba, et al.
- 1992
Citation Context ...i. To handle this problem, many methods have been proposed. Among them are Statistical Neural Networks [23], Support Vector Machines [2] [36] and Decision trees [29]. Naive Bayesian Network (NB) [8] [20] shows a good performance in dealing with this problem even when compared with the state-of-the-art classifiers such as C4.5. With an independency assumption among the attributes, when given the class... |

224 | The EM algorithm for mixtures of factor analyzers
- Ghahramani, Hinton
- 1996
Citation Context ...re. See Figure 3: Z is a choice variable, which is used to condition the component restricted Bayesian networks. Learning the mixture structure and parameters is often done through the EM algorithms [12] [22]. To maintain the convergence of EM we need to find a globally optimal or at least sub-optimal algorithm for constructing the component restricted Bayesian networks. This global optimality will m... |

210 | Induction of selective Bayesian classifiers
- Langley, Sage
- 1994
Citation Context ... developed, including restricted types and unrestricted types. Among the restricted BN classifiers are Semi-Naive Bayesian networks classifiers [17] [25], Selective Naive Bayesian network classifiers [21], Recursive Bayesian classifier [19], Tree Augmented Naive Bayesian networks classifiers [11], Limited Bayesian network classifiers [30] and Adjusted probability Naive Bayesian networks classifiers [3... |

155 | Learning Bayesian networks is np-complete
- Chickering
- 1996
Citation Context ...e classifier can classify the training dataset perfectly while it shows a low prediction accuracy for new data). Furthermore, searching an unrestricted BN structure is generally an NP-complete problem [3]. Another possible way is to upgrade the SNB into a mixture structure, where a hidden variable is used to coordinate its components: SNB structures. Mixture approaches have achieved great successes in... |

134 | Hidden markov model induction by bayesian model merging
- Stolcke, Omohundro
- 1993
Citation Context ...[plot residue removed: normalized log-likelihood vs. iteration curves for the Vote and Segment datasets] ...now working on it [9] [10] [32] [33]. In this paper, we set the number of the components under some intuitive considerations. For the databases with more attributes and a large number of training samples such as Tic-tac-toe, Vote, Vehic... |

129 | Inducing probabilistic grammars by Bayesian model merging
- Stolcke, Omohundro
- 1994
Citation Context ...[plot residue removed: normalized log-likelihood vs. iteration curves for the Vote and Segment datasets] ...now working on it [9] [10] [32] [33]. In this paper, we set the number of the components under some intuitive considerations. For the databases with more attributes and a large number of training samples such as Tic-tac-toe, Vote, Vehicle an... |

110 | Learning with mixtures of trees
- Meila, Jordan, et al.
Citation Context ...ian networks with a powerful expression ability cannot incur an increase in prediction accuracy, finding another path to upgrade the restricted Bayesian network is getting important. Meila and Jordan [24] proposed a mixture of trees (MT) model to expand the Chow-Liu tree’s expression power based on the EM algorithm. Their model is empirically shown to outperform other models such as C4.5... |

109 | Semi-naive Bayesian classifier
- Kononenko
- 1991
Citation Context ...assumption by joining attributes into several combined attributes based on a conditional independency assumption among the combined attributes. Some performance improvements have been demonstrated in [17] [25]. Figure 2 is a graphical illustration of the Semi-Naive Bayesian network. At this time, the conditional independency occurs among the “combined attributes”. However, even though SNB makes the constraint of... |

108 | Learning limited dependence Bayesian classifiers
- Sahami
- 1996
Citation Context ...sifiers [17] [25], Selective Naive Bayesian network classifiers [21], Recursive Bayesian classifier [19], Tree Augmented Naive Bayesian networks classifiers [11], Limited Bayesian network classifiers [30] and Adjusted probability Naive Bayesian networks classifiers [37]; and for inducing unrestricted Bayesian network classifiers, K2 [5] is a popular algorithm. Since our focus is on the restricted BNs... |

69 | Searching for dependencies in bayesian classifiers
- Pazzani
- 1996
Citation Context ...ption by joining attributes into several combined attributes based on a conditional independency assumption among the combined attributes. Some performance improvements have been demonstrated in [17] [25]. Figure 2 is a graphical illustration of the Semi-Naive Bayesian network. At this time, the conditional independency occurs among the “combined attributes”. However, even though SNB makes the constraint of NB l... |

67 | Constructing decision trees in noisy domains
- Niblett
- 1986
Citation Context ...performance of NB since it will degrade towards the NB classifier for the absence of many configurations of large attributes. To tackle the second issue, we use the popular Laplace correction methods [27]. The modified estimated empirical probability for P(A_j = a_jk | C_i) is (n_ijk + f)/(n_i + f·n_j) instead of the uncorrected one: n_ijk/n_i, where a_jk is the value of an attribute A_j, n_ijk is the number of time... |
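The Laplace-corrected estimate quoted above, P(A_j = a_jk | C_i) ≈ (n_ijk + f)/(n_i + f·n_j) with f = 1/N as in [6][16], is a one-liner; variable names below follow the text:

```python
# Laplace correction from the snippet: replace the raw frequency
# n_ijk / n_i with (n_ijk + f) / (n_i + f * n_j), where f = 1/N and
# N is the number of training samples. Keeps zero-count events nonzero.

def laplace_corrected(n_ijk, n_i, n_j, N):
    """Corrected estimate of P(Aj = ajk | Ci); never returns zero."""
    f = 1.0 / N
    return (n_ijk + f) / (n_i + f * n_j)

# A class/value pair never seen together still gets positive probability.
p = laplace_corrected(n_ijk=0, n_i=30, n_j=3, N=100)
```

Without the correction a single unseen class/value pair would zero out the whole product of conditional probabilities in the NB-style classifier.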

59 | Improving Simple Bayes
- Kohavi, B, et al.
- 1997
Citation Context ...value a_jk of attribute A_j occur together, n_i is the number of the observations with class label C_i and n_j is the number of values of attribute A_j. We take the same value 1/N for parameter f as [6] [16]; N is the number of samples in the training database. The correction for a large attribute is similar to the above. • Missing values are simply considered as another discrete value for the corresponding a... |

52 | Learning markov networks: maximum bounded tree-width graphs
- Srebro, Karger
- 2001
Citation Context ...of a sub-optimal algorithm with polynomial time cost for the Bounded Semi-Naive Bayesian network. Borrowing combinatorial optimization techniques into learning structure from data is first reported in [14] and [31]. They aimed at finding an approximation of the optimal hypergraph structure by the combinatorial technique. Their work’s contribution may be in the theoretic field rather than in real applicati... |

49 | Induction of Recursive Bayesian Classifiers
- Langley
- 1993
Citation Context ...es and unrestricted types. Among the restricted BN classifiers are Semi-Naive Bayesian networks classifiers [17] [25], Selective Naive Bayesian network classifiers [21], Recursive Bayesian classifier [19], Tree Augmented Naive Bayesian networks classifiers [11], Limited Bayesian network classifiers [30] and Adjusted probability Naive Bayesian networks classifiers [37]; and for inducing unrestricted Ba... |

41 | Maximum Likelihood Bounded Tree-width Markov Networks
- Srebro
- 2001
Citation Context ...timal algorithm with polynomial time cost for the Bounded Semi-Naive Bayesian network. Borrowing combinatorial optimization techniques into learning structure from data is first reported in [14] and [31]. They aimed at finding an approximation of the optimal hypergraph structure by the combinatorial technique. Their work’s contribution may be in the theoretic field rather than in the real application, since... |

24 | Learning the dimensionality of hidden variables
- Elidan, Friedman
- 2001
Citation Context ...[plot residue removed: normalized log-likelihood vs. iteration curves for the Vote and Segment datasets] ...now working on it [9] [10] [32] [33]. In this paper, we set the number of the components under some intuitive considerations. For the databases with more attributes and a large number of training samples such as Tic-tac-toe, Vote, ... |

24 | Adjusted probability naive Bayesian induction
- Webb, Pazzani
- 1998
Citation Context ...1], Recursive Bayesian classifier [19], Tree Augmented Naive Bayesian networks classifiers [11], Limited Bayesian network classifiers [30] and Adjusted probability Naive Bayesian networks classifiers [37]; and for inducing unrestricted Bayesian network classifiers, K2 [5] is a popular algorithm. Since our focus is on the restricted BNs, in the following we will first give a short review about restric... |

21 | Learning mixtures of Bayesian networks
- Thiesson, Meek, et al.
- 1997
Citation Context ...hrough the bottleneck for mixture upgrading, we then propose a Mixture model of Bounded Semi-Naive Bayesian networks, which is shown to outperform NB, CLT, and SNB in our experiments. In fact, Thiesson et al. [34] have proposed a mixture of general Bayesian networks. However, its performance cannot be expected to be very promising since its components, unrestricted Bayesian network classifiers, are not sh... |

5 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ...= C_i. To handle this problem, many methods have been proposed. Among them are Statistical Neural Networks [23], Support Vector Machines [2] [36] and Decision trees [29]. Naive Bayesian Network (NB) [8] [20] shows a good performance in dealing with this problem even when compared with the state-of-the-art classifiers such as C4.5. With an independency assumption among the attributes, when given the... |

5 | Learning maximum likelihood semi-naive bayesian network classifier
- Huang, King, et al.
- 2002
Citation Context ...ese large attributes from X. 3) Go to 1, until all the attributes are covered. Approximating the IP solution by LP may reduce the accuracy of the SNB while decreasing the computational cost. As shown in [13] for experiments on two real-world datasets, the LP solution is a satisfactory approximation to the IP problem. D. When n/K is not an integer: problems may be encountered when n cannot be divided by K exact... |

4 | Artificial Neural Networks: Concepts and Theory
- Mehra, Wah
- 1992
Citation Context ...category space. The objective is to find a mapping function F : ℜ^n → Ω to satisfy F(x_i) = C_i. To handle this problem, many methods have been proposed. Among them are Statistical Neural Networks [23], Support Vector Machines [2] [36] and Decision trees [29]. Naive Bayesian Network (NB) [8] [20] shows a good performance in dealing with this problem even when compared with the state-of-the-art clas... |

2 | On the optimality of the simple Bayesian classifier under zero-one loss
- Domingos, Pazzani
- 1997
Citation Context ...the value a_jk of attribute A_j occur together, n_i is the number of the observations with class label C_i and n_j is the number of values of attribute A_j. We take the same value 1/N for parameter f as [6] [16]; N is the number of samples in the training database. The correction for a large attribute is similar to the above. • Missing values are simply considered as another discrete value for the correspond... |

2 | Discovering hidden variables: a structure-based approach
- Elidan, Lotner, et al.
- 2001
Citation Context ...[plot residue removed: normalized log-likelihood vs. iteration curves for the Vote and Segment datasets] ...now working on it [9] [10] [32] [33]. In this paper, we set the number of the components under some intuitive considerations. For the databases with more attributes and a large number of training samples such as Tic-tac-toe, V... |

1 | A tutorial on integer programming
- Trick
Citation Context ... should be addressed that directly solving the IP problem is infeasible. It is reported that IP problems with as few as 40 variables can be beyond the abilities of even the most sophisticated computers [35]. We assume the set of all the possible large attributes {V1, V2, . . . , VK} as X. The rounding scheme is written as follows: Rounding Scheme: 1) Set the maximum x_{V1,V2,...,VK} for the large attribute... |
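The rounding scheme quoted in the snippet (and continued in the [13] entry above) repeatedly fixes the candidate large attribute with the maximal LP value and discards overlapping candidates until every attribute is covered. A minimal greedy sketch; the candidate groups and LP values below are made up for illustration, not taken from the paper:

```python
# Greedy rounding of a fractional LP solution over candidate "large
# attributes" (groups of variables), loosely following the scheme in the
# snippet: pick the group with the largest LP value, then drop every
# remaining candidate that shares a variable with it. Illustrative only.

def round_lp(candidates):
    """candidates: list of (lp_value, frozenset_of_attribute_indices)."""
    chosen = []
    remaining = sorted(candidates, key=lambda t: t[0], reverse=True)
    while remaining:
        _, best = remaining[0]          # maximal LP value among survivors
        chosen.append(best)
        # Keep only candidates disjoint from the group just selected.
        remaining = [(v, g) for v, g in remaining[1:] if not (g & best)]
    return chosen

# Hypothetical fractional LP values over groups of attribute indices.
cands = [(0.9, frozenset({0, 1})), (0.8, frozenset({1, 2})),
         (0.7, frozenset({2, 3})), (0.6, frozenset({4, 5}))]
groups = round_lp(cands)  # disjoint groups covering all six attributes
```

The disjointness filter is what enforces the SNB constraint that each attribute joins exactly one combined attribute.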