Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier
Cited by 253 (8 self)
The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed so far. In this paper we show that the SBC does not in fact assume attribute independence, and can be optimal even when this assumption is violated by a wide margin. The key to this finding lies in the distinction between classification and probability estimation: correct classification can be achieved even when the probability estimates used contain large errors. We show that the previously-assumed region of optimality of the SBC is a second-order infinitesimal fraction of the actual one. This is followed by the derivation of several necessary and several sufficient conditions for the optimality of the SBC. For example, the SBC is optimal for learning arbitrary conjunctions and disjunctions, even though they violate the independence assumption. The paper also reports empirical evidence of the SBC's competitive performance in domains containing substantial degrees of attribute dependence.
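The distinction between classification and probability estimation can be illustrated with a minimal sketch. It assumes a hypothetical toy domain (not taken from the paper): two Boolean attributes, a class defined as their conjunction, and all four inputs equally likely. The function names `train` and `posterior` are illustrative only. Exact fractions are used so the estimates can be read off directly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy domain: class C = A AND B, every input equally likely.
examples = [((a, b), int(a and b)) for a, b in product([0, 1], repeat=2)]

def train(examples):
    """Maximum-likelihood estimates of P(C) and P(attr_i = v | C), no smoothing."""
    n = len(examples)
    prior, cond = {}, {}
    for c in {label for _, label in examples}:
        rows = [x for x, label in examples if label == c]
        prior[c] = Fraction(len(rows), n)
        for i in range(len(rows[0])):
            for v in (0, 1):
                matches = sum(1 for x in rows if x[i] == v)
                cond[(i, v, c)] = Fraction(matches, len(rows))
    return prior, cond

def posterior(x, prior, cond):
    """Naive Bayes posterior: prior times per-attribute conditionals, normalized."""
    scores = {}
    for c in prior:
        score = prior[c]
        for i, v in enumerate(x):
            score *= cond[(i, v, c)]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

prior, cond = train(examples)
predictions = {}
for x, _ in examples:
    post = posterior(x, prior, cond)
    predictions[x] = max(post, key=post.get)

# Every input is classified correctly, although the estimated posterior
# for (1, 1) differs from the true value of 1.
print(predictions)
print(posterior((1, 1), prior, cond)[1])
```

In this toy domain the attributes are clearly dependent given the negative class (both cannot be 1 together), so the probability estimate for the input (1, 1) is off, yet the predicted class is still the right one for all four inputs.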
Estimating Continuous Distributions in Bayesian Classifiers
- In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence
, 1995
Cited by 243 (2 self)
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Mateo, 1995.
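The two density estimates compared in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the sample values are invented, the kernel is Gaussian, and the bandwidth of 0.5 is an arbitrary choice rather than the setting used by the authors.

```python
import math
from statistics import fmean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def single_gaussian_density(x, samples):
    """Model the class-conditional density with one fitted Gaussian."""
    return gaussian_pdf(x, fmean(samples), pstdev(samples))

def kernel_density(x, samples, bandwidth):
    """Average of Gaussian kernels centred on each sample (assumed bandwidth)."""
    return fmean([gaussian_pdf(x, s, bandwidth) for s in samples])

# Invented bimodal attribute values for one class.
values = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]

for x in (0.0, 2.0):
    print(x, single_gaussian_density(x, values), kernel_density(x, values, 0.5))
```

On this bimodal sample the single Gaussian places its peak near 0, where no data lie, while the kernel estimate assigns more density near the observed clusters. In a naive Bayesian classifier either function could supply the per-attribute conditional density.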
Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid
- PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
, 1996
Cited by 140 (4 self)
Naive-Bayes induction algorithms were previously shown to be surprisingly accurate on many classification tasks even when the conditional independence assumption on which they are based is violated. However, most studies were done on small databases. We show that in some larger databases, the accuracy of Naive-Bayes does not scale up as well as decision trees. We then propose a new algorithm, NBTree, which induces a hybrid of decision-tree classifiers and Naive-Bayes classifiers: the decision-tree nodes contain univariate splits as regular decision-trees, but the leaves contain Naive-Bayesian classifiers. The approach retains the interpretability of Naive-Bayes and decision trees, while resulting in classifiers that frequently outperform both constituents, especially in the larger databases tested.
Feature selection for the naive bayesian classifier using decision trees
- Applied Artificial Intelligence
Cited by 4 (0 self)
It is known that the Naive Bayesian classifier (NB) works very well on some domains and poorly on others. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than NB on such domains. This paper describes a Selective Bayesian classifier (SBC) that uses only those features that C4.5 would use in its decision tree when learning from a small sample of the training set, thereby combining the two different kinds of classifier. Experiments conducted on ten data sets indicate that SBC performs markedly better than NB on all domains, and that SBC outperforms C4.5 on many of the data sets on which C4.5 outperforms NB. An Augmented Bayesian classifier (ABC) is also tested on the same data, and SBC appears to perform as well as ABC. In most cases SBC can also eliminate more than half of the original attributes, which can greatly reduce the size of the training and test data as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high classification accuracy.
Learning Bayesian Networks for Solving Real-World Problems
, 1998
Cited by 3 (0 self)
Bayesian networks, which provide a compact graphical way to express complex probabilistic relationships among several random variables, are rapidly becoming the tool of choice for dealing with uncertainty in knowledge based systems. However, approaches based on Bayesian networks have often been dismissed as unfit for many real-world applications since probabilistic inference is intractable for most problems of realistic size, and algorithms for learning Bayesian networks impose the unrealistic requirement of datasets being complete. In this thesis, I present practical solutions to these two problems, and demonstrate their effectiveness on several real-world problems. The solution proposed to the first problem is to learn selective Bayesian networks, i.e., ones that use only a subset of the given attributes to model a domain. The aim is to learn networks that are smaller, and henc...
Induction of Selective Bayesian Network Classifiers
, 1996
Cited by 2 (0 self)
We present an algorithm for inducing Bayesian networks using feature selection. The algorithm selects a subset of attributes that maximizes predictive accuracy prior to the network learning phase, thereby incorporating a bias for small networks that retain high predictive accuracy. We compare the behavior of this selective Bayesian network classifier with that of (a) Bayesian network classifiers that incorporate all attributes, (b) selective and non-selective naive Bayesian classifiers, and (c) the decision-tree algorithm C4.5. With respect to (a), we show that our approach generates networks that are computationally simpler to evaluate but display comparable predictive accuracy. With respect to (b), we show that the selective Bayesian network classifier performs significantly better than both versions of the naive Bayesian classifier on almost all databases studied, and hence is an enhancement of the naive method. With respect to (c), we show that the selective Bayesian network class...
Machine Learning: An Annotated Bibliography for the 1995 AI & . . .
, 1995
This is a brief annotated bibliography that I wanted to make available to the attendees of my Machine Learning tutorial at the 1995 AI & Statistics Workshop. These slides
Wrapper Approach for Feature Selections in RBF Network Classifier
In this paper we investigate the impact of the wrapper approach on the classification accuracy and performance of an RBF network. The wrapper approach uses six rule induction algorithms as evaluators for the supervised learning algorithm, an RBF network, and is tested on eight real and three artificial benchmark data sets. The classification accuracy and performance of the RBF network depend on the evaluator. Our experimental results indicate that every rule induction algorithm used in the wrapper approach maintains or improves the accuracy of the RBF network on more than half of the data sets. Feature selection with the wrapper approach is slower than with the filter approach.

