Results 1  10
of
50
On the optimality of the simple Bayesian classifier under zeroone loss
 MACHINE LEARNING
, 1997
"... The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containin ..."
Abstract

Cited by 645 (26 self)
 Add to MetaCart
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier’s probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zeroone loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadraticloss optimality of the Bayesian classifier is in fact a secondorder infinitesimal fraction of the region of zeroone optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article’s results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier
"... The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed ..."
Abstract

Cited by 313 (8 self)
 Add to MetaCart
(Show Context)
The simple Bayesian classifier (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed so far. In this paper we show that the SBC does not in fact assume attribute independence, and can be optimal even when this assumption is violated by a wide margin. The key to this finding lies in the distinction between classification and probability estimation: correct classification can be achieved even when the probability estimates used contain large errors. We show that the previouslyassumed region of optimality of the SBC is a secondorder infinitesimal fraction of the actual one. This is followed by the derivation of several necessary and several sufficient conditions for the optimality of the SBC. For example, the SBC is optimal for learning arbitrary conjunctions and disjunctions, even though they violate the independence assumption. The paper also reports empirical evidence of the SBC's competitive performance in domains containing substantial degrees of attribute dependence.
Induction of Selective Bayesian Classifiers
 CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 1994
"... In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space ..."
Abstract

Cited by 233 (7 self)
 Add to MetaCart
(Show Context)
In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space of features. We hypothesize that this approach will improve asymptotic accuracy in domains that involve correlated features without reducing the rate of learning in ones that do not. We report experimental results on six natural domains, including comparisons with decisiontree induction, that support these hypotheses. In closing, we discuss other approaches to extending naive Bayesian classifiers and outline some directions for future research.
Searching for Dependencies in Bayesian Classifiers
, 1996
"... Naive Bayesian classifiers which make independence assumptions perform remarkably well on some data sets but poorly on others. We explore ways to improve the Bayesian classifier by searching for dependencies among attributes. We propose and evaluate two algorithms for detecting dependencies among at ..."
Abstract

Cited by 74 (5 self)
 Add to MetaCart
(Show Context)
Naive Bayesian classifiers which make independence assumptions perform remarkably well on some data sets but poorly on others. We explore ways to improve the Bayesian classifier by searching for dependencies among attributes. We propose and evaluate two algorithms for detecting dependencies among attributes and show that the backward sequential elimination and joining algorithm provides the most improvement over the naive Bayesian classifier. The domains on which the most improvement occurs are those domains on which the naive Bayesian classifier is significantly less accurate than a decision tree learner. This suggests that the attributes used in some common databases are not independent conditioned on the class and that the violations of the independence assumption that affect the accuracy of the classifier can be detected from training data. 23.1 Introduction The Bayesian classifier (Duda
Not so naive Bayes: Aggregating onedependence estimators
 Machine Learning
, 2005
"... Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and superparent TAN have demonstrated remarkable error performance. However, both techniques obtain this outcome at a considerable computational cost. We present a new approach ..."
Abstract

Cited by 70 (9 self)
 Add to MetaCart
(Show Context)
Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and superparent TAN have demonstrated remarkable error performance. However, both techniques obtain this outcome at a considerable computational cost. We present a new approach to weakening the attribute independence assumption by averaging all of a constrained class of classifiers. In extensive experiments this technique delivers comparable prediction accuracy to LBR and superparent TAN with substantially improved computational e#ciency at test time relative to the former and at training time relative to the latter. The new algorithm is shown to have low variance and is suited to incremental learning.
Improving Simple Bayes
, 1997
"... The simple Bayesian classifier (SBC), sometimes called NaiveBayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classificat ..."
Abstract

Cited by 66 (1 self)
 Add to MetaCart
The simple Bayesian classifier (SBC), sometimes called NaiveBayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. We examine different approaches for handling unknowns and zero counts when estimating probabilities. Large scale experiments on 37 datasets were conducted to determine the effects of these approaches and several interesting insights are given, including a new variant of the Laplace estimator that outperforms other methods for dealing with zero counts. Using the biasvariance decomposition [15, 10], we show that while the SBC has performed well on common benchmark datasets, its accuracy will not scale up as the dataset sizes grow. Even with these limitations in mind, the SBC can serve as an excellenttool for initial exp...
Efficient Learning of Selective Bayesian Network Classifiers
, 1995
"... In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use informationtheoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, informationtheoretic ..."
Abstract

Cited by 52 (4 self)
 Add to MetaCart
In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use informationtheoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, informationtheoretic metrics that are extensions of metrics used extensively in decision tree learning, namely Quinlan's gain and gain ratio metrics and Mantaras's distance metric. We experimentally show that the algorithms based on gain ratio and distance metric learn selective Bayesian networks that have predictive accuracies as good as or better than those learned by existing selective Bayesian network induction approaches (K2AS), but at a significantly lower computational cost. We prove that the subsetselection phase of these informationbased algorithms has polynomial complexity as compared to the worstcase exponential time complexity of the corresponding phase in K2AS. We also compare the performance o...
Local Cascade Generalization
, 1998
"... In a previous work we have presented Cascade Generalization, a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new att ..."
Abstract

Cited by 44 (1 self)
 Add to MetaCart
In a previous work we have presented Cascade Generalization, a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language for the high level classifiers, relaxing their bias. In this paper we extend this work by applying Cascade locally. At each iteration of a divide and conquer algorithm, a reconstruction of the instance space occurs by the addition of new attributes. Each new attribute represents the probability that an example belongs to a class given by a base classifier. We have implemented three Local Generalization Algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third mer...
Machine learning for medical diagnosis: history, state of the art and perspective
 Artificial Intelligence in Medicine
, 2001
"... The paper provides an overview of the development of intelligent data analysis in medicine from a machine learning perspective: a historical view, a state of the art view and a view on some future trends in this subfield of applied artificial intelligence. The paper is not intended to provide a com ..."
Abstract

Cited by 43 (1 self)
 Add to MetaCart
The paper provides an overview of the development of intelligent data analysis in medicine from a machine learning perspective: a historical view, a state of the art view and a view on some future trends in this subfield of applied artificial intelligence. The paper is not intended to provide a comprehensive overview but rather describes some subeareas and directions which from my personal point of view seem to be important for applying machine learning in medical diagnosis. In the historical overview I emphasize the naive Bayesian classifier, neural networks and decision trees. I present a comparison of some state of the art systems, representatives from each branch of machine learning, when applied to several medical diagnostic tasks. The future trends are illustrated by two case studies. The first describes a recently developed method for dealing with reliability of decisions of classifiers, which seems to be promising for intelligent data analysis in medicine. The second describes an approach to using machine learning in order to verify some unexplained phenomena from complementary medicine, which is not (yet) approved by the orthodox medical community but could in the future play an important role in overall medical diagnosis and treatment. 1
Lazy Learning of Bayesian Rules
 Machine Learning
, 2000
"... The naive Bayesian classifier provides a simple and e#ective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and ge ..."
Abstract

Cited by 43 (8 self)
 Add to MetaCart
The naive Bayesian classifier provides a simple and e#ective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute interdependencies for the local naive Bayesian classifier. However, Bayesian tree learning still su#ers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called Lbr. This algorithm can be justified by a variant of Bayes theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of Lbr are reasonable in a wide crosssection of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statisticall...