On the optimality of the simple Bayesian classifier under zero-one loss
Machine Learning, 1997
Cited by 805 (27 self)
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier’s probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article’s results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
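The claim about conjunctions and disjunctions can be checked on a small case. The sketch below is not taken from the article: it assumes Boolean attributes, a uniform distribution over all inputs, and exact (unsmoothed) probabilities computed from the full truth table. The function names `naive_bayes_errors` and `independence_gap` are hypothetical. It counts the zero-one errors of naive Bayes and measures how far two attributes are from being independent given the class.

```python
from fractions import Fraction
from itertools import product


def conjunction(x):
    """Target concept: class 1 only when every attribute is 1."""
    return int(all(x))


def disjunction(x):
    """Target concept: class 1 when at least one attribute is 1."""
    return int(any(x))


def naive_bayes_errors(n, target):
    """Count zero-one errors of exact naive Bayes over all 2^n inputs.

    Assumption: inputs are uniformly distributed and the probabilities
    are computed exactly from the full truth table (no smoothing).
    """
    inputs = list(product([0, 1], repeat=n))
    labels = [target(x) for x in inputs]
    prior = {c: Fraction(labels.count(c), len(inputs)) for c in (0, 1)}
    cond = {}  # cond[c][i][v] = P(A_i = v | C = c)
    for c in (0, 1):
        members = [x for x, y in zip(inputs, labels) if y == c]
        cond[c] = [
            {v: Fraction(sum(1 for x in members if x[i] == v), len(members))
             for v in (0, 1)}
            for i in range(n)
        ]
    errors = 0
    for x, y in zip(inputs, labels):
        score = {}
        for c in (0, 1):
            s = prior[c]
            for i, v in enumerate(x):
                s *= cond[c][i][v]
            score[c] = s
        predicted = 1 if score[1] > score[0] else 0
        errors += int(predicted != y)
    return errors


def independence_gap(n, target, c=0):
    """P(A_1=1, A_2=1 | C=c) - P(A_1=1 | C=c) * P(A_2=1 | C=c)."""
    members = [x for x in product([0, 1], repeat=n) if target(x) == c]
    m = len(members)
    joint = Fraction(sum(1 for x in members if x[0] == 1 and x[1] == 1), m)
    p1 = Fraction(sum(x[0] for x in members), m)
    p2 = Fraction(sum(x[1] for x in members), m)
    return joint - p1 * p2
```

A nonzero value from `independence_gap` means the independence assumption fails within that class, while a zero count from `naive_bayes_errors` means the classifier still labels every input correctly in this toy setting.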
Lazy Learning of Bayesian Rules
Machine Learning, 2000
Cited by 55 (10 self)
The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute interdependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called LBR. This algorithm can be justified by a variant of Bayes' theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of LBR are reasonable in a wide cross-section of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statisticall...
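The abstract does not state the variant of Bayes' theorem it relies on. One identity that fits the description is sketched below, as an assumption rather than a quotation from the paper. The symbols are introduced here for illustration: $C$ is a class, and $V_1$ and $V_2$ are two conjunctions of attribute values that together make up the test example, with $V_2$ playing the role of the rule antecedent.

```latex
% Bayes' theorem conditioned on V_2 (holds whenever P(V_1 \mid V_2) > 0):
P(C \mid V_1 \wedge V_2)
  = \frac{P(C \mid V_2)\, P(V_1 \mid C \wedge V_2)}{P(V_1 \mid V_2)}

% Assumed local independence: only the attribute values in V_1 need to be
% independent of one another, and only given both the class and V_2:
P(V_1 \mid C \wedge V_2) = \prod_{v \in V_1} P(v \mid C \wedge V_2)
```

Under this reading, naive Bayes is the special case in which $V_2$ is empty, so the independence requirement covers every attribute given the class alone.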
Feature selection for the naive Bayesian classifier using decision trees
Applied Artificial Intelligence
Cited by 12 (0 self)
It is known that the Naive Bayesian classifier (NB) works very well on some domains and poorly on others. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Naive Bayesian algorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that uses only those features that C4.5 would use in its decision tree when learning from a small sample of the training set, combining the two different natures of the classifiers. Experiments conducted on ten data sets indicate that SBC performs markedly better than NB on all domains, and that SBC outperforms C4.5 on many data sets on which C4.5 outperforms NB. An Augmented Bayesian classifier (ABC) is also tested on the same data, and SBC appears to perform as well as ABC. SBC can also eliminate, in most cases, more than half of the original attributes, which can greatly reduce the size of the training and test data as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high classification accuracy.
Scaling up the Naive Bayesian Classifier: Using Decision Trees for Feature Selection
Cited by 8 (0 self)
It is known that the Naive Bayesian classifier (NB) works very well on some domains and poorly on others. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Naive Bayesian algorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that uses only those features that C4.5 would use in its decision tree when learning from a small sample of the training set, combining the two different natures of the classifiers. Experiments conducted on ten datasets indicate that SBC performs reliably better than NB on all domains, and that SBC outperforms C4.5 on many datasets on which C4.5 outperforms NB. An Augmented Bayesian classifier (ABC) is also tested on the same data, and SBC appears to perform as well as ABC. SBC can also eliminate, in most cases, more than half of the original attributes, which can greatly reduce the size of the training and test data, as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high classification accuracy.
Dimensionality Reduction and Representation for Nearest Neighbour Learning
1999
Cited by 4 (1 self)
This thesis has been composed by myself; it has not been accepted in any previous application for a degree; the work of which it is a record has been done by myself; and all quotations have been distinguished by quotation marks and the sources of information have been specifically acknowledged.
Can kNN Imputation Improve the Performance of C4.5 With Small Software Project Data Sets? A Comparative Evaluation
2008
Cited by 2 (0 self)
Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the kNN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated 3 missingness mechanisms, 3 missing data patterns, and 5 missing data percentages. We found that kNN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and kNN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy, particularly if the missing data percentage exceeds
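As a rough illustration of the kNN imputation idea mentioned above, the sketch below fills each missing numeric value with the mean of that attribute over the k nearest rows. The abstract does not specify the paper's distance measure, choice of k, or aggregation rule, so every detail here is an assumption: the function name `knn_impute` is hypothetical, missing values are represented as `None`, and distance is a root-mean-square difference over the attributes observed in both rows.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by kNN imputation.

    Assumptions (not from the paper): numeric attributes, None marks a
    missing value, and a missing entry is replaced by the mean of that
    attribute over the k nearest rows that have it observed.
    """
    def distance(a, b):
        # Compare only on attributes observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which attribute j is observed.
            donors = [r for m, r in enumerate(rows)
                      if m != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return imputed
```

The completed table could then be passed to any learner that needs complete data, which is the "impute, then apply C4.5" arm of the comparison the abstract describes.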
Dimensionality Reduction through Correspondence Analysis
1999
Cited by 1 (1 self)
Many learning algorithms make an implicit assumption that all the attributes of the presented data are relevant to a learning task. However, several studies on attribute selection have demonstrated that this assumption rarely holds. In addition, for many supervised learning algorithms such as nearest neighbour algorithms, the inclusion of irrelevant attributes can result in a degradation in the classification accuracy of the learning algorithm. Whilst a number of different methods for attribute selection exist, many of these are only appropriate for datasets which contain a small number of attributes (e.g. < 20). This paper presents an alternative approach to attribute selection, which can be applied to datasets with a greater number of attributes. We present an evaluation of the approach which contrasts its performance with one other attribute selection technique.
Performance Analysis of Various . . .
2011
A data warehouse is the essential point of data combination for business intelligence. Nowadays, there are emerging trends in databases toward discovering useful patterns and/or correlations among attributes, known as data mining. This paper presents data mining techniques such as Classification, Clustering and Association Analysis, which include
unknown title
io un g W om d, X Missing data ad p ate st u e ke m C4.5 and kNN are little affected by the missingness mechanism, but that the missing data pattern and the lem th edictio iques niques has been published (e.g. Little and Rubin, 1989;