Results 1 - 10 of 11
A Comparative Study of Discretization Methods for Naive-Bayes Classifiers
- In Proceedings of PKAW 2002: The 2002 Pacific Rim Knowledge Acquisition Workshop, 2002
Cited by 13 (0 self)
Discretization is a popular approach to handling numeric attributes in machine learning. We argue that the requirements for effective discretization differ between naive-Bayes learning and many other learning algorithms. We evaluate the effectiveness of nine discretization methods for naive-Bayes classifiers: equal width discretization (EWD), equal frequency discretization (EFD), fuzzy discretization (FD), entropy minimization discretization (EMD), iterative discretization (ID), proportional k-interval discretization (PKID), lazy discretization (LD), non-disjoint discretization (NDD) and weighted proportional k-interval discretization (WPKID). We find that, in general, naive-Bayes classifiers trained on data preprocessed by LD, NDD or WPKID achieve lower classification error than those trained on data preprocessed by the other discretization methods. However, LD cannot scale to large data. This study leads to a new discretization method, weighted non-disjoint discretization (WNDD), which combines the advantages of WPKID and NDD. Our experiments show that, among all the rival discretization methods, WNDD best helps naive-Bayes classifiers reduce average classification error.
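EWD and EFD, the two simplest methods in this list, differ only in how cut points are chosen. The following is a minimal Python sketch, assuming a user-chosen interval count k and no special handling of tied values; the function names are hypothetical and not taken from the paper.

```python
import bisect


def equal_width_cuts(values, k):
    """EWD sketch: k-1 cut points dividing [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """EFD sketch: k-1 cut points so each interval holds about len(values) / k values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def interval_index(x, cuts):
    """Map a value to its interval index in 0..len(cuts); a value equal to a cut goes right."""
    return bisect.bisect_right(cuts, x)


if __name__ == "__main__":
    data = [0, 1, 2, 3, 10]
    ewd = equal_width_cuts(data, 5)
    efd = equal_frequency_cuts(data, 5)
    print(ewd, [interval_index(v, ewd) for v in data])
    print(efd, [interval_index(v, efd) for v in data])
```

On this toy data the outlier 10 stretches the EWD intervals, so four values share the first two intervals and two intervals stay empty, whereas EFD places one value in each interval.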
Proportional k-interval discretization for naive-Bayes classifiers
- Proc. of the Twelfth European Conf. on Machine Learning, 2001
Cited by 12 (5 self)
This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances and thus seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory, as well as test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance for smaller datasets and better classification performance for larger datasets.
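The idea of tying interval size and interval number to the training-set size can be sketched as follows. This assumes PKID sets both the interval size s and the interval count t to about √n, so that s × t ≈ n; the function names are hypothetical, and ties and duplicate cut points are not handled specially.

```python
import bisect
import math


def pkid_cuts(values):
    """PKID-style sketch: interval size s and interval count t both about sqrt(n)."""
    ordered = sorted(values)
    n = len(ordered)
    size = max(1, math.isqrt(n))  # target number of training values per interval
    # Cut before every size-th value; stopping early lets the last interval absorb the remainder.
    return [ordered[i] for i in range(size, n - size + 1, size)]


def discretize(x, cuts):
    """Return the index of the interval containing x."""
    return bisect.bisect_right(cuts, x)


if __name__ == "__main__":
    for n in (100, 10_000):
        cuts = pkid_cuts(range(n))
        print(n, "values ->", len(cuts) + 1, "intervals")
```

Both the size and the number of intervals grow as the data grows, which is the mechanism the abstract credits with trading off bias (fewer, wider intervals) against variance (fewer values per interval).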
Segmented regression estimators for massive data sets
- In Second SIAM International Conference on Data Mining, 2002
Cited by 9 (6 self)
We describe two methodologies for obtaining segmented regression estimators from massive training data sets. The first methodology, called Linear Regression Tree (LRT), is used for continuous response variables, and the second, complementary methodology, called Naive Bayes Tree (NBT), is used for categorical response variables. These are implemented in the IBM ProbE™ (Probabilistic Estimation) data mining engine, an object-oriented framework for building classes of segmented predictive models from massive training data sets. Based on this methodology, an application called ATM-SE™ for direct-mail targeted marketing has been developed jointly with Fingerhut Business Intelligence [1].
Discretization for naive-Bayes learning: managing discretization bias and variance
- 2003
Cited by 9 (5 self)
Quantitative attributes are usually discretized in naive-Bayes learning. We prove a theorem that explains why discretization can be effective for naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we propose proportional k-interval discretization and equal size discretization, two efficient heuristic discretization methods that are able to effectively manage discretization bias and variance by tuning discretized interval size and interval number. We empirically evaluate our new techniques against five key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical arguments by showing that naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by alternative discretization methods.
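By contrast with PKID, equal size discretization keeps the interval size fixed and lets the number of intervals grow with the data. A minimal sketch, assuming a fixed size m; the default m = 30 and the function name are illustrative assumptions, not values stated in this abstract.

```python
def fixed_size_cuts(values, m=30):
    """Equal-size sketch: every interval holds about m training values (m is assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut before every m-th value; the last interval absorbs the remainder,
    # and fewer than 2 * m values yield no cuts (a single interval).
    return [ordered[i] for i in range(m, n - m + 1, m)]


if __name__ == "__main__":
    for n in (50, 100, 1000, 10_000):
        print(n, "values ->", len(fixed_size_cuts(range(n))) + 1, "intervals")
```

Here each interval keeps roughly the same number of values, which bounds the variance of each probability estimate, while the interval count grows linearly in n rather than as √n.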
On Why Discretization Works for Naive-Bayes Classifiers
- In Proceedings of the 16th Australian Joint Conference on Artificial Intelligence (AI ...), 2003
Cited by 8 (2 self)
We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed.
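Why such a condition can suffice is visible from a short derivation, sketched here under the naive-Bayes independence assumption; it is an illustration, not the paper's proof. Let $\mathbf{x} = (x_1, \ldots, x_k)$ be numeric attribute values, let $x_i^{*}$ denote the interval $(a_i, b_i]$ containing $x_i$, and assume both models use the same class prior $p(c) = P(c)$.

```latex
\begin{align}
p(c \mid \mathbf{x})
  &= \frac{p(c)\prod_{i=1}^{k} p(x_i \mid c)}{p(\mathbf{x})}
   = \frac{p(c)\prod_{i=1}^{k} \frac{p(c \mid x_i)\, p(x_i)}{p(c)}}{p(\mathbf{x})}
   \propto p(c)^{1-k}\prod_{i=1}^{k} p(c \mid x_i), \\
P(c \mid \mathbf{x}^{*})
  &\propto P(c)^{1-k}\prod_{i=1}^{k} P(c \mid a_i < X_i \le b_i).
\end{align}
```

The factor $\prod_i p(x_i) / p(\mathbf{x})$ does not depend on $c$, so it vanishes when the posteriors are normalized over classes. Hence, if $p(c \mid x_i) = P(c \mid a_i < X_i \le b_i)$ for every class $c$ and attribute $i$, the discretized classifier yields the same normalized posteriors as one using the true densities.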
Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers
- In Proc. of the PAKDD, 2003
Cited by 7 (1 self)
The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number in proportion to the number of training instances. Theoretical analysis suggests that this is desirable for naive-Bayes classifiers. However, PKID is sub-optimal when learning from small training data. We argue that this is because PKID weighs bias reduction and variance reduction equally. But for small data, variance reduction can contribute more to lowering learning error and thus should be given greater weight than bias reduction. Accordingly, we propose weighted proportional k-interval discretization (WPKID), which establishes a more suitable bias and variance trade-off for small data while allowing additional training data to be used to reduce both bias and variance. Our experiments demonstrate that, for naive-Bayes classifiers, WPKID improves upon PKID for smaller datasets with significant frequency, and that WPKID delivers lower classification error significantly more often than not in comparison to three other leading alternative discretization techniques studied.
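The extra weight on variance reduction for small data can be illustrated by solving for an interval size s and interval count t. This sketch assumes the constraints s × t = n and s − m = t with a minimum interval size m = 30; both the constraint form and the value of m are assumptions here, the function name is hypothetical, and no rounding to whole intervals is done.

```python
import math


def wpkid_size_and_count(n, m=30):
    """Solve s * t = n and s - m = t for interval size s and interval count t.

    Substituting t = s - m gives s**2 - m*s - n = 0; the positive root is used.
    """
    s = (m + math.sqrt(m * m + 4 * n)) / 2
    t = n / s  # equals s - m up to floating-point error
    return s, t


if __name__ == "__main__":
    for n in (50, 500, 50_000):
        s, t = wpkid_size_and_count(n)
        print(f"n={n}: WPKID s={s:.1f}, t={t:.1f}; PKID s=t={math.sqrt(n):.1f}")
```

For small n, s stays close to m, giving a few well-populated intervals (lower variance); as n grows, both s and t increase, so extra data still reduces both bias and variance.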
Using Simulated Pseudo Data To Speed Up Statistical Predictive Modeling
- To appear in Proceedings of the First SIAM International Conference on Data Mining, SIAM, Philadelphia, 2001
Cited by 5 (5 self)
Predictive modeling techniques are now being used in application domains where the training data sets are potentially enormous. For example, certain marketing databases that we have encountered contain millions of customer records with thousands of attributes per record. The development of statistical modeling algorithms ...
Non-disjoint discretization for naive-Bayes classifiers
- Proc. Nineteenth International Conference on Machine Learning, 2002
Cited by 3 (1 self)
Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain more reliable probability estimates. It also adjusts the number and size of discretized intervals to the number of training instances, seeking an appropriate trade-off between the bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, for naive-Bayes classifiers, NDD works better than alternative discretization approaches.
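The overlapping-interval idea can be sketched as follows, assuming that NDD first cuts the sorted training values into atomic intervals of roughly √n / 3 values each and, for a query value x, uses the three consecutive atomic intervals centred on the one containing x. The atomic size, the clipping at the two ends, and the function names are assumptions for illustration only.

```python
import bisect
import math


def atomic_cuts(values):
    """Cut sorted values into atomic intervals of about sqrt(n) / 3 values each (assumed size)."""
    ordered = sorted(values)
    n = len(ordered)
    size = max(1, math.isqrt(n) // 3)
    # The last atomic interval absorbs any remainder.
    return [ordered[i] for i in range(size, n - size + 1, size)]


def ndd_interval(x, cuts):
    """Return (first, last) atomic-interval indices of the interval used for x.

    The interval is the atomic interval containing x plus one neighbour on each
    side, so x sits in its middle third; at either end the window is clipped.
    """
    j = bisect.bisect_right(cuts, x)  # atomic interval that contains x
    last_atomic = len(cuts)           # index of the final atomic interval
    return max(0, j - 1), min(last_atomic, j + 1)


if __name__ == "__main__":
    cuts = atomic_cuts(range(100))
    for x in (0, 50, 99):
        print(x, ndd_interval(x, cuts))
```

Because neighbouring query values share atomic intervals, the intervals used for them overlap, which is what distinguishes this from the disjoint schemes above.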
Multi-View 3-D Object Description with Uncertain Reasoning and Machine Learning
- 2001
CloNI: Clustering of √N-Interval Discretization
It is known that the naive Bayesian classifier typically works well on discrete data. For such applications, all continuous attributes therefore need to be discretized beforehand. An inappropriate range of discretization intervals may degrade its performance. In this paper, we review previous work on continuous feature discretization and conduct an empirical evaluation of an improved method called Clustering of √N-Interval Discretization (CloNI). CloNI starts from √N intervals and tries to reduce their number by iteratively merging two consecutive intervals, according to their median distance, until a stopping criterion is met. We also show that even though C4.5 decision trees can handle continuous features, their performance can be significantly improved in some domains if those features are discretized in advance. In our empirical results, using discretized instead of continuous features in C4.5 never significantly degrades its accuracy. Our results indicate that CloNI reliably performs as well as or better than Proportional k-Interval Discretization (PKID) on all domains, and gives competitive classification performance for both smaller and larger datasets.
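The merging step can be sketched as follows. The abstract does not specify the stopping criterion, so this sketch assumes a hypothetical `threshold` on the gap between neighbouring medians and a floor `min_intervals`; it also assumes N is the number of training values and that the initial √N intervals are formed by equal frequency. The function name and parameters are illustrative only.

```python
import math
import statistics


def cloni_groups(values, threshold, min_intervals=2):
    """CloNI-style sketch: start from ~sqrt(N) equal-frequency groups, then merge neighbours.

    Stopping when the smallest median gap exceeds `threshold`, or when
    `min_intervals` groups remain, is an assumption made for illustration.
    """
    ordered = sorted(values)
    size = max(1, math.isqrt(len(ordered)))
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    while len(groups) > max(1, min_intervals):
        medians = [statistics.median(g) for g in groups]
        gaps = [medians[i + 1] - medians[i] for i in range(len(groups) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)  # closest pair of neighbours
        if gaps[i] > threshold:
            break
        groups[i:i + 2] = [groups[i] + groups[i + 1]]  # merge the pair into one interval
    return groups


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100, 101, 102, 103, 200, 201, 202, 203, 300, 301, 302, 303]
    for g in cloni_groups(data, threshold=150):
        print(min(g), "to", max(g))
```

Merging the closest neighbours first means that intervals whose values are similar collapse early, while well-separated clusters survive as separate intervals.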

