Results 1–10 of 12
Supervised and unsupervised discretization of continuous features
 in A. Prieditis & S. Russell, eds, Machine Learning: Proceedings of the Twelfth International Conference
, 1995
Abstract

Cited by 534 (11 self)
Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperformed C4.5 on average. We also show that in some cases, the performance of the C4.5 induction algorithm significantly improved if features were discretized in advance; in our experiments, the performance never significantly degraded, an interesting phenomenon considering the fact that C4.5 is capable of locally discretizing features.
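To make the binning/entropy contrast in this abstract concrete, here is a minimal Python sketch (not the paper's experimental code): equal-width binning is unsupervised, while the entropy-based criterion scores candidate boundaries against the class labels. The function names and the single-split simplification are illustrative assumptions; entropy-based discretizers of the kind evaluated here apply the split recursively with a stopping rule.

```python
import numpy as np

def equal_width_bins(values, k):
    """Unsupervised binning: split the observed range into k equal-width bins."""
    edges = np.linspace(values.min(), values.max(), k + 1)
    return np.digitize(values, edges[1:-1])  # bin index 0..k-1 per value

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_entropy_cut(values, labels):
    """Supervised: the single boundary minimizing weighted class entropy.
    (Entropy-based discretizers apply this recursively with a stopping rule.)"""
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best_cut, best_score = None, entropy(y)
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue                    # no boundary between equal values
        score = (i * entropy(y[:i]) + (len(y) - i) * entropy(y[i:])) / len(y)
        if score < best_score:
            best_cut, best_score = (v[i - 1] + v[i]) / 2, score
    return best_cut
```

On a toy feature where classes separate cleanly, the entropy criterion places the cut in the gap between the classes, whereas equal-width bin boundaries depend only on the value range, not the labels.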
Global discretization of continuous attributes as preprocessing for machine learning
 International Journal of Approximate Reasoning
, 1996
Abstract

Cited by 60 (4 self)
Abstract. Real-life data usually are presented in databases by real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes will be called local, while methods that simultaneously convert all continuous attributes will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method, based on cluster analysis, is presented and compared experimentally with three known local methods, transformed into global. Experiments include ten-fold cross-validation and leaving-one-out methods for ten real-life data sets.
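The local/global distinction in this abstract can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the paper's algorithm: `discretize_local` bins one attribute in isolation (equal-frequency), while the cluster-based global sketch runs plain k-means over whole examples and then cuts each attribute at midpoints between the projected cluster centers, so all attributes are discretized simultaneously.

```python
import numpy as np

def discretize_local(column, n_bins=3):
    """Local: bins computed from one attribute alone (equal-frequency here)."""
    inner = np.linspace(0, 1, n_bins + 1)[1:-1]   # interior quantile levels
    return np.digitize(column, np.quantile(column, inner))

def discretize_global_by_clusters(data, n_clusters=3, n_iter=20, seed=0):
    """Global sketch: cluster whole examples with plain k-means, then cut each
    attribute at midpoints between the cluster centers projected onto it."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign each example to its nearest center, then recompute centers
        assign = ((data[:, None] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([data[assign == k].mean(0) if (assign == k).any()
                            else centers[k] for k in range(n_clusters)])
    out = np.empty(data.shape, dtype=int)
    for j in range(data.shape[1]):
        c = np.sort(centers[:, j])
        out[:, j] = np.digitize(data[:, j], (c[:-1] + c[1:]) / 2)
    return out
```

The design point being illustrated: the global version's cut points for attribute j depend on the joint structure of all attributes (via the clusters), not on attribute j alone.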
CAIM Discretization Algorithm
, 2003
Abstract

Cited by 41 (2 self)
The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. This paper describes such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM, and six other state-of-the-art discretization algorithms, show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was always achieved for datasets discretized using the CAIM algorithm, as compared with the other six algorithms. Of four supervised algorithms used for comparison, the CAIM algorithm is comparable in speed to the two fastest.
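The class-attribute interdependence criterion that CAIM maximizes has a simple closed form over a class-by-interval count matrix, and the interval search is greedy. The Python sketch below (function names are our own) implements that criterion and a simplified version of the greedy boundary-selection loop; it is illustrative, not the authors' reference implementation.

```python
import numpy as np

def caim(quanta):
    """CAIM criterion for one attribute.
    quanta[i, r] = number of training examples of class i in interval r.
    caim = (1/n) * sum_r (max_r ** 2 / M_r), where max_r is the largest class
    count in interval r, M_r the interval total, n the number of intervals."""
    max_r = quanta.max(axis=0)          # dominant class count per interval
    M_r = quanta.sum(axis=0)            # total examples per interval
    return float((max_r ** 2 / M_r).sum() / quanta.shape[1])

def caim_discretize(values, labels):
    """Simplified greedy loop: keep adding the candidate boundary that
    maximizes the CAIM value; stop once it no longer improves and there are
    at least as many intervals as classes. Returns the chosen cut points."""
    classes = np.unique(labels)
    vs = np.sort(np.unique(values))
    candidates = (vs[:-1] + vs[1:]) / 2  # midpoints between distinct values
    cuts, best = [], -np.inf
    while True:
        scored = []
        for c in candidates:
            if c in cuts:
                continue
            trial = sorted(cuts + [c])
            bins = np.digitize(values, trial)
            quanta = np.array([[((bins == r) & (labels == k)).sum()
                                for r in range(len(trial) + 1)]
                               for k in classes])
            scored.append((caim(quanta), c))
        if not scored:
            return sorted(cuts)
        score, c = max(scored)
        if score <= best and len(cuts) + 1 >= len(classes):
            return sorted(cuts)
        best, cuts = score, cuts + [c]
```

This is how the algorithm avoids a user-supplied interval count: the interval number is a byproduct of the stopping condition, not a parameter.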
Quantization Of Real Value Attributes – Rough Set and Boolean Reasoning Approach
 Proc. of the Second Joint Annual Conference on Information Sciences, Wrightsville Beach, North Carolina, Sept 28 – Oct 1
, 1995
Abstract

Cited by 39 (4 self)
The quantization of real value attributes is one of the main problems to be solved in the synthesis of decision rules from data tables with real value attributes. We present an approach to this problem based on rough set methods and Boolean reasoning. The main result states that the problem of optimal quantization of real value attributes is polynomially reducible to the problem of minimal reduct finding, so it is NP-hard. We construct efficient heuristics for finding suboptimal quantization of real value attributes. 1 INTRODUCTION A great effort has been made (see e.g. [5], [7], [9], [17], [18]) to find effective methods for real value attribute quantization (discretization). Our approach is based on the rough set methods and Boolean reasoning. We discuss the computational complexity of the quantization problems and we show that they can be solved by Boolean reasoning [1]. We prove that the main quantization problems are either NP-complete or NP-hard. We show that the problem of o...
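A common greedy heuristic in this rough-set line of work is maximal discernibility: repeatedly pick the cut that separates the most remaining object pairs with different decisions. The single-attribute Python sketch below illustrates that idea under our own simplifications; it is not necessarily the exact heuristic of this paper.

```python
import itertools
import numpy as np

def md_cuts(values, labels):
    """Greedy maximal-discernibility sketch for one attribute: keep adding
    the candidate cut separating the most remaining object pairs that carry
    different decisions, until every such pair is discerned."""
    vs = np.sort(np.unique(values))
    candidates = list((vs[:-1] + vs[1:]) / 2)  # midpoints between values
    # pairs the discretization must discern: different label, different value
    pairs = {(i, j) for i, j in itertools.combinations(range(len(values)), 2)
             if labels[i] != labels[j] and values[i] != values[j]}

    def discerned(c):
        # pairs whose two values lie on opposite sides of cut c
        return {(i, j) for (i, j) in pairs
                if (values[i] - c) * (values[j] - c) < 0}

    cuts = []
    while pairs and candidates:
        best = max(candidates, key=lambda c: len(discerned(c)))
        hit = discerned(best)
        if not hit:        # nothing left that any candidate can discern
            break
        cuts.append(float(best))
        pairs -= hit
        candidates.remove(best)
    return sorted(cuts)
```

The connection to the reduction mentioned in the abstract: each candidate cut plays the role of an attribute in a discernibility matrix, and selecting a minimal sufficient set of cuts is exactly minimal reduct finding, hence the NP-hardness and the need for such greedy heuristics.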
Discretization Methods with Backtracking
 Proceedings of 5th European Congress on Intelligent Techniques and Soft Computing
, 1997
Abstract

Cited by 2 (0 self)
Discretization is indispensable in the preprocessing stage of data analysis. Any discretization process is defined by a set of cuts [8, 16, 3, 1, 6, 4, 18, 11] over domains of attributes. Almost all existing methods do not discern between equivalent cuts, i.e., cuts discerning the same pairs of objects with different decisions. Usually an "intermediate" cut is chosen as a representative of the whole family of cuts. However, it is not true that in the family of indiscernible cuts the "median" cuts are the best ones. This fact can be justified using probability theory. The main goal of this paper is to propose some methods for reconstructing the obtained cut set after completing the discretization process. 1 INTRODUCTION Decision tables with real value attributes are usually discretized in the preprocessing step before synthesis strategies for decision rules are initiated. We list some well known groups of discretization methods. Among them are: equal width and equal frequency intervals, one r...
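The notion of equivalent cuts is easy to demonstrate: two candidate cuts discern exactly the same pairs of differently-labeled objects whenever both fall in the same gap between consecutive attribute values. A hedged Python sketch (the function names are our own) that groups a grid of candidate cuts into such families:

```python
import itertools
import numpy as np

def discerned_pairs(values, labels, c):
    """Frozenset of differently-labeled object pairs separated by cut c."""
    return frozenset(
        (i, j) for i, j in itertools.combinations(range(len(values)), 2)
        if labels[i] != labels[j] and (values[i] - c) * (values[j] - c) < 0)

def equivalent_cut_families(values, labels, candidates):
    """Group candidate cuts into families that discern exactly the same pairs.
    Every cut in one family is interchangeable with respect to discernibility,
    which is why methods usually keep a single representative per family."""
    families = {}
    for c in candidates:
        families.setdefault(discerned_pairs(values, labels, c), []).append(c)
    return list(families.values())
```

The paper's point is precisely about which representative to keep: discernibility alone cannot distinguish members of a family, so the customary "median" choice is a convention, not an optimum.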
Discretization Based on Entropy and Multiple Scanning
, 2013
Discretization: An Enabling Technique
 Data Mining and Knowledge Discovery, 6, 393–423, 2002. Kluwer Academic Publishers
, 1999
Abstract
Abstract. Discrete values have important roles in data mining and knowledge discovery. They are intervals of numbers which are more concise to represent and specify, easier to use and comprehend, as they are closer to a knowledge-level representation than continuous values. Many studies show induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable, and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All these prompt researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time for us to examine these seemingly different methods for discretization and find out how different they really are, what the key components of a discretization process are, and how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. Contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines as to how to choose a discretization method under various circumstances. We also identify some issues yet to be solved and future research directions for discretization.
From Optimal Hyperplanes to Optimal Decision Trees: Rough Set and Boolean Reasoning Approaches
Rough Fuzzy Approach in Tourism Demand Analysis
 Department of Mathematics, UTM
Abstract
Abstract. The substantial growth of tourism activities in Malaysia clearly marks tourism as one of the most remarkable economic and social phenomena of the past few years. This paper introduces the rough-fuzzy approach in tourism forecasting. Rough-fuzzy is an extension of rough sets; it can also be applied when the values of the decision and condition attributes are uncertain. Within the hybridization process, we can see the strengthening of knowledge through the membership function. The study shows that the membership value of tourist arrivals from Saudi Arabia, Australia and the US is 1, while the values for tourist arrivals from Taiwan and the UK are 0.6755 and 0.2053 respectively. The degree of tourist arrivals from China and Thailand is also equal to 1, and from Japan it is 0.4167.