Genetic programming for attribute construction in data mining
Genetic Programming, Proceedings of EuroGP’2003, volume 2610 of LNCS, 2003
Cited by 14 (0 self)
For a given data set, its set of attributes defines its data space representation. The quality of a data space representation is one of the most important factors influencing the performance of a data mining algorithm. The attributes defining the data space can be inadequate, making it difficult to discover high-quality knowledge. In order to solve this problem, this paper proposes a Genetic Programming algorithm developed for attribute construction. This algorithm constructs new attributes out of the original attributes of the data set, performing an important preprocessing step for the subsequent application of a data mining algorithm.
Constructing X-of-N attributes with a genetic algorithm
In Proc. of the Genetic and Evolutionary Computation Conference, 2002
Cited by 2 (0 self)
The predictive accuracy obtained by a classification algorithm is strongly dependent on the quality of the attributes of the data being mined. When the attributes have little relevance for predicting the class of a record, the predictive accuracy will tend to be low. To combat this problem, a natural approach consists of constructing new attributes out of the original attributes. Many attribute construction algorithms work by simply constructing conjunctions and/or disjunctions of attribute-value pairs. This kind of representation has limited expressive power to represent attribute interactions. A more expressive representation is X-of-N [Zheng 1995]. An X-of-N condition consists of a set of N attribute-value pairs. The value of an X-of-N condition for a given example
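The abstract above is cut off before it states how an X-of-N condition is evaluated. As a minimal sketch only, the following assumes the usual reading of the X-of-N representation: the value for an example is the number of the N attribute-value pairs that the example satisfies. The function name `x_of_n` and the attribute names are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def x_of_n(pairs: List[Tuple[str, str]], example: Dict[str, str]) -> int:
    """Count how many of the N attribute-value pairs hold for the example.

    Assumption: the X-of-N value is this count, an integer between 0 and N.
    """
    return sum(1 for attribute, value in pairs if example.get(attribute) == value)


# Hypothetical condition with N = 3 attribute-value pairs.
condition = [("color", "red"), ("size", "large"), ("shape", "round")]
record = {"color": "red", "size": "small", "shape": "round"}
matched = x_of_n(condition, record)  # two of the three pairs hold for this record
```

Under this reading, a conjunction of the N pairs corresponds to the value N and a disjunction to any value of at least 1, which is why the count can express interactions that plain conjunctions and disjunctions cannot.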
A Combinatorial Fusion Method for Feature Mining
Proceedings of KDD'07 Workshop on Mining Multiple Information Sources, 2007
Cited by 1 (1 self)
This paper demonstrates how methods borrowed from information fusion can improve the performance of a classifier by constructing (“fusing”) new features that are combinations of existing numeric features. This work is an example of local pattern analysis and fusion because it identifies potentially useful patterns (i.e., feature combinations) from a single data source. In our work, we fuse features by mapping the numeric values for each feature to a rank and then averaging these ranks. The quality of the fused features is measured with respect to how well they classify minority-class examples, which makes this method especially effective for dealing with data sets that exhibit class imbalance. This paper evaluates our combinatorial feature fusion method on ten data sets, using three learning methods. The results indicate that our method can be quite effective in improving classifier performance, although it seems to improve the performance of some learning methods more than others.
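The rank-and-average step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: ranks are 1-based in ascending order of value, tied values share their average rank, and the names `to_ranks` and `fuse_features` are hypothetical. The abstract does not specify any of these details.

```python
from typing import List, Sequence


def to_ranks(values: Sequence[float]) -> List[float]:
    """Map each value to its 1-based ascending rank; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def fuse_features(columns: Sequence[Sequence[float]]) -> List[float]:
    """Build one fused feature by averaging the per-example ranks of the columns."""
    ranked = [to_ranks(column) for column in columns]
    n_examples = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_examples)]


# Two hypothetical numeric features measured on three examples.
fused = fuse_features([[10.0, 30.0, 20.0], [1.0, 2.0, 3.0]])
```

Working on ranks rather than raw values puts features with different scales on a common footing before they are averaged, which is presumably part of the appeal of the approach.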
Survey of Classification Techniques in Data Mining
Classification is a data mining (machine learning) technique used to predict group membership for data instances. In this paper, we present the basic classification techniques. Several major kinds of classification methods are covered, including decision tree induction, Bayesian networks, k-nearest neighbor classifiers, case-based reasoning, genetic algorithms and fuzzy logic techniques. The goal of this survey is to provide a comprehensive review of different classification techniques in data mining.
The Application of Genetic Programming for Feature Construction in Classification
2005
This Thesis addresses the task of feature construction for classification. The quality of the data is one of the most important factors influencing the performance of any classification algorithm. The attributes defining the feature space of a given data set can often be inadequate, making it difficult to discover interesting knowledge. However, even when the original attributes are individually inadequate, it is often possible to combine such attributes in order to construct new ones with greater predictive power. The goal of this Thesis is to restructure the feature space in order to improve the performance of decision tree classification techniques on complex, real-world data. The proposed framework involves the use of genetic programming to evolve (construct) new attributes, which are non-linear combinations of the original attributes. This approach incorporates a number of decision tree splitting mechanisms in the fitness measures of the genetic program. The empirical
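To make the idea of a splitting criterion used as a fitness measure concrete, here is a minimal sketch that scores a candidate constructed attribute by the best information gain of any binary threshold split. Information gain is only one possible splitting mechanism and is chosen here as an assumption; the abstract does not name the criteria used. The functions `entropy`, `best_split_gain` and `fitness`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[str]) -> float:
    """Largest information gain over all binary threshold splits of one attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        best = max(best, base - remainder)
    return best


def fitness(candidate: Callable[[Tuple[float, float]], float],
            rows: Sequence[Tuple[float, float]],
            labels: Sequence[str]) -> float:
    """Score a constructed attribute by its best single-split information gain."""
    return best_split_gain([candidate(row) for row in rows], labels)


# Toy XOR-like data: neither original attribute separates the classes alone.
rows: List[Tuple[float, float]] = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
labels = ["pos", "neg", "neg", "pos"]
gain_original = fitness(lambda r: r[0], rows, labels)
gain_constructed = fitness(lambda r: r[0] * r[1], rows, labels)
```

On this toy data the non-linear product of the two attributes separates the classes with a single split, while the first original attribute on its own gives no gain, which is the kind of improvement a genetic program would be rewarded for finding.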
A Combinatorial Fusion Method for Feature Construction
This paper demonstrates how methods borrowed from information fusion can improve the performance of a classifier by constructing (i.e., fusing) new features that are combinations of existing numeric features. The new features are constructed by mapping the numeric values for each feature to a rank and then averaging these ranks. The quality of the fused features is measured with respect to how well they classify minority-class examples, which makes this method especially effective for dealing with data sets that exhibit class imbalance. This paper evaluates our combinatorial feature fusion method on ten data sets, using three learning methods. The results indicate that our method can be quite effective in improving classifier performance.
Optimizing Feature Construction Process for Dynamic Aggregation of Relational Attributes
Problem statement: The importance of input representation has already been recognized in machine learning. Feature construction is one of the methods used to generate relevant features for learning data. This study addressed the question of whether the descriptive accuracy of the DARA algorithm benefits from the feature construction process. In other words, this paper discusses the application of a genetic algorithm to optimize the feature construction process that generates input data for the data summarization method called Dynamic Aggregation of Relational Attributes (DARA). Approach: The DARA algorithm was designed to summarize data stored in the non-target tables by clustering them into groups, where multiple records stored in non-target tables correspond to a single record stored in a target table. Here, feature construction methods are applied in order to improve the descriptive accuracy of the DARA algorithm. Since the study addressed the question of whether the descriptive accuracy of the DARA algorithm benefits from the feature construction process, the task involved constructing a relevant set of features for the DARA algorithm by using a genetic-based algorithm. Results: The experimental results show that the quality of summarized data is directly influenced by the methods used to create patterns that represent records in the (n×p) TF-IDF weighted frequency matrix. The evaluation of the genetic-based feature construction algorithm showed that the data summarization results can be improved by constructing features with the Cluster Entropy (CE) genetic-based feature construction algorithm. Conclusion: This study showed that the data summarization results can be improved by constructing features with the cluster entropy genetic-based feature construction algorithm.
Data Preprocessing for Supervised Learning
2006
Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present, or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set, but this is not the case. Thus, we present the best-known algorithms for each step of data pre-processing so that one can achieve the best performance for a given data set.

