Feature Selection in Unsupervised Learning via Evolutionary Search
- In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2000
Abstract
-
Cited by 48 (3 self)
Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.
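The abstract above keeps several quality criteria separate and looks for solutions on the Pareto front instead of merging the criteria into one score. The following is a minimal sketch of that non-dominance idea only; it is not ELSA itself. The function names, the choice to maximize every objective, and the sample scores are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors, one per candidate solution:
# (cluster quality score, simplicity score). Values are made up.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.6, 0.5), (0.3, 0.9), (0.3, 0.8)]
print(pareto_front(candidates))
# prints [(0.9, 0.2), (0.7, 0.6), (0.3, 0.9)]
```

In this sample, (0.6, 0.5) is dropped because (0.7, 0.6) beats it on both objectives, and (0.3, 0.8) is dropped because (0.3, 0.9) ties on the first objective and wins on the second.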
Evolutionary Model Selection in Unsupervised Learning
, 2002
Abstract
-
Cited by 10 (0 self)
Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situations, with some estimate of accuracy used to evaluate candidate subsets. However, we often cannot apply supervised learning for lack of a training signal. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method can consistently find approximate Pareto-optimal solutions through which we can identify the significant features and an appropriate number of clusters. This results in models with better and clearer semantic relevance.
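As a rough illustration of how one candidate solution (a feature subset plus a number of clusters) might be scored, the sketch below projects the data onto the selected features, runs a small K-means, and returns several criteria side by side. It is a sketch under stated assumptions, not the authors' implementation: the abstract does not list the actual criteria, so the normalized within-cluster scatter, the feature count, and all function names here are hypothetical. EM is not shown.

```python
import random
from typing import List, Sequence, Tuple

Point = List[float]


def project(data: Sequence[Sequence[float]], mask: Sequence[int]) -> List[Point]:
    """Keep only the columns whose mask entry is truthy."""
    return [[x for x, keep in zip(row, mask) if keep] for row in data]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points: List[Point], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd-style K-means with a seeded random initialization."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)


def evaluate(data: Sequence[Sequence[float]], mask: Sequence[int],
             k: int) -> Tuple[float, int, int]:
    """Score one candidate. Returns (scatter, n_features, k), where scatter
    is within-cluster squared distance per point per selected feature.
    These criteria are illustrative assumptions, not the paper's."""
    n_features = sum(1 for keep in mask if keep)
    if n_features == 0:
        raise ValueError("mask must select at least one feature")
    points = project(data, mask)
    centers, labels = kmeans(points, k)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return within / (len(points) * n_features), n_features, k


# Toy data: column 0 separates two groups, column 1 is noise.
data = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
        [10.0, 5.2], [10.1, 0.8], [10.2, 9.1]]
print(evaluate(data, [1, 0], 2))  # informative feature only
print(evaluate(data, [1, 1], 2))  # informative feature plus noise
```

On this toy data the subset containing only the informative column gives a much lower scatter than the subset that also includes the noise column. The tuples returned by `evaluate` are the kind of multi-criteria scores a Pareto-based search could compare without collapsing them into a single number.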
Customer targeting: A neural network approach guided by genetic algorithms, Management Science 51 (2
, 2005
Abstract
-
Cited by 3 (0 self)
doi 10.1287/mnsc.1040.0296, © 2005 INFORMS
One of the key problems in database marketing is the identification and profiling of households that are most likely to be interested in a particular product or service. Principal component analysis (PCA) of customer background information followed by logistic regression analysis of response behavior is commonly used by database marketers. In this paper, we propose a new approach that uses artificial neural networks (ANNs) guided by genetic algorithms (GAs) to target households. We show that the resulting selection rule is more accurate and more parsimonious than the PCA/logit rule when the manager has a clear decision criterion. Under vague decision criteria, the new procedure loses its advantage in interpretability, but is still more accurate than PCA/logit in targeting households.
Key words: database marketing; neural networks; genetic algorithms; customer relationship management

