Results 1–10 of 86
Feature Subset Selection Using A Genetic Algorithm
, 1997
"... : Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is due to the fact that the performance of the classifier (usually induced by some learning algorithm) ..."
Abstract

Cited by 188 (7 self)
 Add to MetaCart
: Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is due to the fact that the performance of the classifier (usually induced by some learning algorithm) and the cost of classification are sensitive to the choice of the features used to construct the classifier. Exhaustive evaluation of possible feature subsets is usually infeasible in practice because of the large amount of computational effort required. Genetic algorithms, which belong to a class of randomized heuristic search techniques, offer an attractive approach to find nearoptimal solutions to such optimization problems. This paper presents an approach to feature subset selection using a genetic algorithm. Some advantages of this approach include the ability to accommodate multiple criteria such as accuracy and cost of classification into the feature selection process and to find fe...
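The GA encoding the abstract describes is typically a bit mask over features, evolved under a fitness that trades accuracy against classification cost. The following is a minimal illustrative sketch of that idea, not the paper's actual procedure; the toy fitness function and all parameter values are assumptions for demonstration.

```python
import random

def genetic_feature_selection(fitness, n_features, pop_size=20,
                              generations=40, p_mut=0.05, seed=0):
    """Search for a high-fitness feature subset encoded as a bit mask.

    `fitness` maps a tuple of 0/1 flags (one per feature) to a score to
    maximize; the GA uses elitism, one-point crossover, and bit-flip
    mutation.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]            # keep the best half
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)            # parents from the elite
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(1 - g if rng.random() < p_mut else g
                          for g in child)          # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy multi-criterion fitness (hypothetical): reward "relevant" features
# 0-2, penalize subset size as a stand-in for classification cost.
relevant = {0, 1, 2}
def fitness(mask):
    hits = sum(mask[i] for i in relevant)
    cost = 0.1 * sum(mask)
    return hits - cost

best = genetic_feature_selection(fitness, n_features=10)
```

Because the fitness is a plain callable, the same skeleton accommodates multiple criteria (accuracy, cost, subset size) by changing only that function, which is the flexibility the abstract highlights.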
Toward integrating feature selection algorithms for classification and clustering
 IEEE Transactions on Knowledge and Data Engineering
, 2005
"... This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals ..."
Abstract

Cited by 136 (16 self)
 Add to MetaCart
This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals unattempted combinations, and provides guidelines in selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection. A unifying platform is proposed as an intermediate step. An illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta algorithm that can take advantage of individual algorithms. An added advantage of doing so is to help a user employ a suitable algorithm without knowing details of each algorithm. Some realworld applications are included to demonstrate the use of feature selection in data mining. We conclude this work by identifying trends and challenges of feature selection research and development.
The variable selection problem
 Journal of the American Statistical Association
, 2000
"... The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables ..."
Abstract

Cited by 39 (2 self)
 Add to MetaCart
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem. 1
Evolutionary Model Selection in Unsupervised Learning
, 2002
"... Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situati ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situations, with some estimate of accuracy used to evaluate candidate subsets. However, we often cannot apply supervised learning for lack of a training signal. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, Kmeans and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method can consistently find approximate Paretooptimal solutions through which we can identify the significant features and an appropriate number of clusters. This results in models with better and clearer semantic relevance. 1.
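ELSA itself is beyond a short snippet, but the Pareto-front bookkeeping it relies on reduces to a non-dominated filter over objective vectors. A minimal sketch of that filter (illustrative only, not the paper's implementation; the example objective vectors are made up):

```python
def pareto_front(solutions):
    """Return the non-dominated subset of objective-vector tuples.

    Solution a dominates b if a is >= b in every objective and strictly
    > in at least one (all objectives are maximized here).
    """
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Each tuple = (cluster quality, -number of features), both maximized,
# so smaller feature subsets score higher on the second objective.
cands = [(0.9, -5), (0.7, -3), (0.8, -3), (0.6, -8)]
front = pareto_front(cands)
```

Negating the feature count turns "fewer features is better" into a maximization objective, so a single dominance rule covers both criteria.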
Distribution of Mutual Information from Complete And Incomplete Data
 Computational Statistics and Data Analysis
, 2004
"... Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sampletopopulation inferential approaches. This paper deals with the post ..."
Abstract

Cited by 14 (2 self)
 Add to MetaCart
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sampletopopulation inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a secondorder Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n 3 ), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection,isshowntoperform significantly better when inductive mutual information is used.
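The descriptive (plug-in) mutual information that the paper takes as its baseline can be computed directly from a contingency table of joint counts; the Bayesian posterior moments the paper derives are considerably more involved and are not reproduced here. A minimal sketch of the descriptive quantity:

```python
from math import log

def mutual_information(counts):
    """Plug-in (descriptive) mutual information, in nats, from a
    contingency table of joint counts counts[i][j]."""
    n = sum(sum(row) for row in counts)
    row = [sum(r) for r in counts]                 # marginal row counts
    col = [sum(c) for c in zip(*counts)]           # marginal column counts
    mi = 0.0
    for i, r in enumerate(counts):
        for j, nij in enumerate(r):
            if nij:                                # skip empty cells
                mi += (nij / n) * log(n * nij / (row[i] * col[j]))
    return mi

# A diagonal table is perfectly dependent (MI = log 2 for two symbols);
# a uniform table is independent (MI = 0).
mi_dep = mutual_information([[50, 0], [0, 50]])
mi_ind = mutual_information([[25, 25], [25, 25]])
```

This estimator is what the abstract calls "descriptive" mutual information; the paper's point is that its sampling variability can be characterized at essentially the same computational cost.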
New techniques for extracting features from protein sequences
 IBM Systems Journal
, 2001
"... In this paper we propose new techniques to extract features from protein sequences. We then use the features as inputs for a Bayesian neural network (BNN) and apply the BNN to classifying protein sequences obtained from the PIR protein database maintained at the National Biomedical Research Foundati ..."
Abstract

Cited by 14 (2 self)
 Add to MetaCart
In this paper we propose new techniques to extract features from protein sequences. We then use the features as inputs for a Bayesian neural network (BNN) and apply the BNN to classifying protein sequences obtained from the PIR protein database maintained at the National Biomedical Research Foundation. To evaluate the performance of the proposed approach, we compare it with other protein classiers built based on sequence alignment and machine learning methods. Experimental results show the high precision of the proposed classi er and the complementarity of the bioinformatics tools studied in the paper.
A New Dependency and Correlation Analysis for Features
 IEEE Transactions on Knowledge and Data Engineering
, 2005
"... Abstract—The quality of the data being analyzed is a critical factor that affects the accuracy of data mining algorithms. There are two important aspects of the data quality, one is relevance and the other is data redundancy. The inclusion of irrelevant and redundant features in the data mining mode ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
Abstract—The quality of the data being analyzed is a critical factor that affects the accuracy of data mining algorithms. There are two important aspects of the data quality, one is relevance and the other is data redundancy. The inclusion of irrelevant and redundant features in the data mining model results in poor predictions and high computational overhead. This paper presents an efficient method concerning both the relevance of the features and the pairwise features correlation in order to improve the prediction and accuracy of our data mining algorithm. We introduce a new feature correlation metric QY ðXi;XjÞ and feature subset merit measure eðSÞ to quantify the relevance and the correlation among features with respect to a desired data mining task (e.g., detection of an abnormal behavior in a network service due to network attacks). Our approach takes into consideration not only the dependency among the features, but also their dependency with respect to a given data mining task. Our analysis shows that the correlation relationship among features depends on the decision task and, thus, they display different behaviors as we change the decision task. We applied our data mining approach to network security and validated it using the DARPA KDD99 benchmark data set. Our results show that, using the new decision dependent correlation metric, we can efficiently detect rare network attacks such as User to Root (U2R) and Remote to Local (R2L) attacks. The best reported detection rates for U2R and R2L on the KDD99 data sets were 13.2 percent and 8.4 percent with 0.5 percent false alarm, respectively. For U2R attacks, our approach can achieve a 92.5 percent detection rate with a false alarm of 0.7587 percent. For R2L attacks, our approach can achieve a 92.47 percent detection rate with a false alarm of 8.35 percent. Index Terms—Feature extraction, correlation measure. æ 1
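The paper's decision-dependent metric Q_Y(X_i, X_j) is not defined in this abstract, so it is not reproduced here. The general pattern it refines, selecting features with high task relevance and low pairwise redundancy, can be sketched with a greedy mRMR-style stand-in; the scores and correlation matrix below are invented for illustration.

```python
def greedy_select(relevance, corr, k):
    """Greedily pick k features: high relevance to the task, low
    correlation with features already chosen. A generic stand-in for
    decision-dependent correlation-based selection, not the paper's
    Q_Y metric."""
    chosen = []
    remaining = list(range(len(relevance)))
    while remaining and len(chosen) < k:
        def merit(f):
            # Penalize a candidate by its worst correlation with the
            # features selected so far (0.0 when nothing is chosen yet).
            redundancy = max((corr[f][g] for g in chosen), default=0.0)
            return relevance[f] - redundancy
        best = max(remaining, key=merit)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Features 0 and 1 are both relevant but nearly duplicate each other;
# feature 2 is weaker but independent, so it wins the second slot.
relevance = [0.9, 0.85, 0.3]
corr = [[1.0, 0.95, 0.1],
        [0.95, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
picked = greedy_select(relevance, corr, k=2)
```

The key point the abstract makes, and that the sketch deliberately parameterizes, is that both `relevance` and `corr` should be computed with respect to the decision task at hand rather than once for all tasks.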
Feature selection filters based on the permutation test
 Pedreschi (Eds.), Machine Learning: ECML 2004, 15th European Conference on Machine Learning
, 2004
"... Abstract. We investigate the problem of supervised feature selection within the filtering framework. In our approach, applicable to the twoclass problems, the feature strength is inversely proportional to the pvalue of the null hypothesis that its classconditional densities, p(X  Y = 0) and p(X ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
Abstract. We investigate the problem of supervised feature selection within the filtering framework. In our approach, applicable to the twoclass problems, the feature strength is inversely proportional to the pvalue of the null hypothesis that its classconditional densities, p(X  Y = 0) and p(X  Y = 1), are identical. To estimate the pvalues, we use Fisher’s permutation test combined with the four simple filtering criteria in the roles of test statistics: sample mean difference, symmetric KullbackLeibler distance, information gain, and chisquare statistic. The experimental results of our study, performed using naive Bayes classifier and support vector machines, strongly indicate that the permutation test improves the abovementioned filters and can be used effectively when sample size is relatively small and number of features relatively large. 1
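A permutation-test filter of the kind described can be sketched with the simplest of the four statistics, the sample mean difference: shuffle the class labels many times and count how often the shuffled statistic matches or exceeds the observed one. This is an illustrative Monte Carlo sketch (the paper does not specify these parameter values, and the data below are made up):

```python
import random

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test of |mean(x) - mean(y)|: the fraction
    of label shuffles with a statistic at least as extreme as the
    observed one (with an add-one correction so p is never zero)."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random relabeling
        px, py = pooled[:len(x)], pooled[len(x):]
        stat = abs(sum(px) / len(px) - sum(py) / len(py))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# A feature whose class-conditional means differ sharply gets a small
# p-value, i.e. high feature strength under the paper's scheme.
strong = permutation_pvalue([5.1, 4.8, 5.3, 5.0, 4.9, 5.2],
                            [1.0, 1.3, 0.8, 1.1, 0.9, 1.2])
```

Swapping in the other three statistics (symmetric Kullback-Leibler distance, information gain, chi-square) only changes the `stat` computation; the permutation machinery stays the same.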
Information-Theoretic Feature Selection in Microarray Data using Variable Complementarity
, 2009
"... The paper presents an original filter approach for effective feature selection in microarray data characterized by a large number of input variables and a few samples. The approach is based on the use of a new informationtheoretic selection, the Double Input Symmetrical Relevance (DISR), which re ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
The paper presents an original filter approach for effective feature selection in microarray data characterized by a large number of input variables and a few samples. The approach is based on the use of a new informationtheoretic selection, the Double Input Symmetrical Relevance (DISR), which relies on a measure of variable complementarity. This measure evaluates the additional information that a set of variables provides about the output with respect to the sum of each single variable contribution. We show that a variable selection approach based on DISR can be formulated as a quadratic optimization problem the Dispersion Sum Problem. To solve this problem, we use a strategy based on Backward Elimination and Sequential Replacement (BESR). The combination of BESR and the DISR criterion is compared in theoretical and experimental terms to recently proposed informationtheoretic criteria. Experimental results on a synthetic dataset as well as on a set of 11 microarray classification tasks show that the proposed technique is competitive with existing filter selection methods.