An introduction to variable and feature selection
Journal of Machine Learning Research, 2003
Cited by 431 (8 self)
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
Result analysis of the NIPS 2003 feature selection challenge
Advances in Neural Information Processing Systems 17, 2004
Cited by 24 (7 self)
The NIPS 2003 workshops included a feature selection competition organized by the authors. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. The competition took place over a period of 13 weeks and attracted 78 research groups. Participants were asked to make on-line submissions on the validation and test sets, with performance on the validation set being presented immediately to the participant and performance on the test set presented to the participants at the workshop. In total, 1863 entries were made on the validation sets during the development period and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of methods for feature selection, which combined filters and/or wrapper or embedded methods using Random Forests, kernel methods, or neural networks as a classification engine. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark is available at www.nipsfsc.ecs.soton.ac.uk for post-challenge submissions to stimulate further research.
Feature Selection for Unsupervised and Supervised Inference: the Emergence of Sparsity in a Weighted-Based Approach
Cited by 20 (2 self)
The problem of selecting a subset of relevant features in a potentially overwhelming quantity of data is classic and found in many branches of science; examples in computer vision, text processing and, more recently, bioinformatics are abundant. In this work we present a definition of "relevancy" based on spectral properties of the Affinity (or Laplacian) of the features' measurement matrix. The feature selection process is then based on a continuous ranking of the features defined by a least-squares optimization process. A remarkable property of the feature relevance function is that sparse solutions for the ranking values naturally emerge as a result of a "biased non-negativity" of a key matrix in the process. As a result, a simple least-squares optimization process converges onto a sparse solution, i.e., a selection of a subset of features which forms a local maximum over the relevance function. The feature selection algorithm can be embedded in both unsupervised and supervised inference problems, and empirical evidence shows that the feature selections typically achieve high accuracy even when only a small fraction of the features are relevant.
Compression-based averaging of selective naive Bayes classifiers
Journal of Machine Learning Research, 2007
Cited by 13 (3 self)
The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compression-based averaged classifier outperforms the Bayesian model averaging scheme.
Feature Selection for Ranking
Proceedings of the 30th Annual International ACM SIGIR Conference, 2007
Cited by 10 (2 self)
Ranking is a very important topic in information retrieval. While algorithms for learning ranking models have been intensively studied, this is not the case for feature selection, despite its importance. In practice, many feature selection methods used in classification are directly applied to ranking. We argue that because of the striking differences between ranking and classification, it is better to develop different feature selection methods for ranking. To this end, we propose a new feature selection method in this paper. Specifically, for each feature we use its value to rank the training instances, and define the ranking accuracy in terms of a performance measure or a loss function as the importance of the feature. We also define the correlation between the ranking results of two features as the similarity between them. Based on these definitions, we formulate feature selection as an optimization problem whose goal is to find the features with maximum total importance scores and minimum total similarity scores. We also demonstrate how to solve the optimization problem efficiently. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models. Experimental results show that our method can outperform traditional feature selection methods for the ranking task.
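The trade-off described in this abstract (high total importance, low total similarity) can be illustrated with a small sketch. The abstract does not say how the optimization problem is solved, so the greedy heuristic below is an assumption, as are the function name `greedy_select`, the `penalty` weight, and the toy scores. The importance and similarity values are taken as given rather than computed from ranked training instances.

```python
def greedy_select(importance, similarity, k, penalty=1.0):
    """Pick k features, trading importance against redundancy.

    importance: list of per-feature importance scores (higher is better).
    similarity: square matrix; similarity[i][j] is the similarity of
                features i and j.
    penalty:    hypothetical weight on redundancy (not from the abstract).
    """
    remaining = set(range(len(importance)))
    selected = []
    while remaining and len(selected) < k:
        def score(j):
            # Importance minus the accumulated similarity to features
            # that have already been chosen.
            redundancy = sum(similarity[j][s] for s in selected)
            return importance[j] - penalty * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: features 0 and 1 are both strong but nearly redundant.
importance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(greedy_select(importance, similarity, k=2))               # [0, 2]
print(greedy_select(importance, similarity, k=2, penalty=0.0))  # [0, 1]
```

With the redundancy penalty switched on, the second pick is the weaker but dissimilar feature 2; with `penalty=0.0` the selection falls back to ranking by importance alone.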
Statistical Learning and Kernel Methods in Bioinformatics
Artificial Intelligence and Heuristic Methods in Bioinformatics 183, P. Frasconi and R. Shamir (Eds.), IOS, 2000
Cited by 7 (0 self)
We briefly describe the main ideas of statistical learning theory, support vector machines, and kernel feature spaces. In addition, we present an overview of applications of kernel methods in bioinformatics.
Feature selection for Descriptor based Classification Models
Part II - Human Intestinal Absorption (HIA). J. Chem. Inf. Comput. Sci, 2003
Cited by 6 (2 self)
The paper describes different aspects of classification models based on molecular data sets, with a focus on feature selection methods. In particular, model quality and the avoidance of high variance on unseen data (overfitting) are discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the extension for classification problems using boosting.
Causal feature selection
2001
Cited by 6 (2 self)
This chapter reviews techniques for learning causal relationships from data, in application to the problem of feature selection. Most feature selection methods do not attempt to uncover causal relationships between feature and target and focus instead on making best predictions. We examine situations in which the knowledge of causal relationships benefits feature selection. Such benefits may include: explaining relevance in terms of causal mechanisms, distinguishing between actual features and experimental artifacts, predicting the consequences of actions performed by external agents, and making predictions in non-stationary environments. Conversely, we highlight the benefits that causal discovery may draw from recent developments in feature selection theory and algorithms.
Feature selection filters based on the permutation test
Pedreschi (Eds.), Machine Learning: ECML 2004, 15th European Conference on Machine Learning, 2004
Cited by 6 (0 self)
We investigate the problem of supervised feature selection within the filtering framework. In our approach, applicable to two-class problems, the feature strength is inversely proportional to the p-value of the null hypothesis that its class-conditional densities, p(X | Y = 0) and p(X | Y = 1), are identical. To estimate the p-values, we use Fisher’s permutation test combined with four simple filtering criteria in the role of test statistics: sample mean difference, symmetric Kullback-Leibler distance, information gain, and chi-square statistic. The experimental results of our study, performed using a naive Bayes classifier and support vector machines, strongly indicate that the permutation test improves the above-mentioned filters and can be used effectively when the sample size is relatively small and the number of features relatively large.
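As a rough illustration of the filter described in this abstract, the sketch below estimates a permutation-test p-value for each feature using the sample mean difference as the test statistic, then ranks features by ascending p-value. It is a minimal sketch, not the authors' implementation: the function names, the number of permutations, the add-one correction on the p-value, and the toy data are all assumptions, and only one of the four statistics named in the abstract is shown.

```python
import random


def mean_difference(values, labels):
    """Absolute difference between the class-1 and class-0 sample means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Estimate P(statistic >= observed) under randomly permuted labels."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling keeps the class sizes fixed and breaks any link
        # between the feature values and the labels.
        rng.shuffle(shuffled)
        if mean_difference(values, shuffled) >= observed:
            hits += 1
    # Add-one correction (an assumption) so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


def rank_features(columns, labels, **kwargs):
    """Return feature indices ordered from smallest to largest p-value."""
    p_values = [permutation_p_value(col, labels, **kwargs) for col in columns]
    return sorted(range(len(columns)), key=lambda j: p_values[j])


# Toy two-class data: one uninformative feature, one informative feature.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0.5, 0.9, 0.1, 0.7, 0.6, 0.2, 0.8, 0.4]
informative = [0.1, 0.2, 0.15, 0.05, 0.9, 1.0, 0.95, 1.1]
ranking = rank_features([noise, informative], labels)
```

Because a smaller p-value means a stronger feature, the informative column (index 1) is ranked ahead of the noise column here.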
Explanation-Based Feature Construction
Cited by 4 (0 self)
Choosing good features to represent objects can be crucial to the success of supervised machine learning algorithms. Good high-level features are those that concentrate information about the classification task. Such features can often be constructed as non-linear combinations of raw or native input features such as the pixels of an image. Using many nonlinear combinations, as do SVMs, can dilute the classification information, necessitating many training examples. On the other hand, searching even a modestly-expressive space of nonlinear functions for high-information ones can be intractable. We describe an approach to feature construction where task-relevant discriminative features are automatically constructed, guided by an explanation-based interaction of training examples and prior domain knowledge. We show that in the challenging task of distinguishing handwritten Chinese characters, our automatic feature-construction approach performs particularly well on the most difficult and complex character pairs.

