An introduction to variable and feature selection
Journal of Machine Learning Research, 2003
Cited by 431 (8 self)
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
Gene Selection Using Support Vector Machines With Nonconvex Penalty
Bioinformatics, 2006
Cited by 15 (1 self)
Motivation: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their innate nature of “high dimensional low sample size.” Therefore, robust and accurate gene selection methods are required to identify differentially expressed groups of genes across different samples, e.g., between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers, and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide
Gene extraction for cancer diagnosis by support vector machines: an improvement
Artificial Intelligence in Medicine, 2005
Cited by 10 (0 self)
Cancer diagnosis using DNA microarray data faces many challenges, the most serious being the presence of thousands of genes and only several dozen (at best) patient samples. Thus, any kind of classification in high-dimensional spaces from a limited amount of data is both extremely difficult and error-prone. The improved Recursive Feature Elimination with Support Vector Machines (RFE-SVMs) is introduced and used here to eliminate less relevant genes and to reduce the overall number of genes used in a medical diagnosis. The paper shows why and how the usually neglected penalty parameter C influences the classification results and the gene selection of RFE-SVMs. With an appropriately chosen parameter C, the reduction in diagnosis error is as high as 37% on the colon cancer data set. The results suggest that with a properly chosen parameter C, the extracted genes and the constructed classifier will ensure less over-fitting of the training data, leading to increased accuracy in selecting relevant genes.
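To make the recursive feature elimination idea in this abstract concrete, here is a minimal sketch of generic RFE with a linear SVM. It is not the paper's improved RFE-SVM: the function names `train_linear_svm` and `rfe_ranking`, the subgradient-descent trainer, the learning rate, the epoch count, and the toy data are all assumptions made for illustration. The penalty parameter C is exposed so its effect on the ranking can be explored.

```python
import random


def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01, seed=0):
    """Fit w, b by stochastic subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            violated = y[i] * score < 1  # hinge loss is active for this sample
            for j in range(d):
                grad = w[j] / n  # regularizer, spread over the n samples
                if violated:
                    grad -= C * y[i] * X[i][j]
                w[j] -= lr * grad
            if violated:
                b += lr * C * y[i]
    return w, b


def rfe_ranking(X, y, C=1.0):
    """Return feature indices ordered from first eliminated to last surviving."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y, C=C)
        # Drop the feature with the smallest squared weight.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    eliminated.append(remaining[0])
    return eliminated


# Hypothetical toy data: feature 0 carries the label, features 1 and 2 are noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(20)]
X = [[2.0 * label, rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)] for label in y]
ranking = rfe_ranking(X, y, C=1.0)
print(ranking)
```

On this toy data the informative feature should be the last one left. Rerunning `rfe_ranking` with different values of `C` is one way to observe how the penalty parameter can change which features are eliminated first.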
Direct convex relaxations of sparse SVM
In ICML ’07: Proceedings of the 24th International Conference on Machine Learning
Cited by 9 (0 self)
Although support vector machines (SVMs) for binary classification give rise to a decision rule that only relies on a subset of the training data points (support vectors), it will in general be based on all available features in the input space. We propose two direct, novel convex relaxations of a nonconvex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold on the SVM hyperplane, while the SDP formulation learns a weighted inner-product (i.e. a kernel) that results in a sparse hyperplane. Experimental results show an increase in sparsity while conserving the generalization performance compared to a standard as well as a linear programming SVM.
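The soft-threshold interpretation mentioned in this abstract can be illustrated with the plain soft-thresholding operator. The sketch below uses a single fixed threshold `t`, whereas the abstract describes an adaptive threshold obtained from the QCQP; the function name `soft_threshold` and the example weights are assumptions for illustration only.

```python
def soft_threshold(w, t):
    """Shrink each weight toward zero by t; weights with |w_j| <= t become 0."""
    result = []
    for wj in w:
        if abs(wj) <= t:
            result.append(0.0)  # small weights are removed entirely
        elif wj > 0:
            result.append(wj - t)
        else:
            result.append(wj + t)
    return result


# Hypothetical hyperplane weights; values chosen to be exact in binary floating point.
weights = [1.0, -0.125, 0.5, -0.75, 0.0625]
sparse_weights = soft_threshold(weights, 0.25)
print(sparse_weights)  # [0.75, 0.0, 0.25, -0.5, 0.0]
```

Features whose weight is set to zero no longer contribute to the decision rule, which is the sense in which thresholding the hyperplane yields feature sparsity.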
BCI Competition III: dataset II - ensemble of SVMs for BCI P300 speller
IEEE Trans Biomed Eng, 2008
Cited by 8 (2 self)
The Brain-Computer Interface P300 speller aims to help patients who are unable to activate muscles to spell words by means of their brain signal activity. Associated with this BCI paradigm is the problem of classifying electroencephalogram signals related to responses to visual stimuli. This paper addresses the problem of signal response variability within a single subject in such a Brain-Computer Interface. We propose a method that copes with this variability through an ensemble of classifiers. Each classifier is a linear Support Vector Machine trained on a small part of the available data, for which a channel selection procedure has been performed. The performance of our algorithm has been evaluated on dataset II of the BCI Competition III, where it yielded the best performance of the competition.
Trait Selection for Assessing Beef Meat Quality Using Non-Linear SVM
2004
Cited by 7 (4 self)
In this paper we show that it is possible to model sensory impressions of consumers about beef meat. This is not a straightforward task; the reason is that when we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers' ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we had to use a special purpose SVM polynomial kernel. The training data set used collects the ratings of panels of experts and consumers; the meat was provided by 103 bovines of 7 Spanish breeds with different carcass weights and aging periods.
Embedded Methods
Cited by 7 (1 self)
Although many embedded feature selection methods have been introduced during the last few years, a unifying theoretical framework has not been developed to date. We start this chapter by defining such a framework which we think is general enough to cover many embedded methods. We will then discuss embedded methods based on how they solve the feature selection problem.
Feature selection for Descriptor based Classification Models: Part II - Human Intestinal Absorption (HIA)
J. Chem. Inf. Comput. Sci, 2003
Cited by 6 (2 self)
The paper describes different aspects of classification models based on molecular data sets with the focus on feature selection methods. Especially model quality and avoiding a high variance on unseen data (overfitting) will be discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the extension for classification problems using boosting.
Robust support vector method for hyperspectral data classification and knowledge discovery
IEEE Transactions on Geoscience and Remote Sensing, 2004
Cited by 6 (4 self)
In this paper, we propose the use of Support Vector Machines (SVM) for automatic hyperspectral data classification and knowledge discovery. In the first stage of the study, we use SVMs for crop classification and analyze their performance in terms of efficiency and robustness, as compared to extensively used neural and fuzzy methods. Efficiency is assessed by evaluating accuracy and statistical differences in several scenes. Robustness is analyzed in terms of (a) suitability to working conditions when a feature selection stage is not possible, and (b) performance when different levels of Gaussian noise are introduced at their inputs. In the second stage of this work, we analyze the distribution of the support vectors (SV) and perform sensitivity analysis on the best classifier in order to analyze the significance of the input spectral bands. For classification purposes, six hyperspectral images acquired with the 128-band HyMAP spectrometer during the DAISEX-1999 campaign are used. Six crop classes were labelled for each image. A reduced set of labelled samples is used to train the models and the entire images are used to assess their performance. Several conclusions are drawn: (1) SVMs yield better outcomes than neural networks regarding accuracy, simplicity and robustness; (2) training neural and neurofuzzy models is unfeasible when working with high dimensional input spaces and great amounts of training data; (3) SVMs perform similarly for different training subsets with varying input dimension, which indicates that noisy bands are successfully detected; and (4) a valuable ranking of bands through sensitivity analysis is achieved. Index Terms: Hyperspectral imagery, crop classification, knowledge discovery, Support Vector Machines, neural networks.

