## Comparison of Redundancy and Relevance Measures for Feature Selection in Tissue Classification of CT images (2010)

### BibTeX

```bibtex
@misc{Auffarth10comparisonof,
  author = {Benjamin Auffarth and Maite López and Jesús Cerquides},
  title  = {Comparison of Redundancy and Relevance Measures for Feature Selection in Tissue Classification of CT images},
  year   = {2010}
}
```

### Abstract

In this paper we report on a study of feature selection within the minimum–redundancy maximum–relevance framework. Features are ranked by their correlations with the target vector. These relevance scores are then integrated with correlations between features in order to obtain a set of relevant and least-redundant features. Applied measures of correlation or distributional similarity for redundancy and relevance include the Kolmogorov–Smirnov (KS) test, Spearman correlations, the Jensen–Shannon divergence, and the sign test. We introduce a metric called the “value difference metric” (VDM) and present a simple measure, which we call the “fit criterion” (FC). We draw conclusions about the usefulness of the different measures. While the KS test and the sign test provided useful information, Spearman correlations are not fit for comparing data of different measurement intervals. VDM performed very well in our experiments as both a redundancy and a relevance measure. Jensen–Shannon and the sign test are good redundancy measure alternatives, and FC is a good relevance measure alternative.

Keywords: feature selection; relevance and redundancy; distributional similarity; divergence measure
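The VDM relevance defined later in the citation contexts (Eq. 5) is the total-variation distance between the two class-conditional feature distributions, VDM(X,Y) = ½ ∫ |p(X=x|c1) − p(X=x|c2)| dx. A minimal sketch of how it could be estimated, assuming histogram density estimates (the function name `vdm_relevance` and the bin count are our own illustrative choices, not from the paper):

```python
import numpy as np

def vdm_relevance(x, y, bins=100):
    """Approximate VDM(X, Y) = 1/2 * integral |p(X=x|c1) - p(X=x|c2)| dx
    for a binary target, using class-conditional histograms on shared bins.
    Scores range from 0 (identical class distributions) to 1 (disjoint)."""
    classes = np.unique(y)
    assert len(classes) == 2, "VDM as defined here assumes two classes"
    # Shared bin edges so both class-conditional histograms are comparable.
    edges = np.histogram_bin_edges(x, bins=bins)
    p1, _ = np.histogram(x[y == classes[0]], bins=edges)
    p2, _ = np.histogram(x[y == classes[1]], bins=edges)
    p1 = p1 / p1.sum()  # normalize counts to probability estimates
    p2 = p2 / p2.sum()
    return 0.5 * np.abs(p1 - p2).sum()

# Illustration on synthetic data: an irrelevant feature scores near 0,
# a well-separated one near 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 1000)
x_noise = rng.normal(size=2000)                      # same distribution in both classes
x_split = np.concatenate([rng.normal(-5, 1, 1000),   # class 0
                          rng.normal(+5, 1, 1000)])  # class 1
print(vdm_relevance(x_noise, y))  # close to 0
print(vdm_relevance(x_split, y))  # close to 1
```

With a finite sample the noise feature's score does not reach exactly 0 (histogram sampling error), which is one reason the paper compares several relevance measures rather than relying on a single one.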

### Citations

3728 | LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chang, Lin
Citation Context ...upport Vector Machine. As for Naïve Bayes we relied on our own implementation for multi-valued attributes using 100 bins. For GentleBoost we used 50 iterations. For SVM classification, we used libsvm [22]. Features were z-normalized and the cost function was made to compensate for unequal class priors, i.e. the weight of the less frequent class was set to (♯(Y=c2)/♯(Y=c1), ♯(Y=c1)/♯(Y=c2)). W...

1031 | Structural equations with Latent Variables
- Bollen
- 1989
Citation Context ...ons between distributions with highly unequal scales, such as the case for comparing classes (set cardinality 2) and continuous features. The Pearson correlation coefficient suffers the same weakness [29]. We expect the Kendall rank correlation coefficient (see [30]), another much-used rank correlation, to have similar problems in dealing with distributions. Other correlation measures could bring an ...

495 | Toward memory-based reasoning
- Stanfill, Waltz
- 1986
Citation Context ...ations and [16] as one of many recommendations to use Spearman correlations instead). We show how a measure of probability difference, similar to one presented before as the “value difference metric” [17], can be adapted as a relevance criterion. We propose a new measure, which we call “fit criterion”, which measures relevance similar to the z-score. Value Difference Metric We will refer to p(X) as the...

438 | Divergence measures based on the Shannon entropy
- Lin
- 1991
Citation Context ...ck-Leibler divergence (sometimes: information divergence, information gain, relative entropy), which is an information-theoretic measure of the difference between two probability distributions P and Q [20]. We will describe the redundancy VDM and the redundancy fit criterion in the following. Redundancy Fit Criterion Equation 11 gives the goodness of fit with respect to two classes, c1 and c2, averaged...

303 | Statistical comparisons of classifiers over multiple data sets
- Demšar
Citation Context ...ndancy criteria which only differ in using class-conditional distributions and total distributions. 4.1 Statistical Evaluation We used AUC as our performance measure. Following the recommendations of [28] we did not base our statistics on performances of single folds but took averages (medians) over folds. According to the central limit theorem, any sum (such as e.g. a performance benchmark), if...

265 | Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy
- Peng, Long, et al.
Citation Context ... selection, it is important to choose features that are relevant for prediction, but at the same time it is important to have a set of features which is not redundant, in order to increase robustness. [3,4,5,6,7,8,9,10] have elaborated on the concepts of redundancy and relevance for feature selection. [11,4,9] presented feature selection in a framework they call min-redundancy max-relevance (here short mRmR) that in...

218 | Improved heterogeneous distance functions
- Wilson, Martinez
- 1997
Citation Context ...ntinuous, monotonic function that measures overlap between two variables X1 and X2: (∫ |p(X1 = x) − p(X2 = x)|^q dx)^(1/q), where q is a parameter (2). We chose q = 1, which has been used similarly by [17,18,19] under the name value difference metric as a distance measure. Given that the probabilities p(X = x) integrate to 1 over all possible values of x, ∫ p(x)dx = 1, total divergence would give the ...

154 | Feature selection for high-dimensional data: A fast correlation-based filter solution
- Yu, Liu
Citation Context ...on, but we thought it might be better to use a nonparametric measure instead of relying on linear correlations (Pearson product–moment correlations), which have been used before as relevance measures [14,8]. We did not use Pearson correlations because of their sensitivity to extreme values, their focus on strictly linear relationships, and the assumption of gaussianity. For non-gaussian data rank–correl...

131 | Minimum redundancy feature selection from microarray gene expression data
- Ding, Peng
- 2003
Citation Context ... selection, it is important to choose features that are relevant for prediction, but at the same time it is important to have a set of features which is not redundant, in order to increase robustness. [3,4,5,6,7,8,9,10] have elaborated on the concepts of redundancy and relevance for feature selection. [11,4,9] presented feature selection in a framework they call min-redundancy max-relevance (here short mRmR) that in...

130 | A review of feature selection techniques in bioinformatics. Bioinformatics 2007;23:2507–17
- Saeys, Inza, et al.
Citation Context ...ke into account relationships between features, or are wrapper approaches which require high computational costs. Multivariate filter-based feature selection has enjoyed increased popularity recently [2]. The approach is generally low on computational costs. Filter–based techniques provide a clear picture of why a certain feature subset is chosen through the use of scoring methods in which inherent c...

117 | Novelty and redundancy detection in adaptive filtering
- Zhang, Callan, et al.
- 2002
Citation Context ... selection, it is important to choose features that are relevant for prediction, but at the same time it is important to have a set of features which is not redundant, in order to increase robustness. [3,4,5,6,7,8,9,10] have elaborated on the concepts of redundancy and relevance for feature selection. [11,4,9] presented feature selection in a framework they call min-redundancy max-relevance (here short mRmR) that in...

115 | Efficient feature selection via analysis of relevance and redundancy
- Yu
- 2004

107 | Rank transformations as a bridge between parametric and nonparametric statistics
- Conover, Iman
- 1981
Citation Context ...sibility to extreme values, their focus on strictly linear relationships, and the assumption of gaussianity. For non-gaussian data rank–correlations should be preferred over Pearson correlations (see [15] on rank correlations and [16] as one of many recommendations to use Spearman correlations instead). We show how a measure of probability difference, similar to one presented before as the “value diff...

26 | A new rank correlation coefficient for information retrieval
- Yilmaz, Aslam, et al.
- 2008
Citation Context ...correlation coefficient (see [30]), another much-used rank correlation, to have similar problems in dealing with distributions. Other correlation measures could bring an improvement, such as possibly [31]. RVDM and RFC performed very well as unitary filters. Integration of SU makes performance degrade in many cases with a given redundancy measure when compared to other relevance measures. RCC is a bad...

25 | The Laplacian pyramid as a compact image code
- Burt, Adelson
- 1983
Citation Context ...of the data of size n/10 and tested 10 random selections of features in 5-fold cross-validation. The complete feature set consisted of 127 features. We included 10 features from the Laplacian Pyramid [23], 100 Gabor features [24] in 10 orientations and 10 scales, 9 features from luminance contrast [25], 7 features from texture contrast [26], and intensity. We added 50 useless variables (probes) which g...

25 | Natural scene statistics at the center of gaze
- Reinagel, Zador
- 1999
Citation Context ...e complete feature set consisted of 127 features. We included 10 features from the Laplacian Pyramid [23], 100 Gabor features [24] in 10 orientations and 10 scales, 9 features from luminance contrast [25], 7 features from texture contrast [26], and intensity. We added 50 useless variables (probes) which good feature selection methods should eliminate. 49 of these probes were random variables, 25 of those standard normal distributed, 24 unif...

20 | An Entropy-based gene selection method for cancer classification using microarray data
- Liu, Krishnan, et al.
- 2005
Citation Context ...Of these, symmetric uncertainty was used before as a relevance criterion [5,7]. Symmetric uncertainty is symmetric and scaled mutual information [12]. Mutual information was used by [4,9] and by [3]. [13] used normalized mutual information for gene selection. As for Spearman correlations, we did not find a prior publication that refers to it as a relevance criterion, but we thought it might be better ...

18 | Automatic recognition and annotation of gene expression patterns of fly embryos. Bioinformatics 23(5), pp 589–596
- Zhou, Peng
- 2007
Citation Context ...it is important to have a set of features which is not redundant, in order to increase robustness. [3,4,5,6,7,8,9,10] have elaborated on the concepts of redundancy and relevance for feature selection. [11,4,9] presented feature selection in a framework they call min-redundancy max-relevance (here short mRmR) that integrates relevance and redundancy information of each variable into a single scoring mechani...

16 | Feature selection for high-dimensional data: A Kolmogorov–Smirnov correlation-based filter
- Biesiada, Duch, et al.
- 2005
Citation Context ... distinct from each other. We define VDM relevance (to which we will refer, for short, as VDM) of a feature X and labels Y with two classes c1 and c2 as: VDM(X,Y) = (1/2) ∫ |p(X = x|c1) − p(X = x|c2)| dx (5). Fit Criterion For a given point x, a criterion of fit to one distribution X1 could be defined as the point's distance to the center of the distribution X1 in terms of the variance of the distribution v...

16 | Differences of monkey and human overt attention under natural conditions. Vision Research
- Einhäuser, Kruse, et al.
- 2006
Citation Context ... features. We included 10 features from the Laplacian Pyramid [23], 100 Gabor features [24] in 10 orientations and 10 scales, 9 features from luminance contrast [25], 7 features from texture contrast [26], and intensity. We added 50 useless variables (probes) which good feature selection methods should eliminate. 49 of these probes were random variables, 25 of those standard normal distributed, 24 unif...

14 | Edges are not just steps
- Kovesi
- 2002
Citation Context ...and tested 10 random selections of features in 5-fold cross-validation. The complete feature set consisted of 127 features. We included 10 features from the Laplacian Pyramid [23], 100 Gabor features [24] in 10 orientations and 10 scales, 9 features from luminance contrast [25], 7 features from texture contrast [26], and intensity. We added 50 useless variables (probes) which good feature selection met...

11 | Implicit Feature Selection with the Value Difference Metric
- Payne, Edwards
- 1998
Citation Context ...ntinuous, monotonic function that measures overlap between two variables X1 and X2: (∫ |p(X1 = x) − p(X2 = x)|^q dx)^(1/q), where q is a parameter (2). We chose q = 1, which has been used similarly by [17,18,19] under the name value difference metric as a distance measure. Given that the probabilities p(X = x) integrate to 1 over all possible values of x, ∫ p(x)dx = 1, total divergence would give the ...

10 | Feature selection using improved mutual information for text classification
- Novovičová, Malík, et al.

7 | Automated Texture Analysis with Gabor filter
- Vyas, Rege
- 2006
Citation Context ...visualization of temporal sequences of bioimplant Micro-CT images” (MAT-2005-07244-C03-03). ... something that can be very generally referred to as texture (see [1]). The use of an adequate feature set is a requirement to achieve good classification results. Feature selection generally means considering subsets of features and eventually choosing the best of the...

3 | SVM-RFE with Relevancy and Redundancy Criteria for Gene Selection. Volume 4774
- Mundra, Rajapakse
- 2007

2 | Selecting relevant and non-relevant features in microarray classification applications
- Knijnenburg
- 2004

2 | Hopfield Networks in Relevance and Redundancy Feature Selection Applied to Classification
- Auffarth, Lopez, et al.
- 2008
Citation Context ... choosing the first s. These schemes, presented by [9,4], were called minimum redundancy maximum relevance quotient (mRmRQ) and minimum redundancy maximum relevance difference (mRmRD), respectively. In [21] we presented a selection scheme based on an attractor network, which was thought to be capable of integrating more complex redundancy interactions between features (henceforth called Hopfield) and wa...
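The mRmRD scheme mentioned in this context combines a relevance score per feature with pairwise redundancies by greedily maximizing relevance minus mean redundancy to the already selected set. A minimal sketch of that difference scheme, assuming precomputed relevance and redundancy values (the function name `mrmr_difference` and the toy numbers are our own, not from the paper):

```python
import numpy as np

def mrmr_difference(relevance, redundancy, k):
    """Greedy mRmRD-style selection: start from the most relevant feature,
    then repeatedly add the feature f maximizing
        relevance[f] - mean(redundancy[f, already_selected]).
    `relevance` is a length-n vector; `redundancy` an n x n symmetric matrix."""
    n = len(relevance)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [f for f in range(n) if f not in selected]
        scores = [relevance[f] - redundancy[f, selected].mean() for f in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Toy example: features 0 and 1 are both relevant but highly redundant with
# each other, so after picking feature 0 the scheme prefers the less relevant
# but non-redundant feature 2.
relevance = np.array([0.9, 0.85, 0.5, 0.1])
redundancy = np.array([
    [0.00, 0.80, 0.10, 0.05],
    [0.80, 0.00, 0.10, 0.05],
    [0.10, 0.10, 0.00, 0.05],
    [0.05, 0.05, 0.05, 0.00],
])
print(mrmr_difference(relevance, redundancy, k=2))  # [0, 2]
```

The quotient variant (mRmRQ) differs only in dividing relevance by mean redundancy instead of subtracting it.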

2 | Classification of biomedical high-resolution micro-CT images for direct volume rendering
- Auffarth
- 2007
Citation Context ...eriments and comparisons following in this section are therefore based on a set of 177 features and their respective relevance measures and mutual redundancies. Details on the methods can be found in [27]. 4 Results Within the scope of this article we focus on these questions: 1. What are the best measures of relevance and redundancy (RR)? (a) What is the best redundancy and relevance (RR) combination...

1 | Statistical Evaluation of Method-Comparison Data
- Wu, Twomey, et al.
- 1975
Citation Context ...eir focus on strictly linear relationships, and the assumption of gaussianity. For non-gaussian data rank–correlations should be preferred over Pearson correlations (see [15] on rank correlations and [16] as one of many recommendations to use Spearman correlations instead). We show how a measure of probability difference, similar to one presented before as the “value difference metric” [17], can be ad...

1 | The Kendall Rank Correlation Coefficient, edited by NJ Salkind. Encyclopedia of Measurement and Statistics
- Abdi
- 2007
Citation Context ...own by unitary filters, then all 28 combinations of mentioned relevance and redundancy measures with different selection schemes and random selection. We selected feature sets of different sizes (S = [4,8,12,16,20,30,45,60,80,100]). We compared five basic feature selection schemes. In the simplest selection scheme, at each iteration we take the most relevant feature and discard all features for which redundancy with the new...
Citation Context ...own by unitary filters, then all 28 combinations of mentioned relevance and redundancy measures with different selection schemes and random selection. We selected feature sets of different sizes (S = =-=[4,8,12,16,20,30,45,60,80,100]-=- 6 ). We compared five basic feature selection schemes. In the simplest selection scheme, at each iteration we take the most relevant feature and discard all features for which redundancy with the new... |