## An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons

Venue: Journal of Machine Learning Research

Citations: 75 (17 self)

### BibTeX

@ARTICLE{García_anextension,

author = {Salvador García and Francisco Herrera and John Shawe-Taylor},

title = {An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons},

journal = {Journal of Machine Learning Research},

year = {},

pages = {2677--2694}

}

### Abstract

In a recently published paper in JMLR, Demšar (2006) recommends a set of non-parametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we realize that the paper correctly introduces the basic procedures and some of the most advanced ones when comparing a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful proposals of statistical procedures for comparing n×n classifiers. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures.
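As a rough illustration of what an adjusted p-value looks like in an all-pairwise setting, the Python sketch below computes Holm-adjusted p-values. Holm's step-down procedure is used only because it is the simplest of the procedures cited on this page; it is not claimed to be the procedure the paper recommends. The function names and the raw p-values are hypothetical, chosen for illustration.

```python
def number_of_pairwise_hypotheses(k):
    """Number of hypotheses m when all k classifiers are compared pairwise."""
    return k * (k - 1) // 2


def holm_adjusted_p_values(p_values):
    """Return Holm step-down adjusted p-values, in the same order as the input.

    With the raw p-values sorted in ascending order, the i-th smallest one
    (i = 1..m) is multiplied by (m - i + 1); a running maximum keeps the
    adjusted values monotone, and the result is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank is 0-based, so factor is m - rank
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values for the m = 6 pairwise comparisons of k = 4 classifiers.
raw = [0.01, 0.04, 0.03, 0.005, 0.2, 0.5]
assert len(raw) == number_of_pairwise_hypotheses(4)

adjusted = holm_adjusted_p_values(raw)
for p, apv in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  Holm-adjusted p = {apv:.3f}")
```

An adjusted p-value can be compared directly with the chosen significance level, which is what makes values from different comparisons comparable.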

### Citations

5220 | C4.5: programs for machine learning - Quinlan |
Citation Context ...ned above are employed. 2.2 Performing All Pairwise Comparisons: A Case Study. In the following, we show an example involving the four procedures described with a comparison of five classifiers: C4.5 (Quinlan, 1993); One Nearest Neighbor (1-NN) with Euclidean distance, ...

777 | The CN2 induction algorithm - Clark, Niblett - 1989 |
Citation Context ... [Table 1: All pairwise comparisons of k classifiers] NaiveBayes, Kernel (McLachlan, 2004) and, finally, CN2 (Clark and Niblett, 1989). The parameters used are specified in Section 4. We have used 10-fold cross validation and standard parameters for each algorithm. The results correspond to average accuracy or 1 − class error in ...

696 | UCI Machine Learning Repository - Asuncion, Newman - 2007 |
Citation Context ...borhood. All classifiers are available in KEEL software (Alcalá-Fdez et al., 2008). For performing this study, we have compiled a sample of fifty data sets from the UCI machine learning repository (Asuncion and Newman, 2007), all of them valid for a classification task. We measured the performance of each classifier by means of accuracy in test by using ten-fold cross validation. As Demšar did, when comparing two cla...

655 | A simple sequentially rejective multiple test procedure - Holm - 1979 |

400 | Discriminant Analysis and Statistical Pattern Recognition - McLachlan - 1992 |
Citation Context ... [Table 1: All pairwise comparisons of k classifiers; rows as (k, m, $2^m$, ne): (4, 6, 64, 14), (5, 10, 1024, 51), (6, 15, 32768, 202), (7, 21, 2097152, 876), (8, 28, $2.7 \cdot 10^{8}$, 4139), (9, 36, $6.7 \cdot 10^{10}$, 21146)] NaiveBayes, Kernel (McLachlan, 2004) and, finally, CN2 (Clark and Niblett, 1989). The parameters used are specified in Section 4. We have used 10-fold cross validation and standard parameters for each algorithm. The results corresp...

226 | A sharper Bonferroni procedure for multiple tests of significance. Biometrika - Hochberg - 1988 |

197 | Statistical comparisons of classifiers over multiple data sets - Demšar |

175 | Resampling-Based Multiple Testing: Examples and Methods for P-value Adjustment - Westfall, Young - 1993 |
Citation Context ...lm (1979), Hochberg (1988), Hommel (1988) and the ones described in this paper are usually not incorporated in statistical packages. The computation of the correct p-value, or Adjusted P-Value (APV) (Westfall and Young, 2004), in a comparison using any of these procedures is not very difficult and, in this paper, we show how to include it with an illustrative example. The paper is set up as follows. Section 2 presents mo...

150 | The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance - Friedman - 1937 |
Citation Context ... ensembles of decision trees, non-parametric tests are also applied in the analysis of performance (Banfield et al., 2007). However, only the rankings computed by Friedman’s method (Friedman, 1937) are stipulated and authors establish comparisons based on them, without taking into account significance levels. Demšar focused his work in the analysis of new proposals, and he introduced the Neme...

121 | An improved Bonferroni procedure for multiple tests of significance - Simes - 1986 |
Citation Context ...hberg and Rom (1995), several extensions were given in this way. Furthermore, a small improvement of power in the Bergmann-Hommel procedure described here can be achieved when using Simes conjecture (Simes, 1986) in the obtaining of A set (see Hommel and Bernhard, 1999, for more details). 3. Adjusted P-Values. The smallest level of significance that results in the rejection of the null hypothesis, the p-value...

62 | KEEL: A software tool to assess evolutionary algorithms to data mining problems - Alcalá-Fdez, Sánchez, et al. - 2009 |
Citation Context ...examples to cover and Kernel classifier with sigmaKernel = 0.01, which is the inverse value of the variance that represents the radius of neighborhood. All classifiers are available in KEEL software (Alcalá-Fdez et al., 2008). For performing this study, we have compiled a sample of fifty data sets from the UCI machine learning repository (Asuncion and Newman, 2007), all of them valid for a classification task. We mea...

59 | Multiple Hypothesis Testing - Dudoit, Shaffer, et al. - 2003 |
Citation Context ..., we can find a discussion about the power of Hochberg’s and Hommel’s procedures with respect to Holm’s one. They reject more hypothesis than Holm’s, but the differences are in practice rather small (Shaffer, 1995). The most powerful procedures detailed in this paper, Shaffer’s and Bergmann-Hommel’s, work following the same method of Holm’s procedure, so it is possible to hybridize them with other types of ste...

49 | A stagewise rejective multiple test procedure based on a modified Bonferroni test - Hommel - 1988 |

46 | Multi-interval discretization of continuous valued attributes for classification learning - Fayyad, Irani - 1993 |

45 | Modified Sequentially Rejective Multiple Test Procedures - Shaffer - 1986 |
Citation Context ... among these three. Based on this argument, Shaffer proposed two procedures which make use of the logical relation among the family of hypotheses for adjusting the value of α (Shaffer, 1986). • Shaffer’s static procedure: following Holm’s step down method, at stage j, instead of rejecting $H_i$ if $p_i \le \alpha/(m-i+1)$, reject $H_i$ if $p_i \le \alpha/t_i$, where $t_i$ is the maximum number of hypotheses which ca...

39 | Distribution-free multiple comparisons - Nemenyi - 1963 |
Citation Context ...sons based on them, without taking into account significance levels. Demšar focused his work in the analysis of new proposals, and he introduced the Nemenyi test for making all pairwise comparisons (Nemenyi, 1963). Nevertheless, the Nemenyi test is very conservative and it may not find any difference in most of the experimentations. In recent papers, the authors have used the Nemenyi test in multiple comparis...

29 | Machine learning methods for predicting failures in hard drives: A multiple-instance application - Murray, Hughes, et al. |

26 | Adjusted p-values for simultaneous inference - Wright - 1992 |

22 | A sequentially rejective test procedure based on a modified Bonferroni inequality - Rom - 1990 |

21 | A comparison of decision tree ensemble creation techniques - Banfield, Hall, et al. |

10 | Anytime learning of decision trees - Esmeir, Markovitch, et al. - 2007 |

10 | Classifying under computational resource constraints: anytime classification using probabilistic estimators - Yang, Webb, et al. |

9 | Improvements of general multiple test procedures for redundant systems of hypotheses. Multiple Hypotheses Testing - Bergmann, Hommel - 1988 |

8 | Infinitely imbalanced logistic regression - Owen |

6 | Extensions of multiple testing procedures based on Simes’ test - Hochberg, Rom - 1995 |
Citation Context ...procedures, such as Hochberg’s, Hommel’s and Rom’s methods. When we apply these methods by using the logical relationships among hypothesis in a static way, they do not control the family-wise error (Hochberg and Rom, 1995). In opposite, when applying these methods by detecting dynamical relationships, they control the family-wise error. In Hochberg and Rom (1995), several extensions were given in this way. Furthermore...

6 | Learning in environments with unknown dynamics: Towards more robust concept learners - Núñez, Fidalgo, et al. - 2007 |

4 | A rapid algorithm and a computer program for multiple test procedures using logical structures - Hommel, Bernhard - 1994 |

4 | Maximizing the area under the ROC curve by pairwise feature combination - Marrocco, Duin, et al. - 1974 |

1 | Immune network based ensembles. Neurocomputing - Fyfe - 2007 |
Citation Context ...ons of experiments with randomly selected data sets. On the other hand, we can see other works in which the p-value associated to a comparison between two classifiers is reported (García-Pedrajas and Fyfe, 2007). Classical non-parametric tests, such as Wilcoxon and Friedman (Sheskin, 2003), may be incorporated in most of the statistical packages (SPSS, SAS, R, etc.) and the computation of the final p-value ...

1 | Bonferroni procedures for logically related hypotheses - Hommel, Bernhard - 1999 |
Citation Context ...e given in this way. Furthermore, a small improvement of power in the Bergmann-Hommel procedure described here can be achieved when using Simes conjecture (Simes, 1986) in the obtaining of A set (see Hommel and Bernhard, 1999, for more details). 3. Adjusted P-Values. The smallest level of significance that results in the rejection of the null hypothesis, the p-value, is a useful and interesting datum for many consumers of ...
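The citation context for Shaffer (1986) above contrasts Holm's divisor m − i + 1 with Shaffer's static divisor t_i, but the snippet is cut off before t_i is fully defined. The Python sketch below assumes the usual definition (the largest number of pairwise hypotheses that can still be true once i − 1 of them are false) and assumes the standard recursion for the set S(k) of possible numbers of true hypotheses among k classifiers. Both assumptions, and all function names, are illustrative and are not taken from this page.

```python
from math import comb


def possible_true_counts(k):
    """S(k): possible numbers of true pairwise hypotheses among k classifiers.

    Assumed recursion: S(0) = S(1) = {0}, and for n >= 2
    S(n) = union over j = 1..n of { C(j, 2) + x : x in S(n - j) }.
    """
    s = {0: {0}, 1: {0}}
    for n in range(2, k + 1):
        s[n] = {comb(j, 2) + x for j in range(1, n + 1) for x in s[n - j]}
    return s[k]


def holm_divisors(k):
    """Holm's divisors m - i + 1 for stages i = 1..m, with m = k(k-1)/2."""
    m = comb(k, 2)
    return [m - i + 1 for i in range(1, m + 1)]


def shaffer_static_divisors(k):
    """Assumed Shaffer static divisors t_i: largest element of S(k) <= m - i + 1."""
    counts = sorted(possible_true_counts(k))
    return [max(c for c in counts if c <= d) for d in holm_divisors(k)]


k = 5  # five classifiers, as in the case study quoted above
print("Holm:   ", holm_divisors(k))
print("Shaffer:", shaffer_static_divisors(k))
```

Because each t_i is never larger than m − i + 1, the threshold α/t_i is never stricter than Holm's α/(m − i + 1), which is why a procedure of this kind can reject at least as many hypotheses as Holm's.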