## Model Selection via the AUC (2004)

Venue: Proceedings of the 21st International Conference on Machine Learning

Citations: 17 (0 self)

### BibTeX

@INPROCEEDINGS{Rosset04modelselection,

author = {Saharon Rosset},

title = {Model Selection via the AUC},

booktitle = {Proceedings of the 21st International Conference on Machine Learning},

year = {2004},

publisher = {Morgan Kaufmann}

}

### Abstract

We present a statistical analysis of the AUC as an evaluation criterion for classification scoring models. First, we consider significance tests for the difference between AUC scores of two algorithms on the same test set. We derive exact moments under simplifying assumptions and use them to examine approximate practical methods from the literature. We then compare AUC to empirical misclassification error when the prediction goal is to minimize future error rate. We show that the AUC may be preferable to empirical error even in this case and discuss the tradeoff between approximation error and estimation error underlying this phenomenon.
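The two criteria compared in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names, the toy scores, and the fixed 0.5 threshold are assumptions chosen for illustration. The AUC is computed as the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as one half, and the empirical error is the misclassification rate at the threshold.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Empirical misclassification rate when predicting positive iff score > threshold."""
    wrong = sum((s > threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


# Hypothetical toy data: every positive outranks every negative,
# yet one negative (score 0.55) lies above the 0.5 threshold.
scores = [0.9, 0.8, 0.55, 0.6, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

print(auc(scores, labels))         # ranking quality, threshold-free
print(error_rate(scores, labels))  # depends on the chosen threshold
```

On this toy data the ranking is perfect while the thresholded classifier still misclassifies one example, which shows that the two criteria measure different things.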

### Citations

2953 |
UCI Repository of machine learning databases
- Blake, Merz
- 1998
Citation Context ...in using AUC to select classification models, which we discuss in section 3.2 Finally, we performed experiments on a real-life data set. We used the "Adult" data-set available from the UCI repository (Blake & Merz, 1998). We used only the first ten variables in this data-set, to make a large-scale experiment feasible, and compared performance of Naive Bayes models using different subsets of these ten predictors. We had ...

572 |
The meaning and use of the area under a receiver operating characteristic (ROC) curve
- Hanley, McNeil
- 1982
Citation Context ...nd estimation error underlying this phenomenon. 1. Introduction: ROC Analysis and the AUC The term Receiver Operating Curve (ROC) has long been used in the signal processing (Egan, 1975) and medical (Hanley & McNeil, 1982) literature to describe a curve displaying the relationship between sensitivity and 1-specificity at all possible thresholds for a 2-class classification scoring model, when applied to independent (tes...

244 |
Nonparametric Statistical Methods Based on Ranks
- Lehmann
Citation Context ...s equivalent to the Mann-Whitney 2-sample statistic. A well known derivation exists for the moments of this statistic under the "alternative" that the two classes do not follow the same distribution (Lehmann, 1975). The mean of the AUC is $p_1$ and the variance is: $\frac{p_1(1 - p_1) + (n^+ - 1)(p_2 - p_1^2) + (n^- - 1)(p_3 - p_1^2)}{n^+ n^-}$ with $p_1, p_2, p_3$ now representing various probabilities which depend on the probability...

238 |
Signal detection theory and ROC analysis
- Egan
- 1975
Citation Context ...een approximation error and estimation error underlying this phenomenon. 1. Introduction: ROC Analysis and the AUC The term Receiver Operating Curve (ROC) has long been used in the signal processing (Egan, 1975) and medical (Hanley & McNeil, 1982) literature to describe a curve displaying the relationship between sensitivity and 1-specificity at all possible thresholds for a 2-class classification scoring mod...

118 |
A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology
- Hanley, McNeil
- 1983
Citation Context ...wever, they can be used to test the performance of practical estimators for these moments in simulations, where the underlying structure is known. We then introduce two approximate significance tests (Hanley & McNeil, 1983; DeLong et al., 1988), developed in the 1980's, in the context of medical experiments. We examine the usefulness of these approximate tests by comparing them to the exact theoretic derivation for som...

113 |
Comparing the Areas Under Two or More Correlated Receiver Operating Characteristic Curves: A Non-Parametric Approach
- DeLong, DeLong, et al.
- 1988
Citation Context ... to test the performance of practical estimators for these moments in simulations, where the underlying structure is known. We then introduce two approximate significance tests (Hanley & McNeil, 1983; DeLong et al., 1988), developed in the 1980's, in the context of medical experiments. We examine the usefulness of these approximate tests by comparing them to the exact theoretic derivation for some synthetic examples....

64 | Tree Induction vs. Logistic Regression: A Learning-Curve Analysis
- Perlich, Provost, et al.

48 | AUC: A Statistically Consistent and More Discriminating Measure than Accuracy
- Ling, Huang, et al.
- 2003
Citation Context ...statistical properties have been investigated in some recent papers: (Provost & Fawcett, 1997) illustrate the "robustness" of ROC analysis, and the AUC in particular, against changing class balance; (Ling et al., 2003) define a rigorous discrimination measure, under which the AUC is provably superior to the empirical misclassification rate as an evaluation measure, in that it is less prone to ties when evaluating no...

47 | Improving accuracy and cost of two-class and multi-class probabilistic classifiers using ROC curves
- Lachiche, Flach
- 2003
Citation Context ... + x2 > 1} Twenty training sets of size 1000 each were drawn. For each training set, 100 test sets of size 100 each were drawn. As Naive Bayes is based on using discrete x1 Some recent papers, (e.g. (Lachiche & Flach, 2003)) suggest that 0.5 may not be the optimal threshold for Naive Bayes models, even in balanced class situations. However, our interest is purely in comparing test set and population performance. So, ev...

10 |
Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions
- Provost, Fawcett
- 1997
Citation Context ..., there has been a surge of interest in the AUC as an evaluation measure in the Data Mining and Machine Learning communities. Its statistical properties have been investigated in some recent papers: (Provost & Fawcett, 1997) illustrate the "robustness" of ROC analysis, and the AUC in particular, against changing class balance; (Ling et al., 2003) define a rigorous discrimination measure, under which the AUC is provably s...

2 |
AUC optimization vs. error rate minimization. NIPS-03
- Cortes, Mohri
- 2003
Citation Context ...crimination measure, under which the AUC is provably superior to the empirical misclassification rate as an evaluation measure, in that it is less prone to ties when evaluating non-equivalent models; (Cortes & Mohri, 2003) investigate the relationship between error rate minimization and AUC maximization by analyzing the range of possible AUCs when the error rate is fixed. They also discuss algorithms that directly maxim...
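
The variance expression quoted in the Lehmann (1975) citation context above can be evaluated directly. The sketch below rests on assumptions: it reads the quoted symbols as the class sizes n+ and n- (here `n_pos`, `n_neg`) and treats `p1`, `p2`, `p3` as known inputs, since the excerpt is truncated before it defines them. The function name is hypothetical.

```python
def auc_variance(p1, p2, p3, n_pos, n_neg):
    """Variance of the AUC from the quoted expression:
    [p1(1 - p1) + (n_pos - 1)(p2 - p1^2) + (n_neg - 1)(p3 - p1^2)] / (n_pos * n_neg).
    p1, p2, p3 are assumed given; the excerpt does not define them in full.
    """
    if n_pos < 1 or n_neg < 1:
        raise ValueError("both classes need at least one example")
    numerator = (
        p1 * (1.0 - p1)
        + (n_pos - 1) * (p2 - p1 ** 2)
        + (n_neg - 1) * (p3 - p1 ** 2)
    )
    return numerator / (n_pos * n_neg)


# Sanity check under identical continuous class distributions, where
# p1 = 1/2 and p2 = p3 = 1/3: the expression reduces to the classical
# Mann-Whitney null variance (n_pos + n_neg + 1) / (12 * n_pos * n_neg).
n_pos, n_neg = 40, 60
null_var = auc_variance(0.5, 1.0 / 3.0, 1.0 / 3.0, n_pos, n_neg)
print(null_var, (n_pos + n_neg + 1) / (12 * n_pos * n_neg))
```

Because the check is symmetric in `p2` and `p3`, it holds whichever class size each probability is paired with.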