## The use of the area under the ROC curve in the evaluation of machine learning algorithms (1997)

Venue: Pattern Recognition

Citations: 435 (0 self)

### BibTeX

@ARTICLE{Bradley97theuse,

author = {Andrew P. Bradley},

title = {The use of the area under the ROC curve in the evaluation of machine learning algorithms},

journal = {Pattern Recognition},

year = {1997},

volume = {30},

pages = {1145--1159}

}

### Abstract

In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six "real world" medical diagnostics data sets. We compare AUC with the more conventional overall accuracy and find that AUC exhibits a number of desirable properties: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreased as both AUC and the number of test samples increased; independence of the decision threshold; and invariance to a priori class probabilities. The paper concludes with the recommendation that AUC be used in preference to overall accuracy for "single number" evaluation of machine learning algorithms.

### Citations

4934 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

3909 |
Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ...ed and has been shown to provide an adequate and accurate estimate of the true error rate. (27) The cross-validation sampling technique used was random but ensured that the approximate proportions of examples of each class remain 90% in the training set and 10% in the test set. This slight adju... [Footnote 4: Particularly for the Multi-layer Perceptron.] |
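
The class-preserving 90%/10% split described in this context can be sketched as follows. This is a minimal illustration, assuming a round-robin deal of each class across the folds; the function name `stratified_folds`, the seed, and the toy label vector are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Return k lists of indices whose class proportions roughly match the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives about 1/k of it.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Toy data (assumed): 30 positive and 70 negative examples.
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=10)

# Each fold serves once as the 10% test set; the rest is the 90% training set.
split_sizes = [(len(labels) - len(test_idx), len(test_idx)) for test_idx in folds]
```

Because every learning scheme would be run on the same `folds`, the comparison stays paired across algorithms.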

2723 | Learning internal representations by error propagation - Rumelhart, Hinton, et al. - 1986 |

2649 |
Introduction to Statistical Pattern Recognition, 2nd edition
- Fukunaga
- 1990
Citation Context ..., are then applied to these distances, $d_j$, so as to weight the decision function and minimise the Bayes risk of misclassification. For these experiments misclassification costs were used in the range [0,1] in steps of 1/14. k-Nearest Neighbours. For each test example, the five nearest neighbours (calculated in terms of the sum of the squared difference of each input attribute) in the training set are c... |

865 | Signal detection theory and psychophysics - Green, Swets - 1966 |

805 |
Bootstrap methods: another look at the jackknife
- Efron
- 1979
Citation Context ...gration to calculate AUC. The AUC was calculated for each learning algorithm on each of the 10 test partitions. This is in effect using a jackknife estimate to calculate the standard error of the AUC (29) and will be discussed in more detail shortly. Remark. It should be noted that there are two distinct possibilities when it comes to combining the ROC curves from the different test partitions, (30) 1... |

645 |
Pattern recognition: A statistical approach
- Devijver, Kittler
- 1982
Citation Context ... blood loss to make the data set consistent with the others used in this paper, and as part of this preliminary study, this simplistic model was thought to be sufficient. However, most of the classification algorithms detailed in Section 4 have been u... [Footnote 1: It is also recommended for methods such as k nearest neighbours. (16)] [Footnote 2: They were not highly correlated to the other features selected.] |

539 |
The meaning and use of the area under a receiver operating characteristic (ROC) curve
- Hanley, McNeil
- 1982
Citation Context ...rror. However, in general it is not misclassification rate we want to minimise, but rather misclassification cost. Misclassification cost is normally defined as follows: $\text{Cost} = F_p \cdot C_{F_p} + F_n \cdot C_{F_n}$. (6) Unfortunately, we rarely know what the individual misclassification costs actually are (here, the cost of a false positive, $C_{F_p}$, and the cost of a false negative, $C_{F_n}$) and so system performance is oft... |
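
To make the cost definition in this context concrete, here is a small sketch. The error counts and the unit costs are invented for illustration only; the paper's point is precisely that the real costs are rarely known.

```python
def misclassification_cost(false_pos, false_neg, cost_fp, cost_fn):
    """Cost = F_p * C_Fp + F_n * C_Fn, as in equation (6) of the quoted text."""
    return false_pos * cost_fp + false_neg * cost_fn

# Two hypothetical classifiers with the same total number of errors (12).
errors_a = {"false_pos": 10, "false_neg": 2}
errors_b = {"false_pos": 2, "false_neg": 10}

# Assumed costs: a missed positive is five times as costly as a false alarm.
cost_a = misclassification_cost(**errors_a, cost_fp=1.0, cost_fn=5.0)
cost_b = misclassification_cost(**errors_b, cost_fp=1.0, cost_fn=5.0)
```

Both classifiers have identical overall accuracy, yet under these assumed costs classifier A costs 20.0 and classifier B costs 52.0, which is why accuracy alone can mislead.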

236 | Pattern Recognition Principles - Tou, Gonzalez - 1974 |

132 | Multisurface method of pattern separation for medical diagnosis applied to breast cytology - Wolberg, Mangasarian - 1990 |

120 |
Probability and Statistics for Engineers and Scientists
- Walpole, Myers
- 1993
Citation Context ...each test example, the five nearest neighbours (calculated in terms of the sum of the squared difference of each input attribute) in the training set are calculated. Then, if more than L of the nearest neighbours are of class 1, where L = [0, 1, 2, 3, 4, 5], the test sample is assigned to class 1; if not, it is assigned to class 0. Release 5 of the C4.5 decision tree generator (20) was used with the following mo... |
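
The voting rule in this context, five nearest neighbours by summed squared difference with a class-1 vote count compared against L, can be sketched as below. The helper names and the toy training set are assumptions for illustration; tie-breaking among equidistant neighbours simply follows the order of the training data.

```python
def knn_votes(train_x, train_y, x, k=5):
    """Count how many of the k nearest training examples belong to class 1."""
    by_distance = sorted(
        zip(train_x, train_y),
        # Sum of squared differences over all input attributes.
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    return sum(label for _, label in by_distance[:k])

def knn_classify(train_x, train_y, x, threshold, k=5):
    """Assign class 1 only if more than `threshold` neighbours are of class 1."""
    return 1 if knn_votes(train_x, train_y, x, k) > threshold else 0

# Toy training data (assumed): 2-D points with binary labels.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (5, 5), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1, 1]

# Sweeping L over 0..5 yields one operating point per threshold.
predictions = [knn_classify(train_x, train_y, (1, 1), L) for L in range(6)]
```

Sweeping L plays the same role for k-Nearest Neighbours that thresholding a continuous output plays for the other classifiers: each value of L trades false positives against false negatives.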

113 | Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights
- Nguyen, Widrow
- 1990
Citation Context ...1]. All three networks were trained using back-propagation with a learning rate of 0.01, and a momentum of 0.2. Initial values for the weights in the networks were set using the Nguyen-Widrow method, (25) and the networks were trained for 20,000 epochs. Again, during the testing phase the output neuron was thresholded at values [0, 0.1, 0.2, 0.3, ..., 1.0] to simulate different misclassification costs... |

108 |
A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology
- Hanley, McNeil
- 1983
Citation Context ...n the ROC curve [$P(F_p) = \alpha$, $P(T_p) = 1 - \beta$] have been obtained the simplest way to calculate the area under the ROC curve is to use trapezoidal integration, $AUC = \sum_i \{(1 - \beta_i \cdot \Delta\alpha) + \frac{1}{2}[\Delta(1 - \beta) \cdot \Delta\alpha]\}$, (7) where $\Delta(1 - \beta) = (1 - \beta_i) - (1 - \beta_{i-1})$, (8) $\Delta\alpha = \alpha_i - \alpha_{i-1}$. (9) It is also possible to calculate the AUC by assuming that the underlying probabilities of predicting negative or positive are Gaussian. The ROC cu... |
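
A minimal sketch of trapezoidal integration over ROC points follows. It uses the standard trapezoidal rule (the height at the left end of each interval plus half the rise, times the interval width), which is an assumption about the intended reading of the quoted formula rather than a transcription of it. The function name and the sample points are hypothetical.

```python
def trapezoidal_auc(points):
    """Area under an ROC curve given (false positive rate, true positive rate) points."""
    pts = sorted(points)
    area = 0.0
    for (alpha_prev, tpr_prev), (alpha_cur, tpr_cur) in zip(pts, pts[1:]):
        d_alpha = alpha_cur - alpha_prev   # width of the interval on the alpha axis
        d_tpr = tpr_cur - tpr_prev         # rise in 1 - beta over the interval
        # Rectangle under the lower edge plus the triangle on top of it.
        area += tpr_prev * d_alpha + 0.5 * d_tpr * d_alpha
    return area

# Hypothetical ROC points, including the (0, 0) and (1, 1) end points.
roc = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.8), (1.0, 1.0)]
auc = trapezoidal_auc(roc)
```

For these points the three segments contribute 0.03, 0.14 and 0.63, giving an area of 0.80; a chance-level curve through only (0, 0) and (1, 1) gives 0.5.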

85 |
Using the ADAP learning algorithm to forecast the onset of diabetes mellitus
- Smith, Everhart, et al.
- 1988
Citation Context ...$Q_2 - \theta^2) / (C_p C_n)$ (10) where $C_n$ and $C_p$ are the number of negative and positive examples respectively and $Q_1 = \frac{\theta}{2 - \theta}$, (11) $Q_2 = \frac{2\theta^2}{1 + \theta}$. (12) In this paper we shall calculate AUC using trapezoidal integration and estimate the standard deviation, $SD(\theta)$, using both $SE(W)$ and cross-validation, details of which are given in Sections 5 and 6. N... |

68 |
An overview of predictive learning and function approximation. In: From Statistics to Neural Networks: Theory and Pattern Recognition Applications
- Friedman
- 1995
Citation Context ...$T_p) = 1 - \beta$] have been obtained the simplest way to calculate the area under the ROC curve is to use trapezoidal integration, $AUC = \sum_i \{(1 - \beta_i \cdot \Delta\alpha) + \frac{1}{2}[\Delta(1 - \beta) \cdot \Delta\alpha]\}$, (7) where $\Delta(1 - \beta) = (1 - \beta_i) - (1 - \beta_{i-1})$, (8) $\Delta\alpha = \alpha_i - \alpha_{i-1}$. (9) It is also possible to calculate the AUC by assuming that the underlying probabilities of predicting negative or positive are Gaussian. The ROC curve will then have an exponenti... |

65 |
Decision, Estimation and Classification: an introduction into pattern recognition and related topics
- Therrien
- 1989
Citation Context ...each test example, the five nearest neighbours (calculated in terms of the sum of the squared difference of each input attribute) in the training set are calculated. Then, if more than L of the nearest neighbours are of class 1, where L = [0, 1, 2, 3, 4, 5], the test sample is assigned to class 1; if not, it is assigned to class 0. Release 5 of the C4.5 decision tree generator (20) was used with the following mo... |

50 | International application of a new probability algorithm for the diagnosis of coronary artery disease - Detrano, Janosi, et al. - 1989 |

28 |
ROC analysis applied to the evaluation of medical imaging techniques
- Swets
- 1979
Citation Context ...each test example, the five nearest neighbours (calculated in terms of the sum of the squared difference of each input attribute) in the training set are calculated. Then, if more than L of the nearest neighbours are of class 1, where L = [0, 1, 2, 3, 4, 5], the test sample is assigned to class 1; if not, it is assigned to class 0. Release 5 of the C4.5 decision tree generator (20) was used with the following mo... |

28 | Learning Internal Representations by Error - Rumelhart, Hinton, et al. - 1986 |

22 | Computer systems that learn: classification and prediction methods from statistics, neural nets, machine learning, and expert systems - Weiss, Kulikowski - 1991 |

16 |
Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals—rating method data
- Dorfman, Alf
- 1969 |

13 |
Classification of cervical cell nuclei using morphological segmentation and textural feature extraction
- Walker, Jackway, et al.
- 1994
Citation Context ... obtained the simplest way to calculate the area under the ROC curve is to use trapezoidal integration, $AUC = \sum_i \{(1 - \beta_i \cdot \Delta\alpha) + \frac{1}{2}[\Delta(1 - \beta) \cdot \Delta\alpha]\}$, (7) where $\Delta(1 - \beta) = (1 - \beta_i) - (1 - \beta_{i-1})$, (8) $\Delta\alpha = \alpha_i - \alpha_{i-1}$. (9) It is also possible to calculate the AUC by assuming that the underlying probabilities of predicting negative or positive are Gaussian. The ROC curve will then have an exponential form and can be fit... |

9 | Overfitting avoidance as bias - Schaffer, C - 1993 |

8 |
The multiscale classifier
- Lovell, Bradley
- 1996
Citation Context ... $x_n$). In the same situation, given one normal example and one positive example, a classifier with decision threshold $t$ will get both examples correct with a probability $P(C) = P(x_p > t)\,P(x_n < t)$. (15) $P(C)$ is dependent on the location of the decision threshold $t$ and is therefore not a general measure of classifier performance. 9.3.2. The standard error of AUC. The AUC, $\theta$, is an excellent way to me... |
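
The threshold dependence of P(C) can be seen numerically in the sketch below, which estimates P(C) empirically at several thresholds and contrasts it with the threshold-free probability that a randomly chosen positive example scores above a randomly chosen negative one, the quantity AUC is usually identified with. The score lists and function names are invented for illustration.

```python
def prob_both_correct(pos_scores, neg_scores, t):
    """Empirical P(C) = P(x_p > t) * P(x_n < t) for a decision threshold t."""
    p_pos = sum(x > t for x in pos_scores) / len(pos_scores)
    p_neg = sum(x < t for x in neg_scores) / len(neg_scores)
    return p_pos * p_neg

def prob_correct_ranking(pos_scores, neg_scores):
    """Empirical P(x_p > x_n) over all positive/negative pairs; ties count one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier outputs for positive and negative (normal) examples.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.1]

p_c_by_threshold = {t: prob_both_correct(pos, neg, t) for t in (0.2, 0.5, 0.65)}
ranking_probability = prob_correct_ranking(pos, neg)
```

On this toy data P(C) moves between 0.25 and 0.375 as the threshold changes, while the ranking probability is a single value, 0.8125, that involves no threshold at all.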

8 |
Introduction to computational learning and statistical prediction
- Friedman
- 1995
Citation Context ...nd 10% in the test set. This slight adjustment to maintain the prevalence of each class does not bias the error estimates and is supported in the research literature. (26) As pointed out by Friedman, (28) no classification method is universally better than any other, each method having a class of target functions for which it is best suited. These experiments then, are an attempt to investigate which ... |

8 | Discrimination and Classification - Hand - 1992 |

6 |
Computer Systems That Learn: Classification and
- Weiss, Kulikowski
- 1991
Citation Context ... used in this experiment to minimise any estimation bias. A leave-one-out classification scheme was thought computationally too expensive and so, in accordance with the recommendations in reference (26), 10-fold cross-validation was used on all of the data sets. For consistency, exactly the same data were used to train and test all of the nine classification schemes, this is often called a paired exp... |

5 | A Toolbox for the Analysis and Visualisation of Sensor Data in Supervision, Universidade Nova de Lisboa, Intelligent Robots Group Technical report - Rauber, Barata, et al. - 1993 |

4 | Decision, Estimation and Classification: An Introduction to Pattern Recognition and Related Topics - Therrien - 1989 |

2 | Evaluation of Diagnostic Systems: Methods from Signal Detection Theory - Swets, Pickett - 1982 |

2 | Classification of Cervical Cell Nuclei Using Morphological Segmentation and Textural Feature Extraction - Walker, Lovell, et al. - 1994 |

2 | The Multiscale Classifier - Lovell, Bradley - 1996 |

2 | Power Curves for Pattern Classification Networks - Twomey, Smith - 1993 |

1 |
Relationship of Platelet Aggregation to Bleeding after Cardiopulmonary Bypass, Annals of Thoracic Surgery 57
- Ray, Just, et al.
- 1994
Citation Context ... Gaussian based ROC curve (as in the ML method). The standard error, $SE(W)$, is given by $SE(W) = \sqrt{\frac{\theta(1 - \theta) + (C_p - 1)(Q_1 - \theta^2) + (C_n - 1)(Q_2 - \theta^2)}{C_p C_n}}$, (10) where $C_n$ and $C_p$ are the number of negative and positive examples respectively and $Q_1 = \frac{\theta}{2 - \theta}$, (11) $Q_2 = \frac{2\theta^2}{1 + \theta}$. (12) In this paper we shall calculate AUC using trapezoidal integration and est... |
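
The standard error in equations (10) to (12) of this context can be evaluated directly. The sketch below is one way to do so; the function and argument names are hypothetical, and the sample sizes in the example are invented.

```python
import math

def auc_standard_error(theta, n_pos, n_neg):
    """SE(W) from equations (10)-(12): theta is the AUC, n_pos = C_p, n_neg = C_n."""
    q1 = theta / (2 - theta)              # equation (11)
    q2 = 2 * theta ** 2 / (1 + theta)     # equation (12)
    variance = (
        theta * (1 - theta)
        + (n_pos - 1) * (q1 - theta ** 2)
        + (n_neg - 1) * (q2 - theta ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

# Illustrative values only: the same AUC with more test samples, and a higher AUC.
se_small = auc_standard_error(0.8, 50, 50)
se_large = auc_standard_error(0.8, 200, 200)
se_higher_auc = auc_standard_error(0.9, 50, 50)
```

With these inputs the standard error is roughly 0.045 for 50 examples per class, and it shrinks both when the test set grows and when the AUC rises, in line with the behaviour the abstract reports.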

1 |
International application of a new probability algorithm for the diagnosis of coronary artery disease
- Schmid, Sandu, et al.
- 1989
Citation Context ... $P(w_j)$, mean, $m_j$, and covariance, $C_j$, of the two class distributions. The Bayes decision function for class $w_j$ of an example $x$ is then given by $d_j(x) = \ln P(w_j) - \frac{1}{2}\ln|C_j| - \frac{1}{2}[(x - m_j)^T C_j^{-1}(x - m_j)]$. (13) This decision function is then a hyper-quadric, the class of an example being selected as the minimum distance class. Misclassification costs, $c_j$, are then applied to these distances, $d_j$, so as to we... |

1 |
Models of incremental concept formation, Artif. Intell.
- Gennari, Langley, et al.
- 1989
Citation Context ...s. For the difference between two subsets of means to be significant it must exceed a certain value. This value is called the least significant range for the $p$ means, $R_p$, and is given by $R_p = r_p \sqrt{s^2 / r}$, (14) where the sample variance, $s^2$, is estimated from the error mean square from the analysis of variance, $r$ is the number of observations (rows), and $r_p$ is the least significant studentized range for a... |

1 | A Toolbox for the Analysis and Visualisation - Rauber, Barata, et al. - 1993 |

1 |
Discrimination and Classification
- Hand
- 1981
Citation Context ..., 37.5% heart disease present. 4. THE LEARNING ALGORITHMS The learning algorithms chosen for this experimental comparison were: Quadratic Discriminant Function (18) (Bayes); k-Nearest Neighbours (19) (KNN); C4.5 (20) (C4.5); Multiscale Classifier (15) (MSC); Perceptron (21) (PTRON); and Multi-layer Perceptron (22) (MLP). We chose a cross-section of popular machine learning techniques toge... |

1 | Perceptrons - Minsky, Papert, et al. - 1969 |

1 | Principles of Neurodynamics - Rosenblatt - 1961 |

1 |
Power curves for pattern classification networks
- Twomey, Smith
- 1993
Citation Context ...ion function, scaled to give an output in the range [0, 1]. The output of this linear neuron was then thresholded at values [0, 0.1, 0.2, 0.3, ..., 1.0] to simulate different misclassification costs. (24) The Multi-layer Perceptron. Three network architectures were implemented, each with different numbers of hidden units. Their network architecture was as follows: an input layer consisting of a number... |
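
Thresholding a continuous output at 0, 0.1, ..., 1.0, as this context describes, produces one ROC operating point per threshold. The sketch below shows the idea; treating an output equal to the threshold as a positive prediction is an assumption, and the outputs, labels, and function name are made up for illustration.

```python
def roc_points(outputs, labels, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        # Assumption: outputs at or above the threshold are called positive.
        predicted = [1 if y >= t else 0 for y in outputs]
        tp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical network outputs in [0, 1] with their true classes.
outputs = [0.95, 0.85, 0.65, 0.55, 0.45, 0.35, 0.25, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

thresholds = [i / 10 for i in range(11)]   # 0, 0.1, ..., 1.0
points = roc_points(outputs, labels, thresholds)
```

The lowest threshold labels everything positive, giving the point (1, 1), and the highest gives (0, 0) here; the points in between trace the curve whose area is then integrated.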

1 | Fundamental Concepts in the Design of Experiments - Hicks - 1993 |

1 | On the Methodology for Comparing Learning Algorithms: A Case Study - Bradley, Lovell, et al. - 1994 |

1 |
Overfitting avoidance as bias
- Schaffer
- 1993
Citation Context ...) we found that the MSC obtained a higher accuracy (76%) when no pruning was done on the tree. This is an example of a problem domain where the algorithm has been biased by the decision tree pruning. (33) There are three significant subgroups shown for the Breast Cancer data set in Table 4. There is a large amount of overlap in these subgroups and so no real identifiable groups seem to exist. However,... |

1 | On deflection as a performance criterion in detection, IEEE Trans. Aerospace Electronic Systems 31, 1072-1081 - Picinbono - 1995 |

1 | Models of Incremental Concept Formation - Gennari, Langley, et al. - 1989 |

1 | ROC Curves for Classification Trees, Medical Decision Making 14 - Raubertas, Humiston, et al. - 1994 |

1 | On Deflection as a Performance Criterion in Detection, IEEE Transactions on Aerospace and Electronic Systems 31 - Picinbono - 1995 |