## Cost curves: an improved method for visualizing classifier performance (2006)

### Download Links

- [www.csi.uottawa.ca]
- [www.site.uottawa.ca]
- DBLP

Venue: Machine Learning

Citations: 45 (7 self)

### BibTeX

@ARTICLE{Drummond06costcurves,
  author  = {Chris Drummond and Robert C. Holte},
  title   = {Cost curves: an improved method for visualizing classifier performance},
  journal = {Machine Learning},
  volume  = {65},
  number  = {1},
  year    = {2006},
  pages   = {95--130}
}

### Abstract

This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors.

### Citations

3927 | Classification and Regression Trees - Breiman - 1984

Citation Context: ...by varying its threshold parameter. If such a parameter does not exist, algorithms such as decision trees can be modified to include costs producing different trees corresponding to different points (Breiman et al., 1984). The counts at the leaves may also be modified, thus changing the leaf’s classification, allowing a single tree to produce multiple points (Bradford ...
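The context above describes producing one ROC point per setting of a score threshold. A minimal sketch of that idea, with made-up scores and labels (not from the paper):

```python
# Hypothetical illustration: sweeping a decision threshold over classifier
# scores yields one (FP rate, TP rate) point per distinct threshold.

def roc_points(scores, labels):
    """Return (FP rate, TP rate) pairs, one per distinct threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Toy scores: higher means "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

Joining these points by line segments (or taking their convex hull) gives the ROC curve the surrounding contexts discuss.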

3927 | Pattern classification and scene analysis - Duda, Hart - 1973

Citation Context: ...ts for a given classifier (or learning algorithm) depends on the classifier. Some classifiers have parameters for which different settings produce different ROC points. For example, with naive Bayes (Duda and Hart, 1973; Clark and Niblett, 1989) an ROC curve is produced by varying its threshold parameter. If such a parameter does not exist, algorithms such as decision trees can be modified to include costs producing...

2542 | An introduction to the bootstrap - Efron, Tibshirani - 1993

Citation Context: ...as Gaussian or Student’s t. An alternative is to use computationally intense, non-parametric methods. Margineantu and Dietterich (2000) described how one such non-parametric approach, the bootstrap (Efron and Tibshirani, 1993), can be used to generate confidence intervals for predefined cost values. We use a similar technique, but for the complete range of class frequencies and misclassification costs. The bootstrap method...
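A hedged sketch of the bootstrap approach this context describes: resample the per-example outcomes with replacement, recompute the performance measure each time, and read off empirical percentiles. The data and the 95% level below are illustrative, not taken from the paper.

```python
import random

def bootstrap_error_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """errors: list of 0/1 flags (1 = misclassified). Returns a (lo, hi)
    non-parametric percentile confidence interval on the error rate."""
    rng = random.Random(seed)
    n = len(errors)
    rates = sorted(
        sum(rng.choice(errors) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = rates[int(n_resamples * alpha / 2)]
    hi = rates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi

errors = [1] * 20 + [0] * 80   # toy test set with 20% observed error rate
lo, hi = bootstrap_error_ci(errors)
```

The paper's cost-curve version repeats this kind of resampling across the full range of class frequencies and misclassification costs rather than at a single predefined cost value.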

979 | Fast effective rule induction - Cohen - 1995

Citation Context: ...a single dataset. By contrast, scalar measures are one-dimensional, leaving the second dimension free to be used creatively for comparing performance on multiple datasets (for example, see Figure 3 in (Cohen, 1995)). This paper has primarily been concerned to demonstrate that cost curves overcome several deficiencies of ROC curves while retaining most of their desirable properties. However, there are certain c...

652 | UCI repository of machine learning databases - Newman, Hettich, et al. - 1998

Citation Context: ...ason is that the distribution in the datasets used for training and evaluating the classifier may not reflect reality. For example, consider the two credit application datasets in the UCI repository (Newman et al., 1998). Positive examples in these datasets represent credit applications that were approved. In the Japanese credit dataset 44.5% of the examples are positive but in the German credit dataset 70% of the e...

438 | Very simple classification rules perform well on most commonly used datasets - Holte - 1993

Citation Context: ...tives. Joining these two points by a straight line plots its overall error rate as a function of p(+). The dashed line in Figure 3(a) is the estimated error rate of the decision stump produced by 1R (Holte, 1993) for the Japanese credit dataset over the full range of possible p(+) values. The solid line in Figure 3(a) gives the same information for the decision tree C4.5 (Quinlan, 1993) learned from the same...
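The straight-line construction in this context can be stated compactly: a classifier with false-positive rate FP and false-negative rate FN = 1 - TP has error rate FN·p(+) + FP·(1 - p(+)), i.e. the line joining (0, FP) and (1, FN). A small sketch with invented rates (not the paper's 1R or C4.5 results):

```python
# Error rate as a linear function of the probability of the positive class,
# as described in the context above. fp/fn values are hypothetical.

def error_rate(fp, fn, p_pos):
    """Expected error rate at class probability p_pos = p(+), for a
    classifier with false-positive rate fp and false-negative rate fn."""
    return fn * p_pos + fp * (1.0 - p_pos)

fp, fn = 0.1, 0.3                      # hypothetical classifier
assert error_rate(fp, fn, 0.0) == fp   # only negatives: error is the FP rate
assert error_rate(fp, fn, 1.0) == fn   # only positives: error is the FN rate
```

Plotting this line for each classifier over p(+) in [0, 1] is the error-rate special case of the paper's cost curves.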

377 | Information Retrieval - Rijsbergen - 1979

324 | The case against accuracy estimation for comparing induction algorithms - Provost, Fawcett, et al. - 1998

Citation Context: ...cost matrices has been the preferred measure (Bradford et al., 1998; Domingos, 1999; Margineantu and Dietterich, 2000). The shortcomings of using accuracy have been pointed out by others (Hand, 1997; Provost et al., 1998). The most fundamental shortcoming is the simple fact that a single, scalar performance measure cannot capture all aspects of the performance differences between two classifiers. Even when there are ...

303 | Metacost: A general method for making classifiers cost-sensitive - Domingos - 1999

Citation Context: ...In cost-sensitive learning, expected cost under a range of cost matrices has been the preferred measure (Bradford et al., 1998; Domingos, 1999; Margineantu and Dietterich, 2000). The shortcomings of using accuracy have been pointed out by others (Hand, 1997; Provost et al., 1998). The most fundamental shortcoming is the simple fact that a s...

276 | Measuring the accuracy of diagnostic systems - Swets - 1988

Citation Context: ...nalysis is that these points are samples of a continuous curve in a specific parametric family. Therefore standard curve fitting techniques can be used as means of interpolating between known points (Swets, 1988). In the machine learning literature it is more common to take a nonparametric approach and join the ROC points by line segments, as was done to create both ROC curves in Figure 2. ...

260 | Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions - Provost, Fawcett - 1997

Citation Context: ...rformance for a particular operating point (Halpern et al., 1996). In advance of knowing the operating point, one can compute the upper convex hull of the ROC points defined by the system parameters (Provost and Fawcett, 1997). The set of points on the convex hull dominates all the other points, and therefore these are the only classifiers that need be considered for any given operating point. ...
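The context above appeals to the ROC convex hull: only points on the upper hull can be optimal for some operating point. A hedged sketch using Andrew's monotone-chain construction restricted to the upper hull; the ROC points are illustrative, and the trivial always-negative (0, 0) and always-positive (1, 1) classifiers are included as is standard:

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_upper_hull(points):
    """Upper convex hull of (FP rate, TP rate) points, left to right."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last kept point while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

rocs = [(0.1, 0.6), (0.2, 0.75), (0.3, 0.7), (0.5, 0.85)]
hull = roc_upper_hull(rocs)   # (0.3, 0.7) is dominated and drops out
```

Classifiers whose points fall strictly below the hull, like (0.3, 0.7) here, need never be deployed: some hull classifier (or a randomized mixture of two) does at least as well at every operating point.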

188 | Construction and Assessment of Classification Rules - Hand - 1997

Citation Context: ...a range of cost matrices has been the preferred measure (Bradford et al., 1998; Domingos, 1999; Margineantu and Dietterich, 2000). The shortcomings of using accuracy have been pointed out by others (Hand, 1997; Provost et al., 1998). The most fundamental shortcoming is the simple fact that a single, scalar performance measure cannot capture all aspects of the performance differences between two classifiers...

164 | Adaptive fraud detection - Fawcett, Provost - 1997

Citation Context: ...ssifier that performs best for each operating point regardless of its training conditions or parameter settings. There are few examples of the practical application of this technique. One example is (Fawcett and Provost, 1997), in which the decision threshold parameter was tuned to be optimal, empirically, for the test distribution. This criterion is visualized very...

155 | Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm - Turney - 1995

Citation Context: ...ing, where the cost matrix used to train the classifier is the same as the cost matrix used to test it (Domingos, 1999; Kukar and Kononenko, 1998; Margineantu, 2002; Pazzani et al., 1994; Ting, 2000; Turney, 1995; Webb, 1996). Likewise, Zadrozny et al. (2003) and Radivojac et al. (2003) adjust the training set distributions in precise accordance with the costs used in testing the resulting classifiers. We call...

122 | Data mining for direct marketing: Problems and solutions - Ling, Li - 1998

Citation Context: ...ifying examples in one class is much different than the cost of misclassifying examples in the other class, or when one class is much rarer than the other (Japkowicz et al., 1995; Kubat et al., 1998; Ling and Li, 1998). A scalar measure can give the expected performance given a probability distribution over costs and class ratios, but it will not indicate for which costs and class ratios one classifier outperforms...

112 | Reducing misclassification costs - Pazzani, Merz, et al. - 1994

Citation Context: ...in studies of cost-sensitive learning, where the cost matrix used to train the classifier is the same as the cost matrix used to test it (Domingos, 1999; Kukar and Kononenko, 1998; Margineantu, 2002; Pazzani et al., 1994; Ting, 2000; Turney, 1995; Webb, 1996). Likewise, Zadrozny et al. (2003) and Radivojac et al. (2003) adjust the training set distributions in precise accordance with the costs used in testing the res...

111 | Learning when training data are costly: The effect of class distribution on tree induction - Weiss, Provost - 2003

104 | Cost-Sensitive Learning by Cost-Proportionate Example Weighting - Zadrozny, Langford, et al. - 2003

92 | Learning decision trees using the area under the ROC curve - Ferri, Flach, et al.

72 | A novelty detection approach to classification - Japkowicz, Myers, et al. - 1995

Citation Context: ...his failing occurs when the cost of misclassifying examples in one class is much different than the cost of misclassifying examples in the other class, or when one class is much rarer than the other (Japkowicz et al., 1995; Kubat et al., 1998; Ling and Li, 1998). A scalar measure can give the expected performance given a probability distribution over costs and class ratios, but it will not indicate for which costs and ...

64 | The geometry of ROC space: understanding machine learning metrics through ROC isometrics - Flach - 2003

55 | The Expected Performance Curve: a New Assessment Measure for Person Authentication - Bengio, Mariéthoz - 2004

50 | Robust classification systems for imprecise environments - Provost, Fawcett

41 | The expected performance curve - Bengio, Mariéthoz, et al. - 2005

39 | Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic - Yan, Dodier, et al. - 2003

38 | Information retrieval systems - Swets - 1963

33 | Note on the location of optimal classifiers in n-dimensional ROC space - Srinivasan - 1999

Citation Context: ...classes, although the bootstrap methods will suffer from sparsity of data if the number of classes is large. A number of researchers have looked at extensions to ROC curves for more than two classes (Srinivasan, 1999; Ferri et al., 2003). As the duality between the two representations, ROC and cost curves, also holds for higher dimensions (Pottmann, 2001), we expect that these extensions can be easily applied to ...

27 | Cost-sensitive learning with neural networks - Kukar, Kononenko - 1998

Citation Context: ...tions (Ting, 2004). This is most clearly seen in studies of cost-sensitive learning, where the cost matrix used to train the classifier is the same as the cost matrix used to test it (Domingos, 1999; Kukar and Kononenko, 1998; Margineantu, 2002; Pazzani et al., 1994; Ting, 2000; Turney, 1995; Webb, 1996). Likewise, Zadrozny et al. (2003) and Radivojac et al. (2003) adjust the training set distributions in precise accordance...

26 | Machine Learning for the Detection of Oil Spills - Kubat, Holte, et al. - 1998

Citation Context: ...the cost of misclassifying examples in one class is much different than the cost of misclassifying examples in the other class, or when one class is much rarer than the other (Japkowicz et al., 1995; Kubat et al., 1998; Ling and Li, 1998). A scalar measure can give the expected performance given a probability distribution over costs and class ratios, but it will not indicate for which costs and class ratios one classifier...

19 | Class probability estimation and cost-sensitive classification decisions - Margineantu - 2002

16 | ROC Confidence Bands: An Empirical Evaluation - Macskassy, Provost, et al. - 2005

16 | Statistical approaches to the analysis of receiver operating characteristic (ROC) curves - McNeil, Hanley - 1984

Citation Context: ...estion. The vast majority of the ROC literature on confidence intervals investigates confidence intervals on the ROC curve itself, constructed in either a point-wise manner (Dukic and Gatsonis, 2003; McNeil and Hanley, 1984; Platt et al., 2000; Tilbury et al., 2000; Zou et al., 1997) or as a global confidence band (Dukic and Gatsonis, 2003; Jensen et al., 2000; Ma an...

15 | Confidence bands for the receiver operating characteristics curves - Ma, Hall - 1993

Citation Context: ...1984; Platt et al., 2000; Tilbury et al., 2000; Zou et al., 1997) or as a global confidence band (Dukic and Gatsonis, 2003; Jensen et al., 2000; Ma and Hall, 1993). Macskassy et al. (2005) give a good review of this work accessible to a machine learning audience. These confidence intervals provide bounds within which a classifier’s TP and FP are expected to co...

12 | An empirical study of Metacost using boosting algorithms - Ting

10 | The Use of the Area under the ROC Curve - Bradley - 1997

Citation Context: ...ar. The machine learning community has traditionally used error rate (or accuracy) as its default performance measure. Recently, however, area under the ROC curve (AUC) has been used in some studies (Bradley, 1997; Karwath and King, 2002; Weiss and Provost, 2003; Yan et al., 2003). In cost...

10 | Homology Induction: the use of machine learning to improve sequence similarity searches - Karwath, King - 2002

10 | Cost-sensitive specialization - Webb - 1996

Citation Context: ...cost matrix used to train the classifier is the same as the cost matrix used to test it (Domingos, 1999; Kukar and Kononenko, 1998; Margineantu, 2002; Pazzani et al., 1994; Ting, 2000; Turney, 1995; Webb, 1996). Likewise, Zadrozny et al. (2003) and Radivojac et al. (2003) adjust the training set distributions in precise accordance with the costs used in testing the resulting classifiers. We call this kind ...

8 | A uniform convergence bound for the area under the ROC curve - Agarwal, Har-Peled, et al. - 2005

8 | Bootstrap confidence intervals for the sensitivity of a quantitative diagnostic test - Platt, Hanley, et al.

8 | Robust classification for imprecise environments - Provost, Fawcett - 2001

7 | Comparing classifiers when misclassification costs are uncertain. Pattern Recognition 32:1139–1147 - Adams, Hand - 1999

6 | Issues in classifier evaluation using optimal cost curves - Ting - 2002

4 | Basics of projective geometry. An Institute for Mathematics and its Applications tutorial. Geometric Design: Geometries for CAGD http://www.ima.umn.edu/multimedia/spring/tut7.html - Pottmann - 2001

4 | Matching model versus single model: A study of the requirement to match class distribution using decision trees - Ting - 2004

3 | Pruning Decision Trees with Misclassification Costs - Bradford, Kunz, et al. - 1998

3 | C4.5, Class Imbalance, and Cost Sensitivity: Why Undersampling beats Oversampling - Drummond, Holte - 2003

2 | Confidence Intervals for the Area Under the ROC Curve - Cortes, Mohri - 2005

2 | Explicitly Representing Expected Cost: An Alternative to ROC Representation - Drummond, Holte - 2000

2 | Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria - Drummond, Holte - 2000

2 | Meta-analysis of Diagnostic Test Accuracy Assessment Studies with Varying Number of Thresholds - Dukic, Gatsonis - 2003