Results 1 - 8 of 8
Cost curves: an improved method for visualizing classifier performance
- Machine Learning, 2006
Cited by 27 (5 self)
Abstract. This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors.
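As a rough illustration of the idea behind this entry, the following is a minimal sketch assuming the standard cost-curve formulation, in which a classifier with false positive rate FPR and false negative rate FNR becomes a straight line over the probability-cost value PC(+). The function names are hypothetical and are not taken from the authors' software tool.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis value PC(+): class prior weighted by misclassification costs.

    p_pos   -- prior probability of the positive class
    cost_fn -- cost of a false negative
    cost_fp -- cost of a false positive
    """
    weighted_pos = p_pos * cost_fn
    weighted_neg = (1.0 - p_pos) * cost_fp
    return weighted_pos / (weighted_pos + weighted_neg)


def normalized_expected_cost(fpr, fnr, pc_pos):
    """y-axis value: a straight line from (0, FPR) to (1, FNR)."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


def lower_envelope(operating_points, pc_pos):
    """Best achievable cost at pc_pos over a set of (FPR, FNR) points."""
    return min(normalized_expected_cost(fpr, fnr, pc_pos)
               for fpr, fnr in operating_points)
```

With equal costs, PC(+) reduces to the positive-class prior and the y-axis reduces to the error rate, which matches the "error rate or expected cost" wording of the abstract.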
A quality-aware optimizer for information extraction
- ACM Transactions on Database Systems
Cited by 4 (2 self)
A large amount of structured information is buried in unstructured text. Information extraction systems can extract structured relations from the documents and enable sophisticated, SQL-like queries over unstructured text. Information extraction systems are not perfect and their output has imperfect precision and recall (i.e., contains spurious tuples and misses good tuples). Typically, an extraction system has a set of parameters that can be used as “knobs” to tune the system to be either precision- or recall-oriented. Furthermore, the choice of documents processed by the extraction system also affects the quality of the extracted relation. So far, estimating the output quality of an information extraction task has been an ad hoc procedure, based mainly on heuristics. In this article, we show how to use Receiver Operating Characteristic (ROC) curves to estimate the extraction quality in a statistically robust way and show how to use ROC analysis to select the extraction parameters in a principled manner. Furthermore, we present analytic models that reveal how different document retrieval strategies affect the quality of the extracted relation. Finally, we present our maximum likelihood approach for estimating, on the fly, the parameters required by our analytic models to predict the runtime and the output quality of each execution plan. Our experimental evaluation demonstrates that our optimization approach accurately predicts the output quality and selects the fastest execution plan that satisfies the output quality restrictions.
Pareto optimal linear classification
- in Proc. ICML, 2006
, 1990
Cited by 2 (0 self)
We consider the problem of choosing a linear classifier that minimizes misclassification probabilities in two-class classification, which is a bi-criterion problem, involving a trade-off between two objectives. We assume that the class-conditional distributions are Gaussian. This assumption makes it computationally tractable to find Pareto optimal linear classifiers whose classification capabilities are inferior to no other linear ones. The main purpose of this paper is to establish several robustness properties of those classifiers with respect to variations and uncertainties in the distributions. We also extend the results to kernel-based classification. Finally, we show how to carry out trade-off analysis empirically with a finite number of given labeled data.
Performance Generalization in Biometric Authentication Using Joint User-Specific and Sample Bootstraps
- in IEEE Trans. Pattern Analysis and Machine Intelligence, 2005
Cited by 2 (1 self)
Biometric authentication performance is often depicted by a DET curve. We show that this curve is dependent on the choice of samples available, the demographic composition and the number of users specific to a database. We propose a two-step bootstrap procedure to take into account the three mentioned sources of variability. This is an extension of Bolle et al.’s bootstrap subset technique. Preliminary experiments on the NIST2005 and XM2VTS benchmark databases are encouraging, e.g., the average result across all 24 systems evaluated on NIST2005 indicates that one can predict, with more than 75% of DET coverage, an unseen DET curve with 8 times more users. Furthermore, our finding suggests that with more data available, the confidence intervals become smaller and hence more useful.
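The abstract does not spell out the two-step procedure, so the following is only a rough reading of it: resample users with replacement, then resample each drawn user's scores with replacement. The function name and the dictionary layout are assumptions made for illustration, not the authors' implementation.

```python
import random


def two_step_bootstrap(scores_by_user, rng):
    """Draw one bootstrap replicate from per-user score lists.

    scores_by_user -- dict mapping a user id to that user's list of scores
    rng            -- a random.Random instance (seed it for repeatability)
    """
    users = list(scores_by_user)
    # Step 1: resample users with replacement (user-specific bootstrap).
    drawn_users = [rng.choice(users) for _ in users]
    replicate = []
    for user in drawn_users:
        samples = scores_by_user[user]
        # Step 2: resample that user's scores with replacement.
        replicate.append((user, [rng.choice(samples) for _ in samples]))
    return replicate
```

Repeating the draw many times and recomputing the DET curve on each replicate would give the spread from which a confidence region could be estimated.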
Pointwise ROC Confidence Bounds: An Empirical Evaluation
- Proceedings of the Workshop on ROC Analysis in Machine Learning (ROCML-2005) at ICML-2005, 2005
Cited by 2 (1 self)
This paper is about constructing and evaluating pointwise confidence bounds on an ROC curve. We describe four confidence-bound methods, two from the medical field and two used previously in machine learning research. We evaluate whether the bounds indeed contain the relevant operating point on the "true" ROC curve with a confidence of 1-δ. We then evaluate pointwise confidence bounds on the region where the future performance of a model is expected to lie. For evaluation we use a synthetic world representing "binormal" distributions--the classification scores for positive and negative instances are drawn from (separate) normal distributions. For the "true-curve" bounds, all methods are sensitive to how well the distributions are separated, which corresponds directly to the area under the ROC curve. One method produces bounds that are universally too loose, another universally too tight, and the remaining two are close to the desired containment although containment breaks down at the extremes of the ROC curve. As would be expected, all methods fail when used to contain "future" ROC curves. Widening the bounds to account for the increased uncertainty yields identical qualitative results to the "true-curve" evaluation. We conclude by recommending a simple, very efficient method (vertical averaging) for large sample sizes and a more computationally expensive method (kernel estimation) for small sample sizes.
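The link between separation and area under the curve in the binormal setting can be made concrete. The sketch below assumes the textbook closed form AUC = Φ((μ₊ − μ₋) / √(σ₊² + σ₋²)) for normally distributed scores; the function name is hypothetical and the paper's own experimental setup is not reproduced here.

```python
import math


def binormal_auc(mu_neg, sigma_neg, mu_pos, sigma_pos):
    """AUC when negative and positive scores are independent normals."""
    # Standardized separation between the two score distributions.
    d = (mu_pos - mu_neg) / math.sqrt(sigma_neg ** 2 + sigma_pos ** 2)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
```

Identical distributions give an AUC of 0.5, and the value rises toward 1 as the means move apart.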
Receiver Operating Characteristic Curve Confidence Intervals and Regions
Cited by 2 (1 self)
Abstract—Many researchers have presented results showing the empirical performance of target detection algorithms using hyperspectral or synthetic aperture radar imagery. In nearly all cases, these probabilities of detection and false alarm are presented as precise values as opposed to their true nature as estimates of random values. In this letter, we provide analytical tools and examples of computing confidence intervals and regions around these estimates commonly presented as points on receiver operating characteristic (ROC) curves. It is suggested that these tools be adopted by researchers when presenting their results to provide their audience with a quantitative metric for proper interpretation of empirically estimated ROC curves. Index Terms—Confidence intervals, confidence regions, receiver operating characteristic (ROC) curves, target detection.
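The letter's own analytical tools are not reproduced in this listing. As a stand-in, the sketch below uses the Wilson score interval, one common way to put a confidence interval around an estimated probability of detection or false alarm treated as a binomial proportion. The function name and the default z value (about 95% coverage) are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes -- e.g. detected targets, or false alarms
    trials    -- e.g. total targets, or total non-target samples
    z         -- standard normal quantile (1.96 gives roughly 95% coverage)
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Applying the function once to the detection counts and once to the false-alarm counts gives an interval on each axis for a single ROC operating point.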
Evaluating Misclassifications in Imbalanced Data
Cited by 1 (0 self)
Abstract. Evaluating classifier performance with ROC curves is popular in the machine learning community. To date, the only method to assess confidence of ROC curves is to construct ROC bands. In the case of severe class imbalance with few instances of the minority class, ROC bands become unreliable. We propose a generic framework for classifier evaluation to identify a segment of an ROC curve in which misclassifications are balanced. Confidence is measured by Tango’s 95%-confidence interval for the difference in misclassification in both classes. We test our method with severe class imbalance in a two-class problem. Our evaluation favors classifiers with low numbers of misclassifications in both classes. Our results show that the proposed evaluation method is more confident than ROC bands.
Estimating the Confidence Interval of Expected Performance Curve in Biometric Authentication Using Joint Bootstrap
, 2006
"... submitted for publication ..."

