Results 1  10
of
108
The Relationship Between PrecisionRecall and ROC Curves
 In ICML ’06: Proceedings of the 23rd international conference on Machine learning
, 2006
"... Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, PrecisionRecall (PR) curves give a more informative picture of an algorithm’s performance. We show that a deep conn ..."
Abstract

Cited by 205 (2 self)
 Add to MetaCart
Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, PrecisionRecall (PR) curves give a more informative picture of an algorithm’s performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve. 1.
A support vector method for multivariate performance measures
 Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear per ..."
Abstract

Cited by 200 (5 self)
 Add to MetaCart
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method. 1.
Generalization bounds for the area under the ROC curve
 Journal of Machine Learning Research
"... We study generalization properties of the area under an ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for bipartite ranking problems. The AUC is a different and more complex term than the error rate used for evaluation in classification problems; consequently, existi ..."
Abstract

Cited by 48 (6 self)
 Add to MetaCart
We study generalization properties of the area under an ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for bipartite ranking problems. The AUC is a different and more complex term than the error rate used for evaluation in classification problems; consequently, existing generalization bounds for the classification error rate cannot be used to draw conclusions about the AUC. In this paper, we define a precise notion of the expected accuracy of a ranking function (analogous to the expected error rate of a classification function), and derive distributionfree probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy. We derive both a large deviation bound, which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on a test sequence, and a uniform convergence bound, which serves to bound the expected accuracy of a learned ranking function in terms of its empirical AUC on a training sequence. Our uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rankshatter coefficients; these play the same role in our result as do the standard shatter coefficients (also known variously as the counting numbers or growth function) in uniform convergence results for the classification error rate. We also compare our result with a recent uniform convergence result derived by Freund et al. (2003) for a quantity closely related to the AUC; as we show, the bound provided by our result is considerably tighter. 1 1
Ranking on graph data
 In ICML
, 2006
"... In ranking, one is given examples of order relationships among objects, and the goal is to learn from these examples a realvalued ranking function that induces a ranking or ordering over the object space. We consider the problem of learning such a ranking function when the data is represented as a ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
In ranking, one is given examples of order relationships among objects, and the goal is to learn from these examples a realvalued ranking function that induces a ranking or ordering over the object space. We consider the problem of learning such a ranking function when the data is represented as a graph, in which vertices correspond to objects and edges encode similarities between objects. Building on recent developments in regularization theory for graphs and corresponding Laplacianbased methods for classification, we develop an algorithmic framework for learning ranking functions on graph data. We provide generalization guarantees for our algorithms via recent results based on the notion of algorithmic stability, and give experimental evidence of the potential benefits of our framework. 1.
Learning Ensembles of FirstOrder Clauses for RecallPrecision Curves: A Case Study in Biomedical Information Extraction
 Proceedings of the 14th International Conference on Inductive Logic Programming (ILP
, 2004
"... Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as ..."
Abstract

Cited by 24 (8 self)
 Add to MetaCart
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the proteinlocalization relationship in Medline abstracts. We view the problem as a machinelearning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a heldaside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method which collects good clauses from a broad spectrum of points along the recall dimension in recallprecision curves and employs an "at least N of these M clauses" thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable testset results in a fraction of the training time needed for ensembles.
SemiSupervised Multitask Learning
"... A semisupervised multitask learning (MTL) framework is presented, in which M parameterized semisupervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
A semisupervised multitask learning (MTL) framework is presented, in which M parameterized semisupervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semisupervised MTL yields significant improvements in generalization performance over either semisupervised singletask learning (STL) or supervised MTL. 1
Robust reductions from ranking to classification
 PROCEEDINGS OF THE 20TH ANNUAL CONFERENCE ON LEARNING THEORY (COLT), LECTURE NOTES IN COMPUTER SCIENCE 4539
"... We reduce ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. The core theorem shows that a binary classification regret of r on the induced binary problem implies an AUC regret of at most 2r. This is a large improvement over approache ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
We reduce ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. The core theorem shows that a binary classification regret of r on the induced binary problem implies an AUC regret of at most 2r. This is a large improvement over approaches such as ordering according to regressed scores, which have a regret transform of r ↦ → nr where n is the number of elements.
Ranking the best instances
 Journal of Machine Learning Research
"... We formulate a local form of the bipartite ranking problem where the goal is to focus on the best instances. We propose a methodology based on the construction of realvalued scoring functions. We study empirical risk minimization of dedicated statistics which involve empirical quantiles of the scor ..."
Abstract

Cited by 21 (10 self)
 Add to MetaCart
We formulate a local form of the bipartite ranking problem where the goal is to focus on the best instances. We propose a methodology based on the construction of realvalued scoring functions. We study empirical risk minimization of dedicated statistics which involve empirical quantiles of the scores. We first state the problem of finding the best instances which can be cast as a classification problem with mass constraint. Next, we develop special performance measures for the local ranking problem which extend the Area Under an ROC Curve (AUC) criterion and describe the optimal elements of these new criteria. We also highlight the fact that the goal of ranking the best instances cannot be achieved in a stagewise manner where first, the best instances would be tentatively identified and then a standard AUC criterion could be applied. Eventually, we state preliminary statistical results for the local ranking problem.
Stability and generalization of bipartite ranking algorithms
 Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT
, 2005
"... Abstract. The problem of ranking, in which the goal is to learn a realvalued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. We study generalization properties of ranking algorithms, in a particular setting of the rankin ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
Abstract. The problem of ranking, in which the goal is to learn a realvalued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. We study generalization properties of ranking algorithms, in a particular setting of the ranking problem known as the bipartite ranking problem, using the notion of algorithmic stability. In particular, we derive generalization bounds for bipartite ranking algorithms that have good stability properties. We show that kernelbased ranking algorithms that perform regularization in a reproducing kernel Hilbert space have such stability properties, and therefore our bounds can be applied to these algorithms; this is in contrast with previous generalization bounds for ranking, which are based on uniform convergence and in many cases cannot be applied to these algorithms. A comparison of the bounds we obtain with corresponding bounds for classification algorithms yields some interesting insights into the difference in generalization behaviour between ranking and classification. 1
An Experimental Comparison of Performance Measures for Classification
, 2007
"... Performance metrics in classification are fundamental to assess the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area. Choices made by one metric are ..."
Abstract

Cited by 18 (5 self)
 Add to MetaCart
Performance metrics in classification are fundamental to assess the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area. Choices made by one metric are claimed to be different from choices made by other metrics. In this work we analyse experimentally the behaviour of 18 different performance metrics in several scenarios, identifying clusters and relationships between measures. We also perform a sensitivity analysis for all of them in terms of several traits: class threshold choice, separability/ranking quality, calibration performance and sensitivity to changes in prior class distribution. From the definitions and the experiments, we give a comprehensive analysis on the relationships between metrics, and a taxonomy and arrangement of them according to the previous traits. This can be useful to choose the most adequate measure (or set of measures) for a specific application. Additionally, the study also highlights some niches in which new measures might be defined and also shows that some supposedly innovative measures make the same choices (or almost) than existing ones. Finally, this work can also be used as a reference for comparing experimental results in the pattern recognition and machine learning literature, when using different measures.