Results 1 - 10
of
49
A support vector method for multivariate performance measures
- Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear per ..."
Abstract
-
Cited by 130 (5 self)
- Add to MetaCart
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method. 1.
The Relationship Between Precision-Recall and ROC Curves
- In ICML ’06: Proceedings of the 23rd international conference on Machine learning
, 2006
"... Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm’s performance. We show that a deep conn ..."
Abstract
-
Cited by 92 (2 self)
- Add to MetaCart
Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm’s performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve. 1.
Generalization bounds for the area under the ROC curve
- Journal of Machine Learning Research
"... We study generalization properties of the area under an ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for bipartite ranking problems. The AUC is a different and more complex term than the error rate used for evaluation in classification problems; consequently, existi ..."
Abstract
-
Cited by 23 (5 self)
- Add to MetaCart
We study generalization properties of the area under an ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for bipartite ranking problems. The AUC is a different and more complex term than the error rate used for evaluation in classification problems; consequently, existing generalization bounds for the classification error rate cannot be used to draw conclusions about the AUC. In this paper, we define a precise notion of the expected accuracy of a ranking function (analogous to the expected error rate of a classification function), and derive distribution-free probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy. We derive both a large deviation bound, which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on a test sequence, and a uniform convergence bound, which serves to bound the expected accuracy of a learned ranking function in terms of its empirical AUC on a training sequence. Our uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients; these play the same role in our result as do the standard shatter coefficients (also known variously as the counting numbers or growth function) in uniform convergence results for the classification error rate. We also compare our result with a recent uniform convergence result derived by Freund et al. (2003) for a quantity closely related to the AUC; as we show, the bound provided by our result is considerably tighter. 1 1
Ranking on graph data
- In ICML
, 2006
"... In ranking, one is given examples of order relationships among objects, and the goal is to learn from these examples a real-valued ranking function that induces a ranking or ordering over the object space. We consider the problem of learning such a ranking function when the data is represented as a ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
In ranking, one is given examples of order relationships among objects, and the goal is to learn from these examples a real-valued ranking function that induces a ranking or ordering over the object space. We consider the problem of learning such a ranking function when the data is represented as a graph, in which vertices correspond to objects and edges encode similarities between objects. Building on recent developments in regularization theory for graphs and corresponding Laplacian-based methods for classification, we develop an algorithmic framework for learning ranking functions on graph data. We provide generalization guarantees for our algorithms via recent results based on the notion of algorithmic stability, and give experimental evidence of the potential benefits of our framework. 1.
Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction
- Proceedings of the 14th International Conference on Inductive Logic Programming (ILP
, 2004
"... Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-aside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method which collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least N of these M clauses" thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable testset results in a fraction of the training time needed for ensembles.
Stability and generalization of bipartite ranking algorithms
- Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT
, 2005
"... Abstract. The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. We study generalization properties of ranking algorithms, in a particular setting of the rankin ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Abstract. The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. We study generalization properties of ranking algorithms, in a particular setting of the ranking problem known as the bipartite ranking problem, using the notion of algorithmic stability. In particular, we derive generalization bounds for bipartite ranking algorithms that have good stability properties. We show that kernel-based ranking algorithms that perform regularization in a reproducing kernel Hilbert space have such stability properties, and therefore our bounds can be applied to these algorithms; this is in contrast with previous generalization bounds for ranking, which are based on uniform convergence and in many cases cannot be applied to these algorithms. A comparison of the bounds we obtain with corresponding bounds for classification algorithms yields some interesting insights into the difference in generalization behaviour between ranking and classification. 1
Robust reductions from ranking to classification
- PROCEEDINGS OF THE 20TH ANNUAL CONFERENCE ON LEARNING THEORY (COLT), LECTURE NOTES IN COMPUTER SCIENCE 4539
"... We reduce ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. The core theorem shows that a binary classification regret of r on the induced binary problem implies an AUC regret of at most 2r. This is a large improvement over approache ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
We reduce ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. The core theorem shows that a binary classification regret of r on the induced binary problem implies an AUC regret of at most 2r. This is a large improvement over approaches such as ordering according to regressed scores, which have a regret transform of r ↦ → nr where n is the number of elements.
Semi-Supervised Multitask Learning
"... A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1
Margin-based Ranking and an Equivalence between AdaBoost and RankBoost
, 2009
"... We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, t ..."
Abstract
-
Cited by 9 (6 self)
- Add to MetaCart
We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to “approximate coordinate ascent boosting. ” Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classification in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is equally as good as RankBoost’s; furthermore, RankBoost, when given a specific intercept, achieves a misclassification error that is as good as AdaBoost’s. This may help to explain the empirical observations made by Cortes and Mohri, and Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking algorithm, as measured by the area under the ROC curve.
An Experimental Comparison of Performance Measures for Classification
, 2007
"... Performance metrics in classification are fundamental to assess the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area. Choices made by one metric are ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
Performance metrics in classification are fundamental to assess the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area. Choices made by one metric are claimed to be different from choices made by other metrics. In this work we analyse experimentally the behaviour of 18 different performance metrics in several scenarios, identifying clusters and relationships between measures. We also perform a sensitivity analysis for all of them in terms of several traits: class threshold choice, separability/ranking quality, calibration performance and sensitivity to changes in prior class distribution. From the definitions and the experiments, we give a comprehensive analysis on the relationships between metrics, and a taxonomy and arrangement of them according to the previous traits. This can be useful to choose the most adequate measure (or set of measures) for a specific application. Additionally, the study also highlights some niches in which new measures might be defined and also shows that some supposedly innovative measures make the same choices (or almost) than existing ones. Finally, this work can also be used as a reference for comparing experimental results in the pattern recognition and machine learning literature, when using different measures.

