Results 1–10 of 15
A support vector method for multivariate performance measures
Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 192 (5 self)
Abstract:
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear performance measures, in particular ROC-Area and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.
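As a concrete illustration (not the paper's training algorithm), the F1-score and the other measures the abstract mentions are nonlinear functions of the contingency table; a minimal sketch of computing them, with function names of our own choosing:

```python
def contingency_table(y_true, y_pred):
    """Count true/false positives and negatives for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN): a nonlinear function of the table,
    which is why it cannot be optimized example-by-example."""
    tp, fp, fn, _ = contingency_table(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```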
Relating reinforcement learning performance to classification performance
In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005
Cited by 25 (4 self)
Abstract:
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden states) and the resulting statement is independent of the number of states or actions. The statement is critically dependent on the size of the rewards and prediction performance of the created classifiers. We also provide some general guidelines for obtaining good classification performance on the created subproblems. In particular, we discuss possible methods for generating training examples for a classifier learning algorithm.
Robust reductions from ranking to classification
In Proceedings of the 20th Annual Conference on Learning Theory (COLT), Lecture Notes in Computer Science 4539
Cited by 22 (1 self)
Abstract:
We reduce ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. The core theorem shows that a binary classification regret of r on the induced binary problem implies an AUC regret of at most 2r. This is a large improvement over approaches such as ordering according to regressed scores, which have a regret transform of r ↦ nr, where n is the number of elements.
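The AUC this reduction targets can be read as a pairwise statistic: the fraction of (positive, negative) pairs that the scores order correctly. A minimal sketch of that reading (our own function name, not the paper's notation):

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly,
    counting score ties as half-correct."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = len(pos) * len(neg)
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / total
```

A classifier that mis-orders a pair contributes regret on exactly the induced binary comparison, which is the intuition behind the r-to-2r transform.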
Error Correcting Tournaments
2008
Cited by 15 (3 self)
Abstract:
We present a family of adaptive pairwise tournaments that are provably robust against large error fractions when used to determine the largest element in a set. The tournaments use nk pairwise comparisons but have only O(k + log n) depth, where n is the number of players and k is the robustness parameter (for reasonable values of n and k). These tournaments also give a reduction from multiclass to binary classification in machine learning, yielding the best known analysis for the problem.
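A far simpler (and far less robust) baseline than the paper's adaptive construction is a single-elimination bracket where each match takes a majority vote over repeated noisy comparisons; it conveys the O(log n)-rounds flavor but not the paper's error guarantees. A sketch under those stated assumptions:

```python
def tournament_winner(players, compare, repeats=3):
    """Single-elimination bracket over `players`.
    Each match takes a majority over `repeats` calls to
    compare(a, b) -> True iff a beats b (possibly noisy).
    O(n) matches total, O(log n) rounds.
    Toy baseline only; not the paper's robust tournament."""
    round_ = list(players)
    while len(round_) > 1:
        nxt = []
        for i in range(0, len(round_) - 1, 2):
            a, b = round_[i], round_[i + 1]
            wins_a = sum(compare(a, b) for _ in range(repeats))
            nxt.append(a if 2 * wins_a > repeats else b)
        if len(round_) % 2:  # odd player out advances on a bye
            nxt.append(round_[-1])
        round_ = nxt
    return round_[0]
```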
An empirical comparison of algorithms for aggregating expert predictions
In UAI, 2006
Cited by 13 (3 self)
Abstract:
Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts’ predictions of the outcomes of five years of US National Football League games (1319 games), using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert’s prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.
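The "simple averaging" baseline and the quadratic (Brier) loss used to compare aggregators can be sketched as follows (function names are ours, not the paper's):

```python
def average_forecast(expert_probs):
    """Uniform average of the experts' probabilities for one event:
    the hard-to-beat baseline the abstract refers to."""
    return sum(expert_probs) / len(expert_probs)

def quadratic_loss(prob, outcome):
    """Quadratic (Brier) loss for one event: (p - y)^2 with y in {0, 1}.
    Lower is better; this is the metric with room for improvement."""
    return (prob - outcome) ** 2
```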
Experience-efficient learning in associative bandit problems
Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06), 2006
Cited by 12 (1 self)
Abstract:
We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit where arm payoffs are conditioned on an observable input selected on each trial. We show that, if the payoff functions are constrained to a known hypothesis class, learning can be performed efficiently with respect to the VC dimension of this class. We formally reduce the problem of PAC classification to the associative bandit problem, producing an efficient algorithm for any hypothesis class for which efficient classification algorithms are known. We demonstrate the approach empirically on a scalable concept class.
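To make the setting concrete, here is a toy table-based epsilon-greedy learner for the associative bandit interaction (per-context, per-arm running means). This is purely illustrative; the paper's efficiency guarantees come from hypothesis-class structure, not from a lookup table, and all names here are ours:

```python
import random

def epsilon_greedy_associative(contexts, k, pull, rounds, eps=0.1, seed=0):
    """Toy associative (contextual) bandit loop.
    contexts: the finite set of observable inputs, cycled each trial.
    pull(c, a): environment callback returning the reward of arm a
    under context c. Returns (total reward, per-(context, arm) means)."""
    rng = random.Random(seed)
    counts = {(c, a): 0 for c in contexts for a in range(k)}
    means = {(c, a): 0.0 for c in contexts for a in range(k)}
    total = 0.0
    for t in range(rounds):
        c = contexts[t % len(contexts)]      # observable input for this trial
        if rng.random() < eps:               # explore
            a = rng.randrange(k)
        else:                                # exploit within this context
            a = max(range(k), key=lambda arm: means[(c, arm)])
        r = pull(c, a)
        counts[(c, a)] += 1
        means[(c, a)] += (r - means[(c, a)]) / counts[(c, a)]  # running mean
        total += r
    return total, means
```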
Composite Binary Losses
2009
Cited by 11 (8 self)
Abstract:
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses, which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses, and give a complete characterisation of the relationship between proper losses and “classification calibrated” losses. We also consider the question of the “best” surrogate binary loss. We introduce a precise notion of “best” and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss which make up the composite loss. This characterisation suggests new ways of “surrogate tuning”. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses and show that all convex proper losses are non-robust to misclassification noise.
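For concreteness, the canonical example of a composite loss is log-loss (a proper loss) composed with the sigmoid inverse link, which recovers the familiar margin form of logistic loss. A sketch under standard definitions (not the paper's notation):

```python
import math

def sigmoid(v):
    """Inverse (canonical) link: map a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def log_loss(p, y):
    """A proper loss for class probability estimation; y in {0, 1}."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def composite_logistic_loss(v, y):
    """Composite loss = proper loss composed with a link.
    Algebraically equal to the margin form log(1 + exp(-(2y-1)*v))."""
    return log_loss(sigmoid(v), y)
```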
Label Ranking Algorithms: A Survey
Cited by 8 (0 self)
Abstract:
Label ranking is a complex prediction task where the goal is to map instances to a total order over a finite set of predefined labels. An interesting aspect of this problem is that it subsumes several supervised learning problems such as multiclass prediction, multilabel classification and hierarchical classification. Unsurprisingly, there exists a plethora of label ranking algorithms in the literature due, in part, to this versatile nature of the problem. In this paper, we survey these algorithms.
Surrogate Regret Bounds for Proper Losses
Cited by 7 (1 self)
Abstract:
We present tight surrogate regret bounds for the class of proper (i.e., Fisher consistent) losses. The bounds generalise the margin-based bounds due to Bartlett et al. (2006). The proof uses Taylor’s theorem and leads to new representations for loss and regret and a simple proof of the integral representation of proper losses. We also present a different formulation of a duality result for Bregman divergences which leads to a simple demonstration of the convexity of composite losses using canonical link functions.
Machine Learning Techniques—Reductions Between Prediction Quality Metrics
Cited by 3 (0 self)
Abstract:
Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
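A folklore reduction of the kind such tutorials discuss is one-against-all, which converts k-class classification into k binary problems and predicts with the most confident binary classifier. A minimal sketch (the function names and the pluggable `train_binary` oracle are our own illustration):

```python
def one_against_all_train(examples, k, train_binary):
    """Train k binary classifiers from (x, y) pairs with y in {0..k-1};
    classifier i is trained to answer 'is the label i?'."""
    return [train_binary([(x, 1 if y == i else 0) for x, y in examples])
            for i in range(k)]

def one_against_all_predict(classifiers, x):
    """Predict the class whose binary classifier scores x highest."""
    scores = [c(x) for c in classifiers]
    return max(range(len(scores)), key=lambda i: scores[i])
```

The regret analysis of exactly such reductions (and their failure modes, e.g. all k classifiers abstaining) is the kind of design flaw the tutorial examines.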