A support vector method for multivariate performance measures
- Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 132 (5 self)
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1 score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially non-linear performance measures, in particular ROC Area and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.
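A minimal sketch of what "computed from the contingency table" means for F1, assuming binary labels in {-1, +1}. It only evaluates the measure and is not the paper's SVM training algorithm; the function names are illustrative. The usage lines show that F1 on a whole sample is not the average of F1 over its parts, which is what makes it a multivariate rather than a per-example measure.

```python
def contingency_table(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels in {-1, +1}."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1:  # predicted +1, true -1
            fp += 1
        elif t == 1:  # predicted -1, true +1
            fn += 1
        else:  # predicted -1, true -1
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) = 2tp / (2tp + fp + fn)."""
    tp, fp, fn, _ = contingency_table(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# F1 of the whole sample is not the mean of F1 over its halves,
# so it cannot be written as a sum of per-example losses.
whole = f1_score([1, 1, -1, -1], [1, -1, 1, -1])  # tp=1, fp=1, fn=1 -> 0.5
first = f1_score([1, 1], [1, -1])                 # tp=1, fn=1 -> 2/3
second = f1_score([-1, -1], [1, -1])              # tp=0, fp=1 -> 0.0
print(whole, (first + second) / 2)
```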
Relating reinforcement learning performance to classification performance
- In Proceedings of the International Conference on Machine Learning (ICML), 2005
Cited by 17 (4 self)
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden states) and the resulting statement is independent of the number of states or actions. The statement is critically dependent on the size of the rewards and prediction performance of the created classifiers. We also provide some general guidelines for obtaining good classification performance on the created subproblems. In particular, we discuss possible methods for generating training examples for a classifier learning algorithm.
Robust reductions from ranking to classification
- Proceedings of the 20th Annual Conference on Learning Theory (COLT), Lecture Notes in Computer Science 4539
Cited by 14 (1 self)
We reduce ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. The core theorem shows that a binary classification regret of r on the induced binary problem implies an AUC regret of at most 2r. This is a large improvement over approaches such as ordering according to regressed scores, which have a regret transform of r ↦ nr, where n is the number of elements.
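Read as pairs, AUC is the fraction of positive/negative pairs that are ordered correctly, so every misordered pair is a binary mistake; that is the sense in which ranking can be handed to a classifier on pairs. The sketch below assumes 0/1 labels. `rank_by_wins` and its `prefers` argument are hypothetical, showing one simple way (counting pairwise wins) to turn pairwise predictions into an ordering; it is not claimed to reproduce the paper's construction or its 2r guarantee.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs the scores order
    correctly; ties count as half a correct pair. Labels are 1 or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


def rank_by_wins(items, prefers):
    """Order items by how many pairwise comparisons they win.

    `prefers(a, b)` is a hypothetical binary classifier that returns True
    when a should be ranked above b.
    """
    wins = {i: 0 for i in range(len(items))}
    for i, j in product(range(len(items)), repeat=2):
        if i != j and prefers(items[i], items[j]):
            wins[i] += 1
    order = sorted(range(len(items)), key=lambda i: -wins[i])
    return [items[i] for i in order]


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
print(rank_by_wins([3, 1, 2], lambda a, b: a > b))
```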
An empirical comparison of algorithms for aggregating expert predictions
- In UAI, 2006
Cited by 10 (2 self)
Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts’ predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert’s prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.
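The abstract contrasts prediction accuracy with quadratic loss. Below is a minimal sketch of both criteria for the simple-averaging baseline, assuming each expert reports a probability that the outcome coded 1 occurs. The Bayesian variance-estimating algorithm is not reproduced here. The usage lines illustrate one way the two criteria can disagree: both forecasts pick the right outcome, but the more confident one has lower quadratic loss.

```python
def average_forecast(expert_probs):
    """Simple (unweighted) average of the experts' probabilities for one game."""
    return sum(expert_probs) / len(expert_probs)


def quadratic_loss(p, outcome):
    """Squared error between forecast probability p and the 0/1 outcome."""
    return (p - outcome) ** 2


def is_correct(p, outcome):
    """Accuracy view: the forecast picks outcome 1 iff p > 0.5."""
    return (p > 0.5) == (outcome == 1)


# Two forecasts that are equally "correct" can differ in quadratic loss.
for p in (0.55, 0.9):
    print(p, is_correct(p, 1), quadratic_loss(p, 1))
```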
Experience-efficient learning in associative bandit problems
- Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06), 2006
Cited by 8 (2 self)
We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit where arm payoffs are conditioned on an observable input selected on each trial. We show that, if the payoff functions are constrained to a known hypothesis class, learning can be performed efficiently with respect to the VC dimension of this class. We formally reduce the problem of PAC classification to the associative bandit problem, producing an efficient algorithm for any hypothesis class for which efficient classification algorithms are known. We demonstrate the approach empirically on a scalable concept class.
Information, Divergence and Risk for Binary Experiments
- Journal of Machine Learning Research, 2009
Cited by 4 (2 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC-curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
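For reference, the classical Pinsker inequality bounds the KL divergence below by the variational divergence: KL(P‖Q) ≥ V(P,Q)²/2, where V(P,Q) = Σ|P(x) − Q(x)|. The paper's generalised inequalities extend this to other f-divergences and are not implemented here. This sketch only checks the classical case numerically for Bernoulli distributions on a small grid.

```python
import math


def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) in nats; assumes 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # the term 0 * log(0 / b) is taken as 0
            total += a * math.log(a / b)
    return total


def variational_bernoulli(p, q):
    """V(P, Q) = sum over outcomes of |P(x) - Q(x)|, here 2 * |p - q|."""
    return abs(p - q) + abs((1 - p) - (1 - q))


def pinsker_holds(p, q):
    """Classical Pinsker inequality: KL(P || Q) >= V(P, Q)^2 / 2."""
    v = variational_bernoulli(p, q)
    return kl_bernoulli(p, q) >= v * v / 2


grid = [i / 20 for i in range(1, 20)]
print(all(pinsker_holds(p, q) for p in grid for q in grid))
```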
Surrogate Regret Bounds for Proper Losses
Cited by 3 (1 self)
We present tight surrogate regret bounds for the class of proper (i.e., Fisher consistent) losses. The bounds generalise the margin-based bounds due to Bartlett et al. (2006). The proof uses Taylor’s theorem and leads to new representations for loss and regret and a simple proof of the integral representation of proper losses. We also present a different formulation of a duality result of Bregman divergences which leads to a simple demonstration of the convexity of composite losses using canonical link functions.
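A loss is proper when, for a true class probability η, the expected loss is minimised by predicting η itself. Below is a minimal numerical check of that definition. It assumes outcomes coded 0/1 and uses a grid search, not any method from the paper. Log loss and square loss recover η. Absolute loss, included as a non-proper contrast, is pushed to the edge of the grid instead.

```python
import math


def log_loss(y, q):
    """Log loss for outcome y in {0, 1} and predicted probability q = P(y = 1)."""
    return -math.log(q) if y == 1 else -math.log(1 - q)


def square_loss(y, q):
    return (y - q) ** 2


def absolute_loss(y, q):
    """Included as a contrast: this loss is not proper."""
    return abs(y - q)


def conditional_risk(loss, eta, q):
    """Expected loss of predicting q when the true P(y = 1) is eta."""
    return eta * loss(1, q) + (1 - eta) * loss(0, q)


def best_prediction(loss, eta, steps=1000):
    """Grid search for the q in (0, 1) that minimises the conditional risk."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda q: conditional_risk(loss, eta, q))


eta = 0.3
for loss in (log_loss, square_loss, absolute_loss):
    print(loss.__name__, best_prediction(loss, eta))
```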
Label Ranking Algorithms: A Survey
Cited by 3 (0 self)
Label ranking is a complex prediction task where the goal is to map instances to a total order over a finite set of predefined labels. An interesting aspect of this problem is that it subsumes several supervised learning problems such as multiclass prediction, multilabel classification and hierarchical classification. Unsurprisingly, there exists a plethora of label ranking algorithms in the literature due, in part, to this versatile nature of the problem. In this paper, we survey these algorithms.
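To illustrate how a label ranking subsumes simpler targets, here is a toy sketch. It assumes a model that outputs one real-valued score per label. This score-based formulation is an assumption; the survey covers many other approaches. A multiclass prediction keeps the first label of the order. A multilabel prediction can be read off as a prefix, with the cut-off `k` a hypothetical parameter.

```python
def rank_labels(scores):
    """Turn per-label scores into a total order, best label first.
    Ties are broken alphabetically so the order is always total."""
    return sorted(scores, key=lambda label: (-scores[label], label))


def multiclass_from_ranking(ranking):
    """Multiclass prediction: keep only the top-ranked label."""
    return ranking[0]


def multilabel_from_ranking(ranking, k):
    """Multilabel prediction: keep the top k labels (k is an assumed cut-off)."""
    return set(ranking[:k])


ranking = rank_labels({"cat": 0.2, "dog": 0.7, "bird": 0.1})
print(ranking, multiclass_from_ranking(ranking), multilabel_from_ranking(ranking, 2))
```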
Evaluating Predictive Uncertainty Challenge
Cited by 2 (0 self)
This Chapter presents the PASCAL Evaluating Predictive Uncertainty Challenge, introduces the contributed Chapters by the participants who obtained outstanding results, and provides a discussion with some lessons to be learnt. The Challenge was set up to evaluate the ability of Machine Learning algorithms to provide good “probabilistic predictions”, rather than just the usual “point predictions” with no measure of uncertainty, in regression and classification problems. Participants had to compete on a number of regression and classification tasks, and were evaluated by both traditional losses that only take into account point predictions and losses we proposed that evaluate the quality of the probabilistic predictions.
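The difference between point and probabilistic evaluation can be seen with a Gaussian predictive distribution. In this sketch, squared error scores only the predicted mean. The negative log predictive density (NLPD) also scores the stated variance, so it penalises an overconfident prediction. NLPD is used here as one familiar example of a loss on probabilistic predictions; it is not claimed to be the exact set of losses the Challenge proposed.

```python
import math


def squared_error(y, mean):
    """Point-prediction loss: ignores any stated uncertainty."""
    return (y - mean) ** 2


def gaussian_nlpd(y, mean, var):
    """Negative log predictive density of y under N(mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (y - mean) ** 2 / (2 * var)


y = 1.0
# Same point prediction (mean 0.0), different stated uncertainty.
for var in (1.0, 0.01):
    print(var, squared_error(y, 0.0), gaussian_nlpd(y, 0.0, var))
```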
Composite Binary Losses
- 2009
Cited by 2 (2 self)
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses and give a complete characterisation of the relationship between proper losses and “classification calibrated” losses. We also consider the question of the “best” surrogate binary loss. We introduce a precise notion of “best” and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss which make up the composite loss. This characterisation suggests new ways of “surrogate tuning”. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses and show that all convex proper losses are non-robust to misclassification noise.
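As a worked instance of "a proper loss composed with a link function": composing log loss with the logit link gives the familiar logistic loss log(1 + e^(−yv)). This loss depends on y and v only through the margin yv. The sketch assumes labels in {-1, +1} and checks the identity numerically; the function names are illustrative.

```python
import math


def log_loss(y, p):
    """Proper log loss for y in {-1, +1} and probability estimate p = P(y = +1)."""
    return -math.log(p) if y == 1 else -math.log(1 - p)


def inverse_logit(v):
    """Inverse of the logit link psi(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-v))


def composite_loss(y, v):
    """Composite loss: the proper loss applied to the probability psi^{-1}(v)."""
    return log_loss(y, inverse_logit(v))


def logistic_margin_loss(y, v):
    """Margin form: a function of the margin y * v only."""
    return math.log1p(math.exp(-y * v))


for v in (-2.0, 0.0, 3.0):
    for y in (-1, 1):
        print(y, v, composite_loss(y, v), logistic_margin_loss(y, v))
```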

