Information, Divergence and Risk for Binary Experiments
 Journal of Machine Learning Research, 2009
Cited by 17 (6 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
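As an illustrative aside that is not taken from the paper, the f-divergence family and a Pinsker-type inequality can be sketched numerically for discrete distributions. The helper `f_divergence`, the generator functions and the example distributions are assumptions chosen for illustration, and the check uses the classical Pinsker bound KL ≥ V²/2 (natural logarithm, V the L1 distance) rather than the generalised inequalities the abstract refers to.

```python
import math


def f_divergence(p, q, f):
    """Discrete f-divergence I_f(P, Q) = sum_i q_i * f(p_i / q_i).

    Hypothetical helper for illustration; assumes every q_i > 0.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t log t yields KL(P || Q); the convention 0 log 0 = 0 is used.
    return t * math.log(t) if t > 0 else 0.0


def variational_generator(t):
    # f(t) = |t - 1| yields the variational divergence sum_i |p_i - q_i|.
    return abs(t - 1.0)


# Example distributions (assumed, not from the paper).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl = f_divergence(p, q, kl_generator)
v = f_divergence(p, q, variational_generator)

# Classical Pinsker inequality: KL(P || Q) >= V(P, Q)^2 / 2.
pinsker_holds = kl >= v ** 2 / 2
```

Both divergences come from the same formula and differ only in the convex generator f, which is the sense in which the family is parameterised.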
Composite Multiclass Losses
Cited by 9 (5 self)
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a “proper composite loss”, which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We determine the stationarity condition, Bregman representation, order-sensitivity, existence and uniqueness of the composite representation for multiclass losses. We subsume existing results on “classification calibration” by relating it to properness and show that the simple integral representation for binary proper losses cannot be extended to multiclass losses.
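To make the phrase "composition of a proper loss and a link function" concrete, here is a minimal sketch of the familiar binary special case, not of the paper's multiclass construction. It assumes log loss as the proper loss and the logit link, and checks that the composite agrees with the usual logistic loss; all function names are hypothetical.

```python
import math


def log_loss(y, p):
    """Proper loss for a label y in {0, 1} and a probability estimate p in (0, 1)."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def inverse_logit_link(v):
    """Inverse link: maps a real-valued score v to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def composite_loss(y, v):
    """Proper composite loss: the proper loss applied to the inverse-linked score."""
    return log_loss(y, inverse_logit_link(v))


def logistic_loss(y, v):
    """Standard logistic loss, with the label recoded from {0, 1} to {-1, +1}."""
    sign = 1.0 if y == 1 else -1.0
    return math.log(1.0 + math.exp(-sign * v))
```

For any score `v` and label `y`, `composite_loss(y, v)` and `logistic_loss(y, v)` agree up to floating-point error, which is the binary instance of the decomposition described above.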
Mixability is Bayes Risk Curvature Relative to Log Loss
Cited by 4 (3 self)
Given K codes, a standard result from source coding tells us how to design a single universal code with codelengths within log(K) bits of the best code, on any data sequence. Translated to the online learning setting of prediction with expert advice, this result implies that for logarithmic loss one can guarantee constant regret, which does not grow with the number of outcomes that need to be predicted. In this setting, it is known for which other losses the same guarantee can be given: these are the losses that are mixable. We show that among the mixable losses, log loss is special: in fact, one may understand the class of mixable losses as those that behave like log loss in an essential way. More specifically, a loss is mixable if and only if the curvature of its Bayes risk is at least as large as the curvature of the Bayes risk for log loss (for which the Bayes risk equals the entropy).
The convexity and design of composite multiclass losses
 In Proceedings of the 29th International Conference on Machine Learning (ICML-12)
Cited by 1 (0 self)
We consider composite loss functions for multiclass prediction comprising a proper (i.e., Fisher-consistent) loss over probability distributions and an inverse link function. We establish conditions for their (strong) convexity and explore the implications. We also show how the separation of concerns afforded by using this composite representation allows for the design of families of losses with the same Bayes risk.
A Characterization of Scoring Rules for Linear Properties
 25th Annual Conference on Learning Theory
We consider the design of proper scoring rules, equivalently proper losses, when the goal is to elicit some function, known as a property, of the underlying distribution. We provide a full characterization of the class of proper scoring rules when the property is linear as a function of the input distribution. A key conclusion is that any such scoring rule can be written in the form of a Bregman divergence for some convex function. We also apply our results to the design of prediction market mechanisms, showing a strong equivalence between scoring rules for linear properties and automated prediction market makers.
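A small numerical sketch may help picture the Bregman-divergence form for the simplest linear property, the mean. It assumes the convex function φ(x) = x², for which the Bregman divergence reduces to squared error, and an arbitrary three-point distribution; the names and numbers are illustrative and not drawn from the paper.

```python
def bregman_divergence(phi, dphi, x, r):
    """D_phi(x, r) = phi(x) - phi(r) - phi'(r) * (x - r) for a convex phi."""
    return phi(x) - phi(r) - dphi(r) * (x - r)


def phi(x):
    # Assumed convex generator; with this choice D_phi(x, r) = (x - r)^2.
    return x * x


def dphi(x):
    return 2.0 * x


# Hypothetical discrete distribution over outcomes.
outcomes = [0.0, 1.0, 4.0]
probs = [0.5, 0.25, 0.25]
mean = sum(p * x for p, x in zip(probs, outcomes))


def expected_loss(report):
    """Expected Bregman loss of a reported value under the distribution."""
    return sum(
        p * bregman_divergence(phi, dphi, x, report)
        for p, x in zip(probs, outcomes)
    )


# Search a grid of candidate reports for the one with least expected loss.
grid = [i / 100 for i in range(0, 401)]
best_report = min(grid, key=expected_loss)
```

On this grid the minimising report coincides with `mean`, which is the sense in which a Bregman loss elicits the mean.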
Divergences and Risks for Multiclass Experiments
 25th Annual Conference on Learning Theory
Csiszár’s f-divergence is a way to measure the similarity of two probability distributions. We study the extension of f-divergence to more than two distributions to measure their joint similarity. By exploiting classical results from the comparison of experiments literature we prove the resulting divergence satisfies all the same properties as the traditional binary one. Considering the multi-distribution case actually makes the proofs simpler. The key to these results is a formal bridge between these multi-distribution f-divergences and Bayes risks for multiclass classification problems.
Mixability is Bayes Risk Curvature Relative to Log Loss
 24th Annual Conference on Learning Theory
Mixability of a loss governs the best possible performance when aggregating expert predictions with respect to that loss. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this transparent and simpler by characterising mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass proper losses where there are few existing results. We show that mixability is governed by the Hessian of the Bayes risk, relative to the Hessian of the Bayes risk for log loss. We conclude by comparing our result to other work that bounds prediction performance in terms of the geometry of the Bayes risk. Although all calculations are for proper losses, we also show how to carry the results across to improper losses.
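The binary characterisation can be illustrated with a rough numerical sketch. It assumes the characterisation takes the form η = inf over p in (0, 1) of Λ''_log(p) / Λ''_ℓ(p), where Λ denotes the Bayes risk, and applies it to squared loss ℓ(y, q) = (y − q)², whose Bayes risk is p(1 − p). The function names and the grid are illustrative choices, not the paper's.

```python
def log_loss_bayes_risk_curvature(p):
    # Bayes risk of log loss is the entropy H(p) = -p ln p - (1 - p) ln(1 - p),
    # whose second derivative is -1 / (p (1 - p)).
    return -1.0 / (p * (1.0 - p))


def squared_loss_bayes_risk_curvature(p):
    # Bayes risk of squared loss (y - q)^2 is p (1 - p), with second derivative -2.
    return -2.0


# Evaluate the curvature ratio on a grid over the open interval (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
mixability_constant = min(
    log_loss_bayes_risk_curvature(p) / squared_loss_bayes_risk_curvature(p)
    for p in grid
)
```

Under these assumptions the ratio equals 1 / (2p(1 − p)), which is smallest at p = 1/2, so the sketch recovers the value 2 usually quoted as the mixability constant of squared loss on [0, 1].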
Loss Functions
Vapnik described the “three main learning problems” of pattern recognition, regression estimation and density estimation. These are defined in terms of the loss functions used to evaluate performance (0-1 loss, squared loss and log loss respectively). But there are many other loss functions one could use. In this chapter I will summarise some recent work by myself and colleagues studying the theoretical aspects of loss functions. The results elucidate the richness of the set of loss functions and explain some of the implications of their choice.
Divergences and Risks for Multiclass Experiments
 JMLR: Workshop and Conference Proceedings vol (2012) 1–20
Csiszár’s f-divergence is a way to measure the similarity of two probability distributions. We study the extension of f-divergence to more than two distributions to measure their joint similarity. By exploiting classical results from the comparison of experiments literature we prove the resulting divergence satisfies all the same properties as the traditional binary one. Considering the multi-distribution case actually makes the proofs simpler. The key to these results is a formal bridge between these multi-distribution f-divergences and Bayes risks for multiclass classification problems.