Results 1 – 5 of 5
Learning from Corrupted Binary Labels via Class-Probability Estimation
Cited by 1 (0 self)
Abstract Many supervised learning problems involve learning from samples whose labels are corrupted in some way. For example, each label may be flipped with some constant probability (learning with label noise), or one may have a pool of unlabelled samples in lieu of negative samples (learning from positive and unlabelled data). This paper uses class-probability estimation to study these and other corruption processes belonging to the mutually contaminated distributions framework.

Learning from corrupted binary labels. In many practical scenarios involving learning from binary labels, one observes samples whose labels are corrupted versions of the actual ground truth. For example, in learning from class-conditional label noise (CCN learning), the labels are flipped with some constant probability. A fundamental question is whether one can minimise a given performance measure with respect to the clean distribution D, given access only to samples from the corrupted distribution D_corr. Intuitively, in general this requires knowledge of the parameters of the corruption process that determines D_corr. This yields two further questions: are there measures for which knowledge of these corruption parameters is unnecessary, and for other measures, can we estimate these parameters? In this paper, we consider corruption problems belonging to the mutually contaminated distributions framework. While some of our results are known for the special cases of CCN and PU learning, our interest is in determining to what extent they generalise to other label corruption problems; this is a step towards a unified treatment of these problems. We now fix notation and formalise the problem.
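The CCN corruption process described above can be sketched as follows. This is an illustrative sketch, not code from the paper; the function name and the flip probabilities rho_pos and rho_neg are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_ccn(y, rho_pos=0.2, rho_neg=0.1, rng=rng):
    """Simulate class-conditional label noise: flip each +1 label with
    probability rho_pos and each -1 label with probability rho_neg."""
    flips = np.where(y == 1,
                     rng.random(y.shape) < rho_pos,
                     rng.random(y.shape) < rho_neg)
    return np.where(flips, -y, y)

# Clean labels drawn from D; corrupt_ccn produces labels as seen under D_corr.
y_clean = np.array([1, 1, -1, -1, 1, -1])
y_corr = corrupt_ccn(y_clean)
```

Setting rho_neg to 0 and treating the negatives as unlabelled would give the PU-learning special case mentioned in the abstract.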
Convex calibration dimension for multiclass loss matrices.
Journal of Machine Learning Research, 2015.
Cited by 1 (1 self)
Abstract We study consistency properties of surrogate loss functions for general multiclass learning problems, defined by a general multiclass loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be calibrated with respect to a loss matrix in this setting. We then introduce the notion of convex calibration dimension of a multiclass loss matrix, which measures the smallest 'size' of a prediction space in which it is possible to design a convex surrogate that is calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, we apply our framework to study various subset ranking losses, and use the convex calibration dimension as a tool to show both the existence and nonexistence of various types of convex calibrated surrogates for these losses. Our results strengthen recent results of
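Classification calibration in the binary 0-1 special case can be illustrated numerically: the pointwise minimiser of the convex logistic surrogate risk at class probability eta is the logit log(eta / (1 - eta)), whose sign agrees with the Bayes-optimal rule "predict +1 iff eta > 1/2". This is a minimal sketch of that standard fact, not code from the paper.

```python
import numpy as np

def logistic_risk(f, eta):
    """Pointwise logistic surrogate risk at class probability eta:
    L(f) = eta * log(1 + e^{-f}) + (1 - eta) * log(1 + e^{f})."""
    return eta * np.log1p(np.exp(-f)) + (1 - eta) * np.log1p(np.exp(f))

# Minimise the risk over a fine grid of scores for two example values of eta.
f_grid = np.linspace(-5.0, 5.0, 2001)  # grid step 0.005
minimisers = {eta: f_grid[np.argmin(logistic_risk(f_grid, eta))]
              for eta in (0.2, 0.7)}
```

The grid minimiser lands (up to grid resolution) on the logit, so thresholding the surrogate minimiser at 0 recovers the Bayes classifier; this is the one-dimensional picture that convex calibration dimension generalises to arbitrary loss matrices.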
Extreme F-Measure Maximization using Sparse Probability Estimates. Krzysztof Dembczyński, Karlson Pfannschmidt, Timo Klerx, Eyke Hüllermeier
Abstract We consider the problem of (macro) F-measure maximization in the context of extreme multi-label classification (XMLC), i.e., multi-label classification with extremely large label spaces. We investigate several approaches based on recent results on the maximization of complex performance measures in binary classification. According to these results, the F-measure can be maximized by properly thresholding conditional class probability estimates. We show that a naïve adaptation of this approach can be very costly for XMLC and propose to solve the problem by classifiers that efficiently deliver sparse probability estimates (SPEs), that is, probability estimates restricted to the most probable labels. Empirical results provide evidence for the strong practical performance of this approach.
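The thresholding step the abstract refers to can be sketched for plain binary F1 as follows; the function name is an assumption for the example, and in the XMLC setting sparse probability estimates would replace the dense probs array.

```python
import numpy as np

def best_f1_threshold(probs, labels):
    """Sweep candidate thresholds over the distinct probability estimates and
    return the threshold maximising F1 = 2*TP / (2*TP + FP + FN)."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(probs):
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy example: probability estimates and ground-truth labels for one label.
t, f1 = best_f1_threshold(np.array([0.9, 0.8, 0.4, 0.3, 0.1]),
                          np.array([1, 1, 1, 0, 0]))
```

For macro-F over a huge label space, the naïve version of this sweep would run per label over all instances, which is the cost the paper's sparse probability estimates are designed to avoid.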
Linking losses for density ratio and class-probability estimation
Abstract Given samples from two densities p and q, density ratio estimation (DRE) is the problem of estimating the ratio p/q. In this paper, we formally relate DRE and class-probability estimation (CPE), and theoretically justify the use of existing losses from one problem for the other. In the CPE to DRE direction, we show that essentially any CPE loss (e.g. logistic, exponential) minimises a Bregman divergence to the true density ratio, and thus can be used for DRE. We also show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE. In the DRE to CPE direction, we argue that the least squares importance fitting method has potential use for bipartite ranking of instances with maximal accuracy at the head of the ranking. Our analysis relies on a novel Bregman divergence identity that may be of independent interest.
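The CPE-to-DRE direction rests on a standard identity: with equal priors on the two samples, the class probability of "drawn from p" at x is eta(x) = p(x) / (p(x) + q(x)), and the density ratio is recovered as eta / (1 - eta) = p/q. A numerical sketch with two known Gaussians (an illustration of the identity, not code from the paper):

```python
import numpy as np

def gauss_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two known densities: p = N(1, 1) and q = N(0, 1).
x = np.linspace(-3.0, 4.0, 50)
p, q = gauss_pdf(x, 1.0), gauss_pdf(x, 0.0)

# Class probability under equal priors, then the link back to the ratio.
eta = p / (p + q)
ratio_via_cpe = eta / (1.0 - eta)  # equals p / q pointwise
```

In practice eta would come from a probabilistic classifier trained to separate samples of p from samples of q, which is how a CPE loss yields a density ratio estimator.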