Results 1–10 of 21
Composite Binary Losses
, 2009
Abstract

Cited by 11 (8 self)
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses, which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses, and give a complete characterisation of the relationship between proper losses and “classification calibrated” losses. We also consider the question of the “best” surrogate binary loss. We introduce a precise notion of “best” and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss which make up the composite loss. This characterisation suggests new ways of “surrogate tuning”. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses, and show that all convex proper losses are non-robust to misclassification noise.
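The central construction, a composite loss formed by composing a proper loss with a link function, can be illustrated with the familiar logistic loss. The sketch below recovers the margin form of the logistic loss from this composition; it is a minimal illustration, not code from the paper.

```python
import math

# A minimal sketch (not from the paper): the logistic margin loss
# recovered as a proper composite loss.
def log_loss(q, y):
    # Proper loss: log loss on a class probability estimate q in (0, 1); y in {-1, +1}.
    return -math.log(q) if y == 1 else -math.log(1.0 - q)

def sigmoid(v):
    # Inverse link: maps a real-valued prediction v to a probability.
    return 1.0 / (1.0 + math.exp(-v))

def composite(v, y):
    # Composite loss = proper loss composed with the (inverse) link.
    return log_loss(sigmoid(v), y)

def margin_form(v, y):
    # The familiar margin form of the logistic loss.
    return math.log(1.0 + math.exp(-y * v))

# The composition reproduces the margin loss exactly.
for v in (-2.0, 0.0, 1.5):
    for y in (-1, 1):
        assert abs(composite(v, y) - margin_form(v, y)) < 1e-12
```

The same pattern works for other proper losses and links; the paper's characterisation says which margin losses admit such a decomposition.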
Overlaying classifiers: A practical approach for optimal ranking
 Adv. Neural Inf. Process. Syst
, 2009
Abstract

Cited by 9 (5 self)
The ROC curve is one of the most widely used visual tools for evaluating the performance of scoring functions with regard to their capacity to discriminate between two populations. The goal of this paper is to propose a statistical learning method for constructing a scoring function with a nearly optimal ROC curve. In this bipartite setup, the target is known to be the regression function up to an increasing transform, and solving the optimization problem boils down to recovering the collection of level sets of the latter, which we interpret here as a continuum of imbricated classification problems. We propose a discretization approach, consisting of building a finite sequence of N classifiers by constrained empirical risk minimization and then constructing a piecewise constant scoring function sN(x) by overlaying the resulting classifiers. Given the functional nature of the ROC criterion, the accuracy of the ranking induced by sN(x) can be conceived in a variety of ways, depending on the distance chosen for measuring closeness to the optimal curve in the ROC space. By relating the ROC curve of the resulting scoring function to piecewise linear approximations of the optimal ROC curve, we establish the consistency of the method as well as rate bounds controlling its generalization ability in sup-norm. Finally, we also highlight the fact that, as a byproduct, the proposed algorithm provides an accurate estimate of the optimal ROC curve.
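The overlaying step has a simple computational form: each of the N classifiers contributes an indicator, and the piecewise constant score is their sum. The sketch below uses hypothetical 1-D threshold classifiers as stand-ins for the constrained ERM classifiers from the paper.

```python
# Hedged sketch: overlaying N binary classifiers into a piecewise
# constant scoring function sN(x) = sum_k 1{g_k(x) = +1}.
def make_classifier(t):
    # Hypothetical stand-in: a threshold rule on a 1-D feature.
    return lambda x: 1 if x >= t else 0

def overlay(classifiers):
    # The score of x counts how many classifiers accept it.
    def s_N(x):
        return sum(g(x) for g in classifiers)
    return s_N

thresholds = [0.2, 0.5, 0.8]   # hypothetical level-set boundaries
s = overlay([make_classifier(t) for t in thresholds])
assert s(0.1) == 0 and s(0.6) == 2 and s(0.9) == 3
# The resulting score is piecewise constant and induces a ranking.
```

In the paper the classifiers estimate nested level sets, so the overlaid score orders points by the level of the regression function they exceed.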
Nonparametric assessment of contamination in multivariate data using minimum-volume sets and FDR
, 2007
Abstract

Cited by 6 (2 self)
Large, multivariate datasets from high-throughput instrumentation have become ubiquitous throughout the sciences. Frequently, it is of great interest to characterize the measurements in these datasets by the extent to which they represent ‘nominal’ versus ‘contaminated’ instances. However, often the nature of even the nominal patterns in the data is unknown and potentially quite complex, making their explicit parametric modeling a daunting task. In this paper, we introduce a nonparametric method for the simultaneous annotation of multivariate data (called MNSCAnn), by which one may produce an annotated ranking of the observations, indicating the relative extent to which each may or may not be considered nominal, while making minimal assumptions on the nature of the nominal distribution. In our framework each observation is linked to a corresponding minimum volume set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test, which in turn is accompanied by a certain false discovery rate. The combination of minimum volume set methods with false discovery rate principles, in the context of contaminated data, is new. Moreover, estimation of the key underlying quantities requires that a number of issues be addressed. We illustrate MNSCAnn through examples in two contexts: the preprocessing of cell-based assays in bioinformatics, and the detection of anomalous traffic patterns in Internet measurement studies.
An exponential lower bound on the complexity of regularization paths. arXiv:0903.4817v2 [cs.LG]
, 2009
Abstract

Cited by 6 (3 self)
For a variety of regularization methods, algorithms computing the entire solution path have been developed recently. Solution path algorithms compute not only the solution for one particular value of the regularization parameter but the entire path of solutions, making the selection of an optimal parameter much easier. It has been assumed that these piecewise linear solution paths have only linear complexity, i.e. linearly many bends. We prove that for the support vector machine this complexity can indeed be exponential in the number of training points in the worst case.
Risk minimization, probability elicitation, and cost-sensitive SVMs
Abstract

Cited by 5 (0 self)
A new procedure for learning cost-sensitive SVM classifiers is proposed. The SVM hinge loss is extended to the cost-sensitive setting, and the cost-sensitive SVM is derived as the minimizer of the associated risk. The extension of the hinge loss draws on recent connections between risk minimization and probability elicitation. These connections are generalized to cost-sensitive classification, in a manner that guarantees consistency with the cost-sensitive Bayes risk and the associated Bayes decision rule. This ensures that optimal decision rules, under the new hinge loss, implement the Bayes-optimal cost-sensitive classification boundary. Minimization of the new hinge loss is shown to be a generalization of the classic SVM optimization problem, and can be solved by identical procedures. The resulting algorithm avoids the shortcomings of previous approaches to cost-sensitive SVM design, and has superior experimental performance.
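A simpler relative of the loss studied here is the class-weighted hinge, which scales the standard hinge by a per-class misclassification cost; the sketch below shows that variant only (the paper derives a more refined extension). The costs C_fn and C_fp are hypothetical.

```python
# Hedged sketch: a class-weighted hinge loss, a simpler cost-sensitive
# variant than the one derived in the paper. C_fn and C_fp are
# hypothetical costs for false negatives and false positives.
def weighted_hinge(v, y, C_fn=2.0, C_fp=1.0):
    # y in {-1, +1}; v is the real-valued SVM score.
    cost = C_fn if y == 1 else C_fp
    return cost * max(0.0, 1.0 - y * v)

# A false negative (y = +1 scored at -1) is penalised twice as heavily
# as the corresponding false positive under these costs.
assert weighted_hinge(-1.0, 1) == 4.0
assert weighted_hinge(1.0, -1) == 2.0
```

Weighting the hinge shifts the minimizer of the empirical risk toward the costlier class, which is the basic mechanism a cost-sensitive SVM exploits.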
GENERALIZATION ERROR ANALYSIS FOR FDR CONTROLLED CLASSIFICATION
Abstract

Cited by 3 (2 self)
The false discovery rate (FDR) and false non-discovery rate (FNDR) have received considerable attention in the literature on multiple testing. These performance measures are also appropriate for classification, and in this work we develop generalization error bounds for FDR and FNDR from the perspective of statistical learning theory. Unlike more conventional classification performance measures, the empirical FDR and FNDR are not binomial random variables but rather ratios of binomials, which introduces several challenges not addressed in conventional analyses. We develop distribution-free uniform deviation bounds and apply these, in conjunction with the Borel–Cantelli lemma, to obtain a strongly consistent learning rule.
Index Terms — Statistical learning theory, false discovery
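The ratio-of-binomials structure of the empirical FDR and FNDR is easy to see from their definitions; the sketch below computes both from labels and predictions (illustrative code, not from the paper).

```python
# Hedged sketch: empirical FDR and FNDR as ratios of counts, which is
# why they are ratios of binomials rather than binomials themselves.
def empirical_fdr(y_true, y_pred):
    # "Discoveries" are predicted positives; false ones have true label 0.
    discoveries = sum(1 for p in y_pred if p == 1)
    false_disc = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return false_disc / discoveries if discoveries else 0.0

def empirical_fndr(y_true, y_pred):
    # "Non-discoveries" are predicted negatives; false ones have true label 1.
    nondisc = sum(1 for p in y_pred if p == 0)
    false_nondisc = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    return false_nondisc / nondisc if nondisc else 0.0

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
assert abs(empirical_fdr(y_true, y_pred) - 1/3) < 1e-12
assert abs(empirical_fndr(y_true, y_pred) - 1/3) < 1e-12
```

Both denominators are themselves random, which is the source of the analytical difficulty the paper addresses with uniform deviation bounds.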
ANNOTATED MINIMUM VOLUME SETS FOR NONPARAMETRIC ANOMALY DISCOVERY
Abstract

Cited by 3 (1 self)
We consider an anomaly detection problem, wherein a combination of typical and anomalous data are observed and it is necessary to identify the anomalies in this particular dataset without recourse to labeled exemplars. We take as our goal to produce an annotated ranking of the observations, indicating the relative priority for each to be examined further as a possible anomaly, while making no assumptions on the distribution of typical data. We propose a framework in which each observation is linked to a corresponding minimum volume set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test. An inherent ordering of these sets yields a natural ranking, while the association of each test with a false discovery rate yields an appropriate annotation. The combination of minimum volume set methods with false discovery rate principles, in the context of data contaminated by anomalies, is new, and estimation of the key underlying quantities requires that a number of issues be addressed. We offer some solutions to the relevant estimation problems, and illustrate the proposed methodology on synthetic and computer network traffic data.
Index Terms — minimum volume sets, false discovery rate, nonparametric outlier detection, multiple level set estimation, monotone density estimation
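The ranking idea can be sketched with a kernel density estimate standing in for the paper's minimum volume set estimators: points in low-density regions belong only to large minimum volume sets and so rank as more anomalous. This is an illustrative sketch under that substitution, not the paper's method.

```python
import math

# Hedged sketch: ranking by estimated density, with a Gaussian kernel
# density estimate as a stand-in for minimum volume set estimation.
def kde(data, x, h=0.5):
    # Unnormalised Gaussian KDE; only the ordering of values matters here.
    return sum(math.exp(-((x - z) / h) ** 2) for z in data) / len(data)

def anomaly_ranking(data):
    # Lower density => contained only in larger MV sets => higher priority.
    return sorted(data, key=lambda x: kde(data, x))

data = [0.0, 0.1, 0.2, 0.15, 5.0]   # a tight cluster plus one outlier
assert anomaly_ranking(data)[0] == 5.0   # the isolated point ranks first
```

Associating each position in this ranking with a test and a false discovery rate is the annotation step the paper develops.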
Nested Support Vector Machines
, 2008
Abstract

Cited by 2 (0 self)
The one-class and cost-sensitive support vector machines (SVMs) are state-of-the-art machine learning methods for estimating density level sets and solving weighted classification problems, respectively. However, the solutions of these SVMs do not necessarily produce set estimates that are nested as the parameters controlling the density level or cost asymmetry are continuously varied. Such a nesting constraint is desirable for applications requiring the simultaneous estimation of multiple sets, including clustering, anomaly detection, and ranking problems. We propose new quadratic programs whose solutions give rise to nested extensions of the one-class and cost-sensitive SVMs. Furthermore, like conventional SVMs, the solution paths in our construction are piecewise linear in the control parameters, with significantly fewer breakpoints. We also describe decomposition algorithms to solve the quadratic programs. These methods are compared to conventional SVMs on synthetic and benchmark data sets, and are shown to exhibit more stable rankings and decreased sensitivity to parameter settings.
Tuning Support Vector Machines for Minimax and Neyman–Pearson Classification
, 2009
Abstract

Cited by 2 (0 self)
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman–Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2ν-SVM. We then exploit a characterization of the 2ν-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
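The smoothing idea can be sketched as local averaging over a grid of cross-validation error estimates; the 3x3 mean filter below is an illustrative choice, not the paper's exact smoother, and the error grid is hypothetical.

```python
# Hedged sketch: smoothing noisy cross-validation error estimates over a
# 2-D parameter grid (illustrative 3x3 mean filter, hypothetical data).
def smooth(grid):
    n, m = len(grid), len(grid[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Average each cell with its in-bounds neighbours.
            nbrs = [grid[a][b]
                    for a in range(max(0, i - 1), min(n, i + 2))
                    for b in range(max(0, j - 1), min(m, j + 2))]
            out[i][j] = sum(nbrs) / len(nbrs)
    return out

errs = [[0.30, 0.10, 0.30],
        [0.10, 0.90, 0.10],    # noisy spike from an unreliable CV estimate
        [0.30, 0.10, 0.30]]
sm = smooth(errs)
assert sm[1][1] < errs[1][1]   # the spike is damped by its neighbours
```

Selecting parameters from the smoothed surface rather than the raw estimates is what makes the tuned classifier less sensitive to estimation noise.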
Performance and Preferences: Interactive Refinement of Machine Learning Procedures
Abstract

Cited by 1 (0 self)
Problem-solving procedures have typically been aimed at achieving well-defined goals or satisfying straightforward preferences. However, learners and solvers may often generate rich multi-attribute results with procedures guided by sets of controls that define different dimensions of quality. We explore methods that enable people to explore and express preferences about the operation of classification models in supervised multi-class learning. We leverage a leave-one-out confusion matrix that provides users with views and real-time controls of a model space. The approach allows people to consider in an interactive manner the global implications of local changes in decision boundaries. We focus on kernel classifiers and show the effectiveness of the methodology on a variety of tasks.
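A leave-one-out confusion matrix of the kind the interface exposes can be sketched as follows; the 1-nearest-neighbour predictor is a hypothetical stand-in for the kernel classifiers discussed in the paper.

```python
# Hedged sketch: a leave-one-out confusion matrix with a hypothetical
# 1-nearest-neighbour predictor on 1-D features.
def nn_predict(train, x):
    # Predict the label of the nearest training point by absolute distance.
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def loo_confusion(data, n_classes):
    # M[true][predicted] counts held-out predictions for each point.
    M = [[0] * n_classes for _ in range(n_classes)]
    for i, (x, y) in enumerate(data):
        held_out = data[:i] + data[i + 1:]
        M[y][nn_predict(held_out, x)] += 1
    return M

data = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1), (0.55, 1)]
M = loo_confusion(data, 2)
assert sum(sum(row) for row in M) == len(data)
```

Each cell of M is one view the user can steer; recomputing M after a boundary change shows the global effect of a local adjustment.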