## Cost-sensitive learning in Support Vector Machines (2002)

Venue: VIII Convegno Associazione Italiana per L’Intelligenza Artificiale

Citations: 6 (0 self)

### BibTeX

@INPROCEEDINGS{Fumera02cost-sensitivelearning,

author = {Giorgio Fumera and Fabio Roli},

title = {Cost-sensitive learning in Support Vector Machines},

booktitle = {VIII Convegno Associazione Italiana per L’Intelligenza Artificiale},

year = {2002}

}

### Abstract

In this paper, a cost-sensitive learning method for support vector machine (SVM) classifiers is proposed. We focus on a particular case of cost-sensitive problems, namely classification with reject option. Standard learning algorithms, including the one for SVMs, are not cost-sensitive; in particular, they cannot handle the reject option. However, we show that, under the framework of the structural risk minimisation induction principle, on which standard SVMs are based, the rejection region should be determined during the training phase of a classifier, by the learning algorithm. We apply this approach to develop a cost-sensitive SVM classifier, following Vapnik's maximum margin approach to the derivation of standard SVMs.
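
As a rough illustration of the reject option mentioned in the abstract, the sketch below implements Chow's rule for the cost setting quoted in the citation contexts on this page (costs cE, cR, cC for error, rejection, and correct classification, with cE > cR > cC). The threshold follows from comparing the expected cost of accepting with the cost of rejecting; the function names `chow_threshold` and `chow_decide` are hypothetical and are not taken from the paper.

```python
def chow_threshold(c_error, c_reject, c_correct=0.0):
    """Reject threshold T = (cE - cR) / (cE - cC).

    Accepting a pattern whose top posterior is p costs p*cC + (1-p)*cE on
    average, while rejecting costs cR, so accepting is cheaper when p > T.
    """
    if not (c_error > c_reject > c_correct):
        raise ValueError("expected c_error > c_reject > c_correct")
    return (c_error - c_reject) / (c_error - c_correct)


def chow_decide(posteriors, threshold):
    """Return the index of the most probable class, or None to reject."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    # Accept only if the maximum posterior is higher than the threshold.
    return best if posteriors[best] > threshold else None
```

For instance, with illustrative costs cE = 1, cR = 0.2, cC = 0 the threshold is about 0.8, so a pattern with posteriors (0.9, 0.1) would be accepted and one with (0.6, 0.4) would be rejected. This applies the rule to a trained classifier's outputs; the paper's argument is that the rejection region should instead be determined during training.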

### Citations

9928 | The nature of statistical learning theory
- Vapnik
- 1995
Citation Context ...on, regression, and density estimation problems. The SVM learning method is based on the structural risk minimisation (SRM) induction principle, which was derived from the statistical learning theory [5]. SVMs have proven to be effective in many practical applications. However, SVMs are not cost-sensitive, like traditional classifiers. In particular, despite their strong theoretical foundations, o...

2383 | Support vector networks
- Cortes, Vapnik
- 1995
Citation Context ... trained classifier. This corresponds again to the second of the above approaches. In this paper, we focus on the problem of implementing the reject option in support vector machine (SVM) classifiers [4]. SVMs are a recently introduced technique for pattern recognition, regression, and density estimation problems. The SVM learning method is based on the structural risk minimisation (SRM) induction pr...

1720 | An Introduction to Support Vector Machines and other Kernel-Based Learning Methods
- Cristianini, Shawe-Taylor
- 2000
Citation Context ...ore the entire kernel matrix K(xi, xj) in memory. Therefore, several iterative optimisation algorithms targeted to problem (10,11) have been proposed in the literature, based on different heuristics [13]. In particular, these algorithms exploit the KKT conditions to speed up the convergence, and to implement the stopping criterion. 2.2 SVMs with embedded reject option Let us now consider the problem ...

1561 | Making Large-Scale SVM Learning Practical
- Joachims
- 1999
Citation Context ...sts in rejecting the patterns for which the output of a trained standard SVM is lower than a predefined threshold D. To implement this technique, we trained standard SVMs using the software SVMlight [19], available at http://svmlight.joachims.org. The value of the regularisation parameter C was automatically set by SVMlight. In our experiments, we used a linear kernel (that is, a linear decision su...

867 | Nonlinear Programming: Theory and Algorithms, 2nd Edition
- Bazaraa, Sherali, et al.
- 1993
Citation Context ...al (23,24) problems. Nevertheless, since the objective function (21) of the primal problem is continuous, the objective function of the dual problem (23) is concave, and therefore has no local maxima [15]. We exploited the last characteristic above to develop an algorithm for solving the dual problem (23,24). Our algorithm is derived from the sequential minimal optimisation (SMO) algorithm, developed ...

765 | Probabilistic outputs for support vector machines and comparison to regularized likelihood methods
- Platt
- 1999
Citation Context ...ich the absolute value of the SVM output is lower than a predefined threshold [6]. The second method consists in mapping the SVM outputs to posterior probabilities, so that Chow’s rule can be applied [7,8,9,10]. The mapping is implemented by using sigmoidal-like functions, as usually happens for distance classifiers [11]. We point out that these two methods are equivalent, since the estimates of the a poster...

296 | Classification by pairwise coupling
- Hastie, Tibshirani
- 1998
Citation Context ...ich the absolute value of the SVM output is lower than a predefined threshold [6]. The second method consists in mapping the SVM outputs to posterior probabilities, so that Chow’s rule can be applied [7,8,9,10]. The mapping is implemented by using sigmoidal-like functions, as usually happens for distance classifiers [11]. We point out that these two methods are equivalent, since the estimates of the a poster...

55 | Support vector machine classification of microarray data
- Mukherjee, Tamayo, et al.
- 1999

46 | Moderating the outputs of support vector machine classifiers
- Kwok
- 1999
Citation Context ...ich the absolute value of the SVM output is lower than a predefined threshold [6]. The second method consists in mapping the SVM outputs to posterior probabilities, so that Chow’s rule can be applied [7,8,9,10]. The mapping is implemented by using sigmoidal-like functions, as usually happens for distance classifiers [11]. We point out that these two methods are equivalent, since the estimates of the a poster...

34 | Reject option with multiple thresholds
- Fumera, Roli, et al.
Citation Context ... Chow’s rule (2) is applied to the estimates of the a posteriori probabilities provided by a trained classifier. For instance, this is the case of neural networks and k-nearest neighbours classifiers [3]. For classifiers which do not provide estimates of the a posteriori probabilities, heuristic rejection rules targeted to the particular classifier are used. Anyway, such rules are usually based on re...

29 | On optimum error and reject tradeoff
- Chow
Citation Context ...rect classifications do not depend on the classes. These costs will be denoted respectively as cE, cR, and cC (obviously, cE > cR > cC). The optimal decision rule for this problem was defined by Chow [2]. Chow’s rule consists in accepting a pattern x, and assigning it to the class with maximum posterior probability, if it is higher than a predefined reject threshold T: cE − cR assign x to class i= ar...

18 | Classifier conditional posterior probabilities
- Duin, Tax
- 1998
Citation Context ...ng the SVM outputs to posterior probabilities, so that Chow’s rule can be applied [7,8,9,10]. The mapping is implemented by using sigmoidal-like functions, as usually happens for distance classifiers [11]. We point out that these two methods are equivalent, since the estimates of the a posteriori probabilities obtained using a sigmoidal-like function are monotonic functions of the SVM output. Therefore...

10 | Support vector machines with embedded reject option
- Fumera, Roli
- 2002
Citation Context ...accepted pattern as its distance from the hyperplane w ⋅ x + b = 0. Under the above assumption, it can be shown that the OSHR is the solution of an optimisation problem similar to that of standard SVMs [14]. In particular, a pair of parallel hyperplanes (13) which minimise the empirical risk (17), and maximise the margin of samples accepted and correctly classified, can be found by minimising the follow...

10 | The learning behavior of single neuron classifiers on linearly separable or nonseparable input
- Basu, Ho
- 1999
Citation Context ... on non-linearly separable problems, since these are the most significant for testing the performance of rejection techniques. The non-linearly separable problems are 193 out of 325, as identified in [18]. For each two-class problem, we randomly subdivided the patterns of the corresponding classes in a training set and a test set of equal size. As explained in section 1, the rejection technique propos...

5 | The Foundations of Cost-Sensitive
- Elkan
- 2001
Citation Context ...itive. This raises the question of how a classifier trained with a standard learning algorithm can be used for a cost-sensitive problem. This issue was addressed for the case of two-class problems in [1], where two approaches were considered. The first approach consists in changing the proportion of positive and negative training patterns, according to the classification costs. The learning algorithm...

5 | Estimation of Dependences Based on Empirical Data, Addendum 1
- Vapnik
- 1982
Citation Context ... margin with which the training samples can be separated without errors. This led to the concept of optimal separating hyperplane (OSH), as the one which separates the two classes with maximum margin [12]. The heuristic extension of the OSH to the general case of not linearly separable classes, was based on the idea of finding the hyperplane which minimises the number of training errors, and separates...

3 | A new approach of modifying SVM outputs
- Madevska-Bogdanova, Nikolic
- 2000

2 | Fast training of support vector machines using sequential minimal optimisation
- Platt
- 1999
Citation Context ... last characteristic above to develop an algorithm for solving the dual problem (23,24). Our algorithm is derived from the sequential minimal optimisation (SMO) algorithm, developed for standard SVMs [16]. More details about our algorithm can be found in [17]. It is worth noting that our algorithm was not optimised in terms of computational efficiency, since our primary goal was to evaluate the error-...

2 | Advanced Methods for Pattern Recognition with Reject Option
- Fumera
- 2002
Citation Context ...solving the dual problem (23,24). Our algorithm is derived from the sequential minimal optimisation (SMO) algorithm, developed for standard SVMs [16]. More details about our algorithm can be found in [17]. It is worth noting that our algorithm was not optimised in terms of computational efficiency, since our primary goal was to evaluate the error-...
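
Several of the contexts above describe the baseline rejection technique: reject a pattern when the absolute value of the output of a trained standard SVM is lower than a threshold D, which is stated to be equivalent to applying Chow's rule after a sigmoidal mapping of the output. The sketch below illustrates that equivalence and the resulting error-reject trade-off on precomputed decision values. It is only an illustration under stated assumptions: the plain logistic function with slope `a` stands in for the unspecified "sigmoidal-like" mapping, and all function names are hypothetical rather than the authors' code.

```python
import math


def reject_by_margin(output, d):
    """Return +1 or -1 when |output| >= d, or 0 to reject.

    An output of exactly 0 with d = 0 is assigned to class -1 here.
    """
    if abs(output) < d:
        return 0
    return 1 if output > 0 else -1


def sigmoid_posterior(output, a=1.0):
    """Assumed logistic map from an SVM output to an estimate of P(+1 | x)."""
    return 1.0 / (1.0 + math.exp(-a * output))


def reject_by_posterior(output, t, a=1.0):
    """Chow-style rule on the logistic estimate; 0 means reject."""
    p = sigmoid_posterior(output, a)
    if max(p, 1.0 - p) < t:
        return 0
    return 1 if p > 0.5 else -1


def error_reject_rates(outputs, labels, d):
    """Fractions of misclassified and of rejected patterns at threshold d."""
    decisions = [reject_by_margin(f, d) for f in outputs]
    n = len(labels)
    rejected = sum(1 for dec in decisions if dec == 0)
    errors = sum(1 for dec, y in zip(decisions, labels)
                 if dec != 0 and dec != y)
    return errors / n, rejected / n
```

Because the logistic map is monotonic in the SVM output for a > 0, thresholding |output| at D yields the same decisions as thresholding the larger posterior estimate at `sigmoid_posterior(D, a)`, apart from floating-point ties exactly at the boundary. Sweeping `d` over a range of values and calling `error_reject_rates` at each one traces an error-reject curve of the kind the paper uses for evaluation.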