## Support Vector Machines with Embedded Reject Option

Venue: Proceedings of the Int. Workshop on Pattern Recognition with Support Vector Machines (SVM2002), Niagara Falls

Citations: 10 (1 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Fumera_supportvector,
  author    = {Giorgio Fumera and Fabio Roli},
  title     = {Support Vector Machines with Embedded Reject Option},
  booktitle = {Proceedings of the Int. Workshop on Pattern Recognition with Support Vector Machines (SVM2002), Niagara Falls},
  year      = {},
  pages     = {68--82},
  publisher = {Springer}
}
```

### Abstract

In this paper, the problem of implementing the reject option in support vector machines (SVMs) is addressed. We started by observing that methods proposed so far simply apply a reject threshold to the outputs of a trained SVM. We then showed that, under the framework of the structural risk minimisation principle, the rejection region must be determined during the training phase of a classifier. By applying this concept, and by following Vapnik's approach, we developed a maximum margin classifier with reject option. This led us to an SVM whose rejection region is determined during the training phase, that is, an SVM with embedded reject option. To implement such an SVM, we devised a novel formulation of the SVM training problem and developed a specific algorithm to solve it. Preliminary results on a character recognition problem show the advantages of the proposed SVM in terms of the achievable error-reject trade-off.
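The "external" reject technique that the abstract contrasts with the embedded approach can be sketched in a few lines: a threshold D is applied to the decision values of an already trained SVM, and patterns falling inside the band |f(x)| < D are rejected. This is an illustrative sketch, not the paper's embedded method; the weights `w`, bias `b`, threshold `D`, and sample points below are made-up placeholders.

```python
import numpy as np

# Hypothetical linear SVM decision function f(x) = w.x + b,
# with an externally applied reject threshold D.
w, b = np.array([1.0, -0.5]), 0.2
D = 0.3  # reject threshold (arbitrary placeholder value)

X = np.array([[1.0, 0.5], [0.1, 0.3], [-1.0, 0.2]])
f = X @ w + b  # decision values for each sample

# Patterns with |f(x)| < D fall in the reject band (label 0);
# the rest are classified by the sign of f(x).
labels = np.where(np.abs(f) < D, 0, np.sign(f))
print(labels)  # -> [ 1.  0. -1.]
```

The embedded approach proposed in the paper instead determines the rejection region jointly with the separating hyperplanes during training, rather than thresholding a fixed classifier afterwards.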

### Citations

9810 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...tructed by a given classifier. However, this heuristic approach is not coherent with the theoretical foundations of SVMs, which are based on the structural risk minimisation (SRM) induction principle [11]. In this paper, we propose a different approach for introducing the reject option in the framework of SVM classifiers. Our approach is based on the observation that, under the framework of the SRM pr...

5290 | Neural networks for pattern recognition
- Bishop
- 1995
Citation Context: ...ions, the a posteriori probabilities are usually unknown [19]. Some classifiers, like neural networks and the k-nearest neighbours classifier, provide approximations of the a posteriori probabilities [3,4,19]. In such case, Chow’s rule is commonly used, despite its non-optimality [19]. Other classifiers, like support vector machines (SVMs), do not provide probabilistic outputs. In this case, a rejection t...
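Chow's rule, mentioned in the context above, classifies a pattern only when the largest (estimated) posterior probability exceeds a threshold, and rejects it otherwise. A minimal sketch, with made-up posterior values and threshold:

```python
# Sketch of Chow's rule: given estimated posteriors P(w_i | x),
# reject x when the maximum posterior falls below threshold t;
# otherwise return the index of the winning class.
def chow_rule(posteriors, t):
    """Return the winning class index, or None to reject."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return best if posteriors[best] >= t else None

print(chow_rule([0.9, 0.1], t=0.8))    # confident -> 0 (class 0)
print(chow_rule([0.55, 0.45], t=0.8))  # ambiguous -> None (reject)
```

As the context notes, the rule is optimal only under exact knowledge of the posteriors; with approximations (e.g. from a neural network) it is used heuristically.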

2350 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...linearly separable classes, was based on the idea of finding the hyperplane which minimises the number of training errors, and separates the remaining correctly classified samples with maximum margin [13]. By analogy, we assume that the OSHR can be defined as a pair of parallel hyperplanes (8) which minimise the empirical risk (12), and separate with maximum margin the samples correctly classified and...

1536 | Making Large-Scale SVM Learning Practical
- Joachims
- 1999
Citation Context: ...ure consists in training a standard SVM, and rejecting the patterns x for which f(x) < D, where D is a predefined threshold. To implement this technique, we trained SVMs using the software SVMlight [18], available at http://svmlight.joachims.org. The value of the parameter C was automatically set by SVMlight. In our experiments, we used a linear kernel. The A-R curve achievable using this techniqu...

840 | Nonlinear Programming: Theory and Algorithms, 2nd Edition
- Bazaraa, Sherali, Shetty
- 1993
Citation Context: ...oblem (23,24) can be found by minimising the Lagrange function with respect to w, b, ξ and ε, under constraints 0 ≤ ε ≤ 1, and then maximising it with respect to the non-negative Lagrange multipliers α [14]. Note that the Lagrange function is the sum of a convex function of w and b, and a non-convex function of ξ and ε. Accordingly, its minimum with respect to w and b can be found by imposing stationari...

755 | Probabilistic outputs for support vector machines and comparison to regularized likelihood methods
- Platt
- 2000
Citation Context: ...how’s rule can be applied. Usually, for distance classifiers (like Fisher’s linear discriminant) the mapping is implemented using a sigmoid function [6]. This method was also proposed for SVMs in [7], using the following form for the sigmoid function: P(y = +1 | x) = 1 / (1 + exp(a f(x) + b)) (1), where the class labels are denoted as y = +1, −1, while a and b are constant terms to be defined on...
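The sigmoid mapping of Eq. (1) quoted above turns an SVM decision value f(x) into a posterior estimate. A direct transcription, with placeholder values for the constants a and b (in Platt's method they are fit on training data):

```python
import math

# Eq. (1): P(y = +1 | x) = 1 / (1 + exp(a * f(x) + b)).
# The values of a and b below are arbitrary placeholders, not fitted.
def platt_posterior(f_x, a=-2.0, b=0.0):
    return 1.0 / (1.0 + math.exp(a * f_x + b))

print(platt_posterior(0.0))  # on the hyperplane f(x) = 0 -> 0.5
```

Note that a is typically negative, so larger decision values map to posteriors closer to 1.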

46 | Moderating the outputs of support vector machine classifiers
- Kwok
- 1999
Citation Context: ...ance was proposed in [9]. The corresponding estimate of P(y = +1 | x) is again a sigmoid function. A more complex method based on a Bayesian approach, the so-called evidence framework, was proposed in [10]. Nonetheless, also in this case the resulting estimate of P(y = +1 | x) is a sigmoid-like function. We point out that all the mentioned methods provide estimates of the posterior probabilities that ar...

34 | Reject option with multiple thresholds
- Fumera, Roli, et al.
Citation Context: ...iven threshold [2]. The optimality of this rule relies on the exact knowledge of the a posteriori probabilities. However, in practical applications, the a posteriori probabilities are usually unknown [19]. Some classifiers, like neural networks and the k-nearest neighbours classifier, provide approximations of the a posteriori probabilities [3,4,19]. In such case, Chow’s rule is commonly used, despite...

10 | The learning behavior of single neuron classifiers on linearly separable or nonseparable input
- Basu, Ho
- 1999
Citation Context: ...on non-linearly separable problems, since these are the most significant for testing the performance of rejection techniques. The non-linearly separable problems are 193 out of 325, as identified in [17]. For each two-class problem, we randomly subdivided the patterns of the corresponding classes in a training set and a test set of equal size. As explained in Sect. 1, the main rejection technique pro...

5 | Estimation of Dependences Based on Empirical Data, Addendum 1
- Vapnik
- 1982
Citation Context: ...the margin with which the training samples can be separated without errors. This suggested the concept of optimal separating hyperplane as the one which separates the two classes with maximum margin [12]. The extension of the concept of OSH to the general case of non-linearly separable classes was based on the idea of finding the hyperplane which minimises the number of training errors, and separate...

2 | Fast training of support vector machines using sequential minimal optimisation
- Platt
- 1999
Citation Context: ...last characteristic above to develop an algorithm for solving the dual problem (27,28). Our algorithm is derived from the sequential minimal optimisation (SMO) algorithm, developed for standard SVMs [15]. Basically, SMO iteratively maximises the dual objective function of the standard SVM dual problem by updating at each step only two Lagrange multipliers, while enforcing the constraints. It is wort...

2 | Advanced Methods for Pattern Recognition with Reject Option
- Fumera
- 2002
Citation Context: ...ipliers at each iteration, and to implement a stopping criterion, specific heuristics were used, which exploit the characteristics of problem (27,28). More details about our algorithm can be found in [16]. (Sect. 4, Experimental Results) The aim of our experiments was to compare the performance of our SVM with embedded reject option, with that of standard SVMs with the “external” reject technique described in...