## Multicategory Classification by Support Vector Machines (1999)

Venue: Computational Optimization and Applications

Citations: 56 (0 self)

### BibTeX

```bibtex
@ARTICLE{Bredensteiner99multicategoryclassification,
  author  = {Erin J. Bredensteiner and Kristin P. Bennett},
  title   = {Multicategory Classification by Support Vector Machines},
  journal = {Computational Optimization and Applications},
  year    = {1999},
  volume  = {12},
  pages   = {53--79}
}
```


### Abstract

We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and the quadratic programming (QP) approaches based on Vapnik's Support Vector Machines (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise-linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise-nonlinear classification function. Each piece of this function can take the form of a polynomial, a radial basis function, or even a neural network. For k > 2 class problems, the SVM method as originally proposed required the construction of a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method, and the original k QP method. We discuss the advantages and disadvantages of each approach.
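The one-versus-rest strategy described above (k two-class machines, one per class, with a point assigned to the class whose discriminant is maximized) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: each linear discriminant is trained by subgradient descent on the hinge loss instead of solving a quadratic program, and all function names and parameter values are assumptions.

```python
# One-versus-rest multiclass linear classification: k two-class machines,
# class i vs. the rest. Illustrative sketch only; the paper solves k quadratic
# programs, here replaced by subgradient descent on the regularized hinge loss.
import numpy as np

def train_one_vs_rest(X, y, k, epochs=500, lr=0.1, C=10.0):
    """Fit k linear discriminants (w_i, b_i), one per class."""
    n, d = X.shape
    W, b = np.zeros((k, d)), np.zeros(k)
    for i in range(k):
        t = np.where(y == i, 1.0, -1.0)          # +1 for class i, -1 for the rest
        for _ in range(epochs):
            viol = t * (X @ W[i] + b[i]) < 1     # hinge-loss violators
            # subgradient of 0.5*||w||^2 + (C/n) * sum of hinge losses
            W[i] -= lr * (W[i] - (C / n) * (t[viol] @ X[viol]))
            b[i] += lr * (C / n) * t[viol].sum()
    return W, b

def predict(X, W, b):
    # assign each point to the class whose discriminant value is largest
    return np.argmax(X @ W.T + b, axis=1)

# three well-separated 2-D Gaussian clusters, one per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(30, 2)) for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)
W, b = train_one_vs_rest(X, y, k=3)
acc = (predict(X, W, b) == y).mean()
```

In the single-QP method proposed in the paper, all k pieces are instead found simultaneously from one program; the decision rule, an argmax over the k discriminants, is the same.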

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...the two research directions have differed in their approach to solving problems with k > 2 classes. The original SVM method for multiclass problems was to find k separate two-class discriminants [23]. Each discriminant is constructed by separating a single class from all the others. This process requires the solution of k quadratic programs. When applying all k classifiers to the original multica...

2171 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context ...use it is a function of a subset of the training data known as support vectors. Specific implementations such as the Generalized Optimal Plane (GOP) method have proven to perform very well in practice [8]. Throughout this paper we will refer to the two different approaches as RLP and SVM. The primary focus of this paper is how the two research directions have differed in their approach to solving pr...

741 |
Aha, “UCI repository of machine learning data bases,” http://www.ics.uci.edu/~mlearn/MLRepository.html
- Murphy, W
- 1992
Citation Context ...class 1, 71 points in class 2, and 48 points in class 3. This dataset is available via anonymous file transfer protocol (ftp) from the UCI Repository of Machine Learning Databases and Domain Theories [16] at ftp://ftp.ics.uci.edu/pub/machine-learning-databases. Glass Identification Database The Glass dataset [11] is used to identify the origin of a sample of glass through chemical analysis. This datas...

304 |
Backpropagation applied to handwritten zip code recognition
- Cun, Boser, et al.
- 1989
Citation Context ...fer protocol (ftp) from the UCI Repository of Machine Learning Databases and Domain Theories [16] at ftp://ftp.ics.uci.edu/pub/machine-learning-databases. US Postal Service Database The USPS Database [10] contains zipcode samples from actual mail. This database is comprised of separate training and testing sets. There are 7291 samples in the training set and 2007 samples in the testing set. Each sampl...

193 |
Nonlinear programming
- Mangasarian
- 1994
Citation Context ...represented by a circle around the point. Some points have double circles, which indicate that two dual variables u^{ij}_l > 0, j = 1, ..., 3, j ≠ i. By the complementarity within the KKT conditions [14], u^{ij}_l > 0 ⇒ A^i_l(w^i − w^j) = (γ^i − γ^j) + 1. Consequently the support vectors are located "closest" to the separating function. In fact, the remainder of the points, those that are not support...

136 | Training invariant support vector machines
- DeCoste, Schölkopf
Citation Context ...γ is the width parameter), and two-layer neural networks (K(x, x_i) = S[v(x^T x_i) + c], where S(u) is a sigmoid function) [23]. Variants of SVM (10) have proven to be quite successful in practice [21, 22, 7]. Note that the number of variables in Program (10) remains constant as K(x, x_i) increases in dimensionality. Additionally, the objective function remains quadratic and thus the complexity of the pr...
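The kernel families named in the snippet above (polynomial, radial basis function with a width parameter, and the sigmoid "two-layer neural network" kernel) are standard. A minimal sketch, with function names and default parameter values assumed rather than taken from the paper:

```python
# The three kernel families named in the snippet: polynomial, RBF, and
# sigmoid ("two-layer neural network"). Names and defaults are illustrative.
import numpy as np

def poly_kernel(x, xi, degree=3, c=1.0):
    # K(x, x_i) = (x . x_i + c)^degree
    return (x @ xi + c) ** degree

def rbf_kernel(x, xi, gamma=0.5):
    # K(x, x_i) = exp(-gamma * ||x - x_i||^2); gamma is the width parameter
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def sigmoid_kernel(x, xi, v=1.0, c=0.0):
    # K(x, x_i) = S[v (x^T x_i) + c] with S(u) = tanh(u), a sigmoid
    return np.tanh(v * (x @ xi) + c)

x = np.array([1.0, 2.0])
xi = np.array([0.5, -1.0])
values = (poly_kernel(x, xi), rbf_kernel(x, xi), sigmoid_kernel(x, xi))
```

Substituting any such K(x, x_i) for the plain inner product leaves the number of QP variables unchanged, which is the point the snippet makes.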

132 |
Multisurface method of pattern separation for medical diagnosis applied to breast cytology
- Wolberg, Mangasarian
- 1990
Citation Context ...ltisurface Method of Mangasarian [12, 13]. This method and its later extension, the Robust Linear Programming (RLP) approach [6], have been used in a highly successful breast cancer diagnosis system [26]. The second direction is the quadratic programming (QP) methods based on Vapnik's Statistical Learning Theory [24, 25]. Statistical Learning Theory addresses mathematically the problem of how to best...

125 | Comparing support vector machines with Gaussian kernels to radial basis function classifiers
- Schölkopf, Sung, et al.
- 1997
Citation Context ...γ is the width parameter), and two-layer neural networks (K(x, x_i) = S[v(x^T x_i) + c], where S(u) is a sigmoid function) [23]. Variants of SVM (10) have proven to be quite successful in practice [21, 22, 7]. Note that the number of variables in Program (10) remains constant as K(x, x_i) increases in dimensionality. Additionally, the objective function remains quadratic and thus the complexity of the pr...

100 |
Theory of Pattern Recognition
- Vapnik, Chervonenkis
- 1974
Citation Context ...approach [6] have been used in a highly successful breast cancer diagnosis system [26]. The second direction is the quadratic programming (QP) methods based on Vapnik's Statistical Learning Theory [24, 25]. Statistical Learning Theory addresses mathematically the problem of how to best construct functions that generalize well on future points. The problem of constructing the best linear two-class discr...

76 |
Minos 5.4 user's guide
- Murtagh, Saunders
- 1993
Citation Context ...planes and the maximum classification error. The main advantage of the RLP method over the SVM problem is that RLP is a linear program solvable using very robust algorithms such as the Simplex Method [17]. SVM requires the solution of a quadratic program that is typically much more computationally costly for the same size problem. In [3], the RLP method was found to generalize as well as the linear SVM...

66 | view-based object recognition algorithms using realistic 3d models
- Blanz, Schölkopf, et al.
- 1996
Citation Context ...γ is the width parameter), and two-layer neural networks (K(x, x_i) = S[v(x^T x_i) + c], where S(u) is a sigmoid function) [23]. Variants of SVM (10) have proven to be quite successful in practice [21, 22, 7]. Note that the number of variables in Program (10) remains constant as K(x, x_i) increases in dimensionality. Additionally, the objective function remains quadratic and thus the complexity of the pr...

55 |
Linear and Nonlinear Separation of Patterns by Linear Programming
- Mangasarian
- 1965
Citation Context ...ut related research directions developed for solving the two-class linear discrimination problem. The first is the linear programming (LP) methods stemming from the Multisurface Method of Mangasarian [12, 13]. This method and its later extension, the Robust Linear Programming (RLP) approach [6], have been used in a highly successful breast cancer diagnosis system [26]. The second direction is the quadrat...

45 |
The Nature of Statistical Learning Theory; Springer-Verlag
- Vapnik
- 1995
Citation Context ...the two research directions have differed in their approach to solving problems with k > 2 classes. The original SVM method for multiclass problems was to find k separate two-class discriminants [23]. Each discriminant is constructed by separating a single class from all the others. This process requires the solution of k quadratic programs. When applying all k classifiers to the original multicat...

39 |
Decision tree construction via linear programming
- Bennett
- 1992
Citation Context ...al. [20, 19, 18] use clustering in conjunction with LP to generate neural networks in polynomial time. Another approach is to recursively construct piecewise-linear discriminants using a series of LPs [13, 2, 15]. These approaches could also be used with SVM, but we limit discussion to nonlinear discriminants constructed using the SVM kernel-type approaches. After the introduction to the existing multiclass me...

38 |
Multi-surface method of pattern separation
- Mangasarian
- 1968
Citation Context ...ut related research directions developed for solving the two-class linear discrimination problem. The first is the linear programming (LP) methods stemming from the Multisurface Method of Mangasarian [12, 13]. This method and its later extension, the Robust Linear Programming (RLP) approach [6], have been used in a highly successful breast cancer diagnosis system [26]. The second direction is the quadrat...

36 |
Neural network training via linear programming
- Bennett, Mangasarian
- 1992
Citation Context ...parate two given sets of points. Thus, it is important to find the linear function that discriminates best between the two sets according to some error minimization criterion. Bennett and Mangasarian [4] minimize the average magnitude of the misclassification errors in the construction of their robust linear programming problem (RLP): min_{w,γ,y,z} (1/m_1) e^T y + (1/m_2) e^T z subject to y + A^1 w ...
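Written out, the RLP program truncated in the snippet above takes the standard Bennett–Mangasarian form below. The cardinality symbols m_1, m_2 are an assumption recovered from the garbled text, not verbatim from the paper.

```latex
\min_{w,\gamma,y,z} \;\; \frac{1}{m_1} e^T y + \frac{1}{m_2} e^T z
\quad \text{subject to} \quad
\begin{aligned}[t]
  & y + A^1 w - e\gamma \ge e, \\
  & z - A^2 w + e\gamma \ge e, \\
  & y \ge 0, \quad z \ge 0,
\end{aligned}
```

where A^1 and A^2 hold the two classes' points row-wise, e is a vector of ones, and y, z are the per-point misclassification errors whose class-wise averages the objective minimizes.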

24 |
An Algorithm to Generate Radial Basis Function (RBF)-like Nets for Classification Problems
- Roy, Govil, et al.
- 1995
Citation Context ...functions including polynomial, radial basis function machine, and neural networks. The successful polynomial-time nonlinear methods based on LP use multi-step approaches. The methods of Roy et al. [20, 19, 18] use clustering in conjunction with LP to generate neural networks in polynomial time. Another approach is to recursively construct piecewise-linear discriminants using a series of LPs [13, 2, 15]. Th...

23 | Multicategory discrimination via linear programming. Optimization Methods and Software
- Bennett, Mangasarian
- 1994
Citation Context ...assification function that is maximized at that point. The LP approach has been to directly construct k classification functions such that for each point the corresponding class function is maximized [5, 6]. The Multicategory Discrimination Method [5, 6] constructs a piecewise-linear discriminant for the k-class problem using a single linear program. We will call this method M-RLP since it is a directi...

23 |
A polynomial time algorithm for the construction and training of a class of multilayer perceptrons, Neural Networks 6
- Roy, Kim, et al.
- 1993
Citation Context ...functions including polynomial, radial basis function machine, and neural networks. The successful polynomial-time nonlinear methods based on LP use multi-step approaches. The methods of Roy et al. [20, 19, 18] use clustering in conjunction with LP to generate neural networks in polynomial time. Another approach is to recursively construct piecewise-linear discriminants using a series of LPs [13, 2, 15]. Th...

19 | Geometry in learning
- Bennett, Bredensteiner
- 1998
Citation Context ...ent, the two-class linear discrimination methods for SVM and RLP are almost identical. They differ only in the regularization term used in the objective. We use the regularized form of RLP proposed in [3], which is equivalent to SVM except that a different norm is used for the regularization term. For two-class linear discrimination, RLP generalizes equally well and is more computationally efficient than...

13 | Mathematical programming in machine learning
- Mangasarian
- 1996
Citation Context ...al. [20, 19, 18] use clustering in conjunction with LP to generate neural networks in polynomial time. Another approach is to recursively construct piecewise-linear discriminants using a series of LPs [13, 2, 15]. These approaches could also be used with SVM, but we limit discussion to nonlinear discriminants constructed using the SVM kernel-type approaches. After the introduction to the existing multiclass me...

9 |
Comparison of classifiers in high dimensional settings
- Aeberhard, Coomans, de Vel
- 1992
Citation Context ...t S is denoted by arg min_{x∈S} f(x). For a vector x in R^n, x_+ will denote the vector in R^n with components (x_+)_i := max{x_i, 0}, i = 1, ..., n. The step function x_* will denote the vector in [0, 1]^n with components (x_*)_i := 0 if x_i ≤ 0 and (x_*)_i := 1 if x_i > 0, i = 1, ..., n. For the vector x in R^n and the matrix A in R^{n×m}, the transposes of x and A are denoted x^T and A^T resp...

8 | Serial and parallel multicategory discrimination
- Bennett, Mangasarian
- 1994
Citation Context ...blem. The first is the linear programming (LP) methods stemming from the Multisurface Method of Mangasarian [12, 13]. This method and its later extension, the Robust Linear Programming (RLP) approach [6], have been used in a highly successful breast cancer diagnosis system [26]. The second direction is the quadratic programming (QP) methods based on Vapnik's Statistical Learning Theory [24, 25]. Sta...

6 |
Rule induction in forensic science
- Evett, Spiehler
- 1987
Citation Context ...protocol (ftp) from the UCI Repository of Machine Learning Databases and Domain Theories [16] at ftp://ftp.ics.uci.edu/pub/machine-learning-databases. Glass Identification Database The Glass dataset [11] is used to identify the origin of a sample of glass through chemical analysis. This dataset is comprised of six classes of 214 points with 9 features. The distribution of points by class is as follow...

3 |
Pattern classification using linear programming
- Roy, Mukhopadhyay
- 1990
Citation Context ...functions including polynomial, radial basis function machine, and neural networks. The successful polynomial-time nonlinear methods based on LP use multi-step approaches. The methods of Roy et al. [20, 19, 18] use clustering in conjunction with LP to generate neural networks in polynomial time. Another approach is to recursively construct piecewise-linear discriminants using a series of LPs [13, 2, 15]. Th...

1 |
Incorporating invariances in support vector machines
- Schölkopf, Burges, et al.
- 1996
Citation Context ...and γ is the width parameter), and two-layer neural networks (K(x, x_i) = S[v(x^T x_i) + c], where S(u) is a sigmoid function) [23]. Variants of SVM (10) have proven to be quite successful in practice [21, 22, 7]. Note that the number of variables in Program (10) remains constant as K(x, x_i) increases in dimensionality. Additionally, the objective function remains quadratic and thus the complexity of the prob...