## Interior Point Methods for Massive Support Vector Machines (2000)

Venue: Data Mining Institute, Computer Sciences Department, University of Wisconsin

Citations: 43 (1 self)

### BibTeX

@TECHREPORT{Ferris00interior,
  author = {Michael C. Ferris and Todd S. Munson},
  title = {Interior Point Methods for Massive Support Vector Machines},
  institution = {Data Mining Institute, Computer Sciences Department, University of Wisconsin},
  year = {2000}
}

### Abstract

Abstract. We investigate the use of interior-point methods for solving quadratic programming problems with a small number of linear constraints, where the quadratic term consists of a low-rank update to a positive semidefinite matrix. Several formulations of the support vector machine fit into this category. An interesting feature of these particular problems is the volume of data, which can lead to quadratic programs with between 10 and 100 million variables and, if written explicitly, a dense Q matrix. Our code is based on OOQP, an object-oriented interior-point code, with the linear algebra specialized for the support vector machine application. For the targeted massive problems, all of the data is stored out of core and we overlap computation and input/output to reduce overhead. Results are reported for several linear support vector machine formulations demonstrating that the method is reliable and scalable.

Key words: support vector machine, interior-point method, linear algebra

AMS subject classifications: 90C51, 90C20, 62H30

PII: S1052623400374379
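The low-rank structure described in the abstract is what makes the specialized linear algebra pay off: each interior-point Newton system reduces to a diagonal-plus-low-rank solve, so only a small dense matrix ever needs factoring. A minimal sketch (not the paper's code; the splitting diag(d) + V Vᵀ and all names here are illustrative assumptions) of a Sherman–Morrison–Woodbury solve for such a matrix:

```python
import numpy as np

def smw_solve(d, V, b):
    """Solve (diag(d) + V V^T) x = b via Sherman-Morrison-Woodbury.

    Only a k x k dense system is factored (k = rank of the update),
    so the cost is O(n k^2) rather than O(n^3) for an explicit solve.
    """
    Dinv_b = b / d                      # D^{-1} b
    Dinv_V = V / d[:, None]             # D^{-1} V
    k = V.shape[1]
    capacitance = np.eye(k) + V.T @ Dinv_V   # small k x k matrix
    return Dinv_b - Dinv_V @ np.linalg.solve(capacitance, V.T @ Dinv_b)

rng = np.random.default_rng(0)
n, k = 2000, 5
d = rng.uniform(1.0, 2.0, n)            # positive diagonal
V = rng.standard_normal((n, k))         # low-rank factor
b = rng.standard_normal(n)
x = smw_solve(d, V, b)
# verify the implicit system is satisfied without forming the n x n matrix
assert np.allclose(d * x + V @ (V.T @ x), b)
```

The same identity underlies the block eliminations mentioned in the citation contexts below; here it is shown only in its generic dense form.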

### Citations

2277 | A tutorial on support vector machines for pattern recognition
- Burges
- 1998
Citation Context: ...ese problems is to exploit structure using block eliminations. One source of massive problems of this type is the data mining community, where several linear support vector machine (SVM) formulations [28, 1, 2, 19] fit into the framework. A related example is the Huber regression problem [17, 21, 31], which can also be posed as a quadratic program of the type considered. The linear SVM attempts to construct a h...

2160 | Support-Vector Networks
- Cortes, Vapnik
- 1995
Citation Context: ...e a solution. Concurrent with work described here, Mangasarian and Musicant advocated the use of the Sherman–Morrison–Woodbury update formula in their active set algorithm. Another variant considered [5] is a slight modification of (2.4): (2.5) min_{w,γ,y} (1/2)‖w‖₂² + (ν/2)‖y‖₂² subject to D(Aw − eγ) + y ≥ e. We can also use a one-sided Huber M-estimator [16] for the misclassification error within the li...

2025 | Learning with Kernels
- Schölkopf, Smola
- 2001
Citation Context: ...variables complementary to w and v. Convergence results for these methods can be found in [30] and are not discussed here. Specializations of the interior-point method to the SVM case can be found in [27]. The Mehrotra predictor-corrector method [23] is a specific type of interior-point method. The iterates for the algorithm are guaranteed to remain interior to the simple bounds; that is, w_i > 0, v_i >...

1823 | Robust Statistics
- Huber
- 1981
Citation Context: ...tive set algorithm. Another variant considered [5] is a slight modification of (2.4): (2.5) min_{w,γ,y} (1/2)‖w‖₂² + (ν/2)‖y‖₂² subject to D(Aw − eγ) + y ≥ e. We can also use a one-sided Huber M-estimator [16] for the misclassification error within the linear SVM. This function is a convex quadratic for small values of its argument and is linear for large values. The resulting quadratic program is a combin...

1085 | The Algebraic Eigenvalue Problem
- Wilkinson
- 1965
Citation Context: ...gain, this is a heuristic that does not require a sort of the data being accumulated. For summations involving numbers of the same sign, the accumulation from smallest to largest is as recommended in [29]. The addition of the positive and negative buckets of the same magnitude is designed to alleviate cancellation effects. An example is given in [15] to test the effects of ordering on summations. The ...
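The smallest-to-largest ordering cited from [29] keeps the running sum comparable in magnitude to the next addend, so fewer low-order bits are discarded. A toy demonstration (our own construction, not from the paper) in float32 with the positive terms 1/k², using a float64 sum as the reference:

```python
import numpy as np

# All terms are positive and span many orders of magnitude, so the
# effect of accumulation order is visible in single precision.
terms = (1.0 / np.arange(1, 100001, dtype=np.float64) ** 2).astype(np.float32)
exact = float(np.sum(terms.astype(np.float64)))   # double-precision reference

def running_sum(xs):
    # naive left-to-right accumulation, rounded to float32 at every step
    s = np.float32(0.0)
    for v in xs:
        s = np.float32(s + v)
    return float(s)

asc = running_sum(np.sort(terms))          # smallest to largest
desc = running_sum(np.sort(terms)[::-1])   # largest to smallest
# ascending accumulation is at least as accurate here
assert abs(asc - exact) <= abs(desc - exact)
```

Descending order absorbs the tiny tail terms into a large partial sum, where most of them round away entirely; ascending order lets them accumulate before meeting the large leading terms.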

941 | An Introduction to Support Vector Machines
- Cristianini, Shawe-Taylor
Citation Context: ... the volume of data, which can lead to quadratic programs with between 10 and 100 million variables and, if written explicitly, a dense Q matrix. The large number of practical applications of the SVM [6, 26] is indicative of the importance of robust, scalable algorithms to the data mining and machine learning communities. Sampling techniques [3] can be used to decrease the number of observations needed t...

847 | Accuracy and Stability of Numerical Algorithms
- Higham
- 2002
Citation Context: ...cumulate M_{(p mod L)+1} = M_{(p mod L)+1} + R_pᵀ(C⁻¹)_p R_p. 5. Merge M = Σ_{l=1}^{L} M_l. Our merge is implemented by repeatedly adding the L/2 neighbors as depicted in Figure 4.1 (termed pairwise summation in [15]). A similar procedure is used for the vector computations. The code uses L = 8 for the calculations. We note that the above algorithm is dependent on the buffer size read from disk. This dependency i...
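The merge step described above (repeatedly adding L/2 neighboring accumulators) can be sketched as follows. This is a hypothetical reconstruction from the snippet, not the paper's code; the bucket contents here are arbitrary matrices standing in for the partial sums M_l:

```python
import numpy as np

def pairwise_merge(buckets):
    """Merge partial accumulators by repeatedly adding pairs of neighbors.

    With L buckets this performs log2(L) rounds, the 'pairwise summation'
    merge described for the M_l accumulators (Higham [15]); it bounds the
    rounding-error growth at O(log L) rather than O(L).
    """
    while len(buckets) > 1:
        half = len(buckets) // 2
        buckets = ([buckets[i] + buckets[i + half] for i in range(half)]
                   + buckets[2 * half:])   # carry the odd one, if any
    return buckets[0]

L = 8
parts = [np.full((2, 2), float(l + 1)) for l in range(L)]   # M_1 .. M_8
merged = pairwise_merge(parts)
# every entry should equal 1 + 2 + ... + 8 = 36
assert np.allclose(merged, np.full((2, 2), 36.0))
```

The same routine applies unchanged to the vector accumulators mentioned in the snippet, since it only relies on `+`.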

483 | Sampling Techniques
- Cochran
- 1977
Citation Context: .... The large number of practical applications of the SVM [6, 26] is indicative of the importance of robust, scalable algorithms to the data mining and machine learning communities. Sampling techniques [3] can be used to decrease the number of observations needed to construct a good separating surface. However, if we considered a “global” application and randomly sampled only 1% of the current world po...

465 | Primal-Dual Interior-Point Methods
- Wright
- 1997
Citation Context: ... and scalable. Key words. support vector machine, interior-point method, linear algebra AMS subject classifications. 90C51, 90C20, 62H30 PII. S1052623400374379 1. Introduction. Interior-point methods [30] are frequently used to solve large convex quadratic and linear programs for two reasons. First, the number of iterations taken is typically either constant or grows very slowly with the problem dimen...

289 | Sequential minimal optimization: A fast algorithm for training support vector machines
- Platt
- 1998
Citation Context: ... + (1/2)xᵀDAAᵀDx + (1/2)xᵀDeeᵀDx − eᵀx subject to 0 ≤ x ≤ ν₂e, which are of the desired form. In addition to the papers cited above, several specialized codes have been applied to solve (2.8); for example, see [24]. Once the dual problems above are solved, the hyperplane in the primal problems can be recovered as follows: • w = AᵀDx, and γ is the multiplier on eᵀDx = 0 for (2.2), (2.5), and (2.6). • w = Aᵀ...
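The primal recovery w = AᵀDx quoted above can be exercised on a toy problem. This sketch is entirely our own: it solves a bound-constrained dual of the (2.9) form by projected gradient (the paper uses an interior-point method, not this), and the recovery γ = −eᵀDx is our assumed stationarity condition for the variant with γ in the objective:

```python
import numpy as np

# Toy separable data: rows of A are observations, d holds the +/-1 labels.
A = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.0]])
d = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
nu = 10.0

# Dual of (2.9) form: min 1/2 x^T D(AA^T + ee^T)D x - e^T x, 0 <= x <= nu e.
# Solved by projected gradient purely for illustration.
D = np.diag(d)
Q = D @ (A @ A.T + np.ones((6, 6))) @ D
x = np.zeros(6)
step = 1.0 / np.linalg.norm(Q, 2)
for _ in range(5000):
    x = np.clip(x - step * (Q @ x - np.ones(6)), 0.0, nu)

# Primal recovery: w = A^T D x; gamma = -e^T D x (assumed for this variant).
w = A.T @ (d * x)
gamma = -(d @ x)
# the recovered hyperplane should classify the training points correctly
assert np.all(np.sign(A @ w - gamma) == d)
```

The point of the example is only the last three lines: once any solver produces the dual x, the primal hyperplane falls out of cheap matrix-vector products.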

268 | On the implementation of a (primal-dual) interior point method
- Mehrotra
- 1990
Citation Context: ...ead inherent in such a scheme. As mentioned above, the crucial implementation details are in the linear algebra calculation. Rather than reimplement a standard predictor-corrector interior-point code [23], we use OOQP [11, 12] as the basis for our work. A key property of OOQP is the object-oriented design, which enables us to tailor the required linear algebra to the application. Our linear algebra im...

265 | Monotone operators and the proximal point algorithm
- Rockafellar
- 1976
Citation Context: ...bles us to tailor the required linear algebra to the application. Our linear algebra implementation exploits problem structure while keeping all of the data out of core. A proximal-point modification [25] to the underlying algorithm is also available to improve robustness on some of the SVM formulations considered. We begin in section 2 by formally stating the general optimization problem we are inter...

252 | SVMTorch: Support Vector Machines for Large-Scale Regression Problems
- Collobert, Bengio
- 2001
Citation Context: ...Fig. 5.3. Total time comparison of the different formulations and SVMTorch with varying problem sizes on the separable dataset. We used SVMTorch [4] for this series of tests because the code is freely available and, according to the documentation, is specifically tailored for large-scale problems. We compiled both codes with the same compiler and...

193 | Nonlinear programming
- Mangasarian
- 1994
Citation Context: ...ess, this problem is (2.7) min_{w,γ,y,t} (1/2)‖w, γ‖₂² + (ν₁/2)‖t‖₂² + ν₂eᵀy subject to D(Aw − eγ) + t + y ≥ e, y ≥ 0. As stated, these problems are not in a form matching (2.1). However, the Wolfe duals [18] of (2.2)–(2.7) are, respectively, (2.8) min_x (1/2)xᵀDAAᵀDx − eᵀx subject to eᵀDx = 0, 0 ≤ x ≤ νe, (2.9) min_x (1/2)xᵀDAAᵀDx + (1/2)xᵀDeeᵀDx − eᵀx subject to 0 ≤...

97 | A special Newton-type optimization method
- Fischer
- 1992
Citation Context: ...4.4. Termination criteria. The termination criterion is based on the inf-norm of the Fischer–Burmeister function [9] for the complementarity problem (3.1), with an appropriate modification for the presence of equations [8]. If we denote all variables in (3.1) by x and the affine function on the left of (3.1) by F(...

92 | Generalized support vector machines
- Mangasarian
- 2000
Citation Context: ...ese problems is to exploit structure using block eliminations. One source of massive problems of this type is the data mining community, where several linear support vector machine (SVM) formulations [28, 1, 2, 19] fit into the framework. A related example is the Huber regression problem [17, 21, 31], which can also be posed as a quadratic program of the type considered. The linear SVM attempts to construct a h...

88 | The nature of statistical learning theory, 2nd ed
- Vapnik
- 1995
Citation Context: ...ese problems is to exploit structure using block eliminations. One source of massive problems of this type is the data mining community, where several linear support vector machine (SVM) formulations [28, 1, 2, 19] fit into the framework. A related example is the Huber regression problem [17, 21, 31], which can also be posed as a quadratic program of the type considered. The linear SVM attempts to construct a h...

66 | Successive overrelaxation for support vector machines
- Mangasarian, Musicant
- 1999
Citation Context: ... the solution phase. For example, one formulation incorporates γ into the objective function: (2.3) min_{w,γ,y} (1/2)‖w, γ‖₂² + νeᵀy subject to D(Aw − eγ) + y ≥ e, y ≥ 0. This formulation is described in [20] to allow successive overrelaxation to be applied to the (dual) problem. A different permutation replaces the one-norm of y in (2.3) with the two-norm, such that the nonnegativity constraint on y beco...

60 | Object-Oriented Software for Quadratic Programming
- Gertz, Wright
- 2003
Citation Context: ...ch a scheme. As mentioned above, the crucial implementation details are in the linear algebra calculation. Rather than reimplement a standard predictor-corrector interior-point code [23], we use OOQP [11, 12] as the basis for our work. A key property of OOQP is the object-oriented design, which enables us to tailor the required linear algebra to the application. Our linear algebra implementation exploits ...

52 | A framework for measuring changes in data characteristics
- Ganti, Gehrke, et al.
- 2002
Citation Context: ...ng surface. However, if we considered a “global” application and randomly sampled only 1% of the current world population, we would generate a problem with around 60 million observations. Recent work [10] has shown that although a random sampling of 20–30% is sufficient for many applications, sampling even as high as 70–80% can produce statistically significant differences in the models. Furthermore, ...

48 | Massive data discrimination via linear support vector machines
- Bradley, Mangasarian

48 | Multiple centrality corrections in a primal-dual method for linear programming
- Gondzio
- 1996
Citation Context: ... been demonstrated. The linear algebra can be parallelized easily, and further speedups can be realized through storage of the data across multiple disks. More sophisticated corrector implementations [14] of the interior-point code can be used to further reduce the iteration count. These are topics for future work, along with extensions to nonlinear SVM, and techniques to further reduce the number of ...

29 | Finite termination of the proximal point algorithm
- Ferris
- 1991
Citation Context: ...for some η > 0, possibly iteration dependent, to find a new x^{i+1}. The algorithm repeatedly solves subproblems of the form (3.7) until convergence occurs. Properties of such algorithms are developed in [25, 7], where it is shown that if the original problem has a solution, then the proximal-point algorithm converges to a particular element in the solution set of the original problem. Furthermore, each of t...
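The proximal-point behavior described above (each subproblem is regularized, and the iterates converge to a particular solution when the original problem has many) can be seen on a two-variable example. This is our own illustration, not taken from the paper; the quadratic, η, and starting point are arbitrary:

```python
import numpy as np

# A convex quadratic with a singular Hessian: min 1/2 y^T Q y + c^T y.
# Minimizers satisfy Q y + c = 0, i.e. y0 = 2 with y1 completely free,
# so the solution set is a whole line.
Q = np.array([[2.0, 0.0], [0.0, 0.0]])
c = np.array([-4.0, 0.0])
eta = 1.0   # proximal parameter (eta > 0)

x = np.array([5.0, 5.0])   # arbitrary starting point
for _ in range(100):
    # proximal subproblem: min_y 1/2 y^T Q y + c^T y + eta/2 ||y - x||^2,
    # strongly convex, solved exactly by one regularized linear solve
    x = np.linalg.solve(Q + eta * np.eye(2), eta * x - c)

# converges to the particular minimizer nearest the start in the free
# direction: (2, 5), and the original optimality condition holds there
assert np.allclose(x, [2.0, 5.0])
assert np.allclose(Q @ x + c, 0.0)
```

The regularization term is what lets the out-of-core SVM solver handle formulations whose Q is only positive semidefinite: every subproblem is nonsingular, yet the limit still solves the original problem.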

29 | Feasible descent algorithms for mixed complementarity problems
- Ferris, Kanzow, et al.
- 1996
Citation Context: ...iteria. The termination criterion is based on the inf-norm of the Fischer–Burmeister function [9] for the complementarity problem (3.1), with an appropriate modification for the presence of equations [8]. If we denote all variables in (3.1) by x and the affine function on the left of (3.1) by F(x), then each component of the Fischer–Burmeister function is defined by φ(x_i, F_i(x)) := √(x_i² + F_i(x)²) ...
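The Fischer–Burmeister function in the snippet is φ(a, b) = √(a² + b²) − a − b, which is zero exactly when a ≥ 0, b ≥ 0, and ab = 0; the inf-norm of the componentwise residual then serves as a termination test. A small self-contained sketch (the sample vectors are our own):

```python
import numpy as np

def fischer_burmeister(a, b):
    """phi(a, b) = sqrt(a^2 + b^2) - a - b.

    phi(a, b) = 0  <=>  a >= 0, b >= 0, and a * b = 0,
    so it reformulates complementarity as a root-finding residual.
    """
    return np.sqrt(a * a + b * b) - a - b

# complementary pairs give a zero residual
assert abs(fischer_burmeister(0.0, 3.0)) < 1e-12
assert abs(fischer_burmeister(2.0, 0.0)) < 1e-12
# violated complementarity gives a nonzero residual
assert fischer_burmeister(-1.0, 2.0) > 0.0

# a termination test takes the inf-norm over all pairs (x_i, F_i(x))
x = np.array([0.0, 1.0, 2.0])
F = np.array([4.0, 0.0, 0.0])
resid = np.max(np.abs(fischer_burmeister(x, F)))
assert resid < 1e-12   # this (x, F) satisfies complementarity
```

The modification for equations mentioned in the snippet ([8]) replaces φ by the plain residual F_i(x) on the unconstrained components; that variant is not shown here.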

26 | Robust linear and support vector regression
- Mangasarian, Musicant
- 2000
Citation Context: ...lems of this type is the data mining community, where several linear support vector machine (SVM) formulations [28, 1, 2, 19] fit into the framework. A related example is the Huber regression problem [17, 21, 31], which can also be posed as a quadratic program of the type considered. The linear SVM attempts to construct a hyperplane partitioning two sets of observations, where each observation is an element o...

7 | Active set support vector machine classification
- Mangasarian, Musicant
- 2000
Citation Context: ...cation error, eᵀy, and the two-norm of w, the normal to the hyperplane being derived. The relationship between minimizing ‖w‖₂ and maximizing the margin of separation is described, for example, in [22]. Here, ν is a parameter weighting the two competing goals related to misclassification error and margin of separation. The inequality constraints implement ...

3 | The linear ℓ1 estimator and the Huber M-estimator
- Li, Swetits
- 1998
Citation Context: ...lems of this type is the data mining community, where several linear support vector machine (SVM) formulations [28, 1, 2, 19] fit into the framework. A related example is the Huber regression problem [17, 21, 31], which can also be posed as a quadratic program of the type considered. The linear SVM attempts to construct a hyperplane partitioning two sets of observations, where each observation is an element o...

2 | On reduced convex QP formulations of monotone LCP problems
- Wright
Citation Context: ...lems of this type is the data mining community, where several linear support vector machine (SVM) formulations [28, 1, 2, 19] fit into the framework. A related example is the Huber regression problem [17, 21, 31], which can also be posed as a quadratic program of the type considered. The linear SVM attempts to construct a hyperplane partitioning two sets of observations, where each observation is an element o...