## LIBSVM: a Library for Support Vector Machines (2001)

### Download Links

- [www.csie.ntu.edu.tw]
- [fbim.fh-regensburg.de]
- [ntu.csie.org]
Citations: 3436 (62 self)

### BibTeX

```bibtex
@misc{Chang01libsvm:a,
  author = {Chih-Chung Chang and Chih-Jen Lin},
  title  = {{LIBSVM}: a Library for Support Vector Machines},
  year   = {2001}
}
```

### Abstract

LIBSVM is a library for support vector machines (SVMs). Its goal is to help users easily use SVM as a tool. In this document, we present all of its implementation details.

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...2 Formulations 2.1 C-Support Vector Classification (Binary Case). Given training vectors x_i ∈ R^n, i = 1, ..., l, in two classes, and a vector y ∈ R^l such that y_i ∈ {1, −1}, C-SVC (Cortes & Vapnik, 1995; Vapnik, 1998) solves the following primal problem: min_{w,b,ξ} (1/2) w^T w + C Σ_{i=1}^{l} ξ_i (2.1) subject to y_i(w^T φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., l. Department of Computer Science and Information Engineering...

2171 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...discussed in Section 7. 2 Formulations 2.1 C-Support Vector Classification (Binary Case). Given training vectors x_i ∈ R^n, i = 1, ..., l, in two classes, and a vector y ∈ R^l such that y_i ∈ {1, −1}, C-SVC (Cortes and Vapnik, 1995; Vapnik, 1998) solves the following primal problem: min_{w,b,ξ} (1/2) w^T w + C Σ_{i=1}^{l} ξ_i (2.1) subject to y_i(w^T φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., l. Department of Computer Science and Inform...
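Typeset cleanly, the C-SVC primal problem (2.1) quoted in this excerpt reads (notation as in the excerpt: φ is the feature map, C > 0 the penalty parameter):

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\, w^{T} w \;+\; C \sum_{i=1}^{l} \xi_i
\qquad\text{subject to}\qquad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i,\qquad
\xi_i \ge 0,\; i = 1, \dots, l. \tag{2.1}
```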

1441 | Making Large-Scale SVM Learning Practical
- Joachims
- 1999
Citation Context: ...and Caching. Since for many problems the number of free support vectors (i.e., 0 < α_i < C) is small, the shrinking technique reduces the size of the working problem without considering some bounded variables (Joachims, 1998). Near the end of the iterative process, the decomposition method identifies a possible set A where all final free α_i may reside. Then instead of solving the whole problem (2.2), the decomposition m...

1291 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
Citation Context: ...tion of probability outputs. 2 Formulations 2.1 C-Support Vector Classification. Given training vectors x_i ∈ R^n, i = 1, ..., l, in two classes, and a vector y ∈ R^l such that y_i ∈ {1, −1}, C-SVC (Boser et al., 1992; Cortes and Vapnik, 1995) solves the following p... [Department of Computer Science, National Taiwan University, Taipei 106, Taiwan (http://www.csie.ntu.edu.tw/~cjlin). E-mail: cjlin@csie.ntu.edu.tw]

1011 | Fast training of support vector machines using sequential minimal optimization
- Platt
- 1998
Citation Context: ...y of Q because Q_ij is in general not zero. In LIBSVM, we consider the decomposition method to conquer this difficulty. Some work on this method includes, for example, (Osuna et al., 1997b; Joachims, 1998; Platt, 1998; Saunders et al., 1998). Algorithm 3.1 (Decomposition method) 1. Given a number q ≤ l as the size of the working set, find α^1 as the initial solution and set k = 1. 2. If α^k is an optimal solution of ...
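The excerpt's Algorithm 3.1 can be sketched concretely for the two-variable case (an SMO-style loop with maximal-violating-pair selection). This is a minimal illustration, not LIBSVM's actual implementation: the function names are made up, the kernel is fixed to linear, and the tolerance is arbitrary.

```python
import numpy as np

def smo_train(X, y, C=1.0, eps=1e-3, max_iter=1000):
    """Minimize 0.5*a'Qa - e'a subject to 0 <= a_i <= C, y'a = 0 (linear kernel)."""
    l = len(y)
    Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j x_i . x_j
    alpha = np.zeros(l)
    grad = -np.ones(l)                          # gradient of the dual objective
    for _ in range(max_iter):
        # maximal-violating-pair selection over the index sets I_up / I_low
        up = [(t, -y[t] * grad[t]) for t in range(l)
              if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
        low = [(t, -y[t] * grad[t]) for t in range(l)
               if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
        i, m = max(up, key=lambda p: p[1])
        j, M = min(low, key=lambda p: p[1])
        if m - M <= eps:                        # approximate KKT conditions hold
            break
        # analytic solution of the two-variable sub-problem, then box clipping
        a = Q[i, i] + Q[j, j] - 2 * y[i] * y[j] * Q[i, j]
        step = (m - M) / max(a, 1e-12)
        step = min(step,
                   C - alpha[i] if y[i] == 1 else alpha[i],
                   alpha[j] if y[j] == 1 else C - alpha[j])
        alpha[i] += y[i] * step                 # d_i =  y_i * step
        alpha[j] -= y[j] * step                 # d_j = -y_j * step
        grad += (Q[:, i] * y[i] - Q[:, j] * y[j]) * step
    b = (m + M) / 2                             # bias from the two KKT bounds
    return alpha, b

def smo_predict(X, y, alpha, b, Xnew):
    return np.sign(Xnew @ X.T @ (alpha * y) + b)
```

On a small separable problem this converges in a handful of iterations; LIBSVM additionally uses second-order working-set selection, shrinking, and caching, all omitted here.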

701 | Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
- Platt
- 1999
Citation Context: ...plot of heart_scale included in the LIBSVM package. For classification, we first estimated pairwise class probabilities r_ij ≈ p(y = i | y = i or j, x) using an improved implementation (Lin et al., 2003) of (Platt, 2000): r_ij ≈ 1 / (1 + e^{A f̂ + B}) (8.1), where A and B are estimated by minimizing the negative log-likelihood function using known training data and their decision values f̂. Labels and decision values are ...
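Equation (8.1) can be illustrated with a toy fit. The sketch below minimizes the negative log-likelihood by plain gradient descent; the actual improved implementation of Lin et al. (2003) uses a more careful Newton method, and Platt recommends smoothed target values rather than raw 0/1 labels, so treat this as illustrative only (`fit_sigmoid` is a made-up name).

```python
import math

def fit_sigmoid(decisions, labels, lr=0.01, iters=5000):
    """Fit A, B in r = 1/(1 + exp(A*f + B)); labels are 1 (positive) or 0."""
    A, B = 0.0, 0.0
    for _ in range(iters):
        gA = gB = 0.0
        for f, t in zip(decisions, labels):
            p = 1.0 / (1.0 + math.exp(A * f + B))   # current P(y = 1 | f)
            gA += (t - p) * f                       # -d(NLL)/dA accumulates here
            gB += (t - p)                           # -d(NLL)/dB accumulates here
        A -= lr * gA                                # descend the NLL
        B -= lr * gB
    return A, B
```

Note that with perfectly separated decision values the unregularized fit pushes A toward infinity in magnitude, which is one motivation for the smoothed targets used in practice.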

567 | A Comparison of Methods for Multiclass Support Vector Machines - Hsu, Lin - 2002 |

565 | Training Support Vector Machines: An Application to Face Detection
- Osuna, Freund, et al.
- 1997
Citation Context: ...ulty of solving (3.1) is the density of Q because Q_ij is in general not zero. In LIBSVM, we consider the decomposition method to conquer this difficulty. Some work on this method includes, for example, (Osuna et al., 1997a; Joachims, 1998; Platt, 1998). This method modifies only a subset of α per iteration. This subset, denoted as the working set B, leads to a small sub-problem to be minimized in each iteration. An ex...

506 | Estimating the support of a high-dimensional distribution - Schölkopf, Platt, et al. - 1999 |

425 | A Practical Guide to Support Vector Classification - Hsu, Chang, et al. - 2010 |

368 | The pyramid matching kernel: Discriminative classification with sets of image features
- Grauman, Darrell
Citation Context: ...LIBSVM FAQ: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html. Table 1: Representative works in some domains that have successfully used LIBSVM — computer vision: LIBPMK (Grauman and Darrell, 2005); natural language processing: MaltParser (Nivre et al., 2007); neuroimaging: PyMVPA (Hanke et al., 2009); bioinformatics: BDVal (Dorff et al., 2010). 1. SVC: support vector classification (two-class and mu...

325 | New support vector algorithms
- Schölkopf, Smola, et al.
- 2000
Citation Context: ...r (maybe infinite) dimensional space by the function φ. The decision function is sign(Σ_{i=1}^{l} y_i α_i K(x_i, x) + b). 2.2 ν-Support Vector Classification (Binary Case). The ν-support vector classification (Schölkopf et al., 2000) uses a new parameter ν which lets one control the number of support vectors and errors. The parameter ν ∈ (0, 1] is an upper bound on the fraction of training errors and a lower bound on the fraction...

277 | Interpolation of scattered data: Distance matrices and conditionally positive definite functions
- Micchelli
- 1986
Citation Context: ...BB is picked. 2. Some kernels have a nice property that φ(x_i), i = 1, ..., l, are linearly independent if the x_i are distinct. Thus Q as well as all possible Q_BB are positive definite. An example is the RBF kernel (Micchelli, 1986). However, for many practical data we have encountered, some of the x_i, i = 1, ..., l, are the same. Therefore, several rows (columns) of Q are exactly the same, so Q_BB may be singular. However, even if...

259 | Large margin DAG’s for multiclass classification
- Platt, Cristianini, et al.
- 2000
Citation Context: ...M is trained with all of the examples in the ith class with positive labels, and all other examples with negative labels. We did not consider it, as some research work (e.g., (Weston and Watkins, 1998; Platt et al., 2000)) has shown that it does not perform as well as “one-against-one.” In addition, though we have to train as many as k(k − 1)/2 classifiers, as each problem is smaller (only data from two classes), the...

191 | Improvements to Platt’s SMO algorithm for SVM classifier design
- Keerthi, Shevade, et al.
- 2001
Citation Context: ...= C}; j = argmin{∇f(α)^T d | y^T d = 1; d_t ≥ 0 if α_t = 0; d_t ≤ 0 if α_t = C}, which is the same as (3.5) and (3.6). We also notice that this is also Modification 2 of the algorithms in (Keerthi et al., 2001). We then define g_i ≡ −∇f(α)_i if y_i = 1, α_i < C; ∇f(α)_i if y_i = −1, α_i > 0 (3.9), and g_j ≡ ∇f(α)_j if y_j = 1, α_j > 0; −∇f(α)_j if y_j = −1, α_j < C (3.10). From (3.4), we know g_i > g_j (3.11) impli...

177 | Support vector machines: Training and applications
- Osuna, Freund, et al.
- 1997
Citation Context: ...iculty of solving (3.1) is the density of Q because Q_ij is in general not zero. In LIBSVM, we consider the decomposition method to conquer this difficulty. Some work on this method includes, for example, (Osuna et al., 1997b; Joachims, 1998; Platt, 1998; Saunders et al., 1998). Algorithm 3.1 (Decomposition method) 1. Given a number q ≤ l as the size of the working set, find α^1 as the initial solution and set k = 1. 2. If ...

158 | Working set selection using the second order information for training support vector machines
- Fan, Chen, et al.
- 2005
Citation Context: ...i ∈ argmax_t {−y_t ∇f(α^k)_t | t ∈ I_up(α^k)}, j ∈ argmin_t {−b_it² / ā_it | t ∈ I_low(α^k), −y_t ∇f(α^k)_t < −y_i ∇f(α^k)_i}. (3.12) 2. Return B = {i, j}. Details of how we choose this working set are in (Fan et al., 2005, Section II). 3.3 Convergence of the Decomposition Method. See (Fan et al., 2005, Section III) or (Chen et al., 2006) for a detailed discussion of the convergence of Algorithm 1. 3.4 The Decompositi...
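A literal reading of the selection rule (3.12) can be sketched in plain Python. Names here are illustrative only: `Q` is the dense matrix with entries y_i y_j K(x_i, x_j), `grad` the current dual gradient, and `tau` guards the non-positive-curvature case.

```python
def select_working_set(Q, grad, y, alpha, C, tau=1e-12):
    """Pick i by first-order violation over I_up, j by the second-order rule."""
    n = len(y)
    I_up = [t for t in range(n)
            if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    I_low = [t for t in range(n)
             if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
    i = max(I_up, key=lambda t: -y[t] * grad[t])
    m = -y[i] * grad[i]
    j, best = -1, None
    for t in I_low:
        if -y[t] * grad[t] < m:              # t still violates optimality with i
            b_it = m + y[t] * grad[t]        # first-order term, b_it > 0
            a_it = Q[i][i] + Q[t][t] - 2 * y[i] * y[t] * Q[i][t]
            score = -b_it * b_it / max(a_it, tau)
            if best is None or score < best:
                j, best = t, score
    return i, j                              # j == -1 signals near-optimality
```

The second-order score estimates the objective decrease obtainable from the pair (i, t), so j maximizes the predicted progress rather than the raw gradient violation.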

136 | Another Approach to Polychotomous Classification
- Friedman
- 1996
Citation Context: ...the “one-against-one” approach (Knerr et al., 1990) in which k(k − 1)/2 classifiers are constructed and each one trains data from two different classes. The first use of this strategy on SVM was in (Friedman, 1996; Kreßel, 1999). For training data from the ith and the jth classes, we solve the following two-class classification problem: min_{w^ij, b^ij, ξ^ij} (1/2) (w^ij)^T w^ij + C Σ_t (ξ^ij)_t subject to (w^ij)...

123 | RSVM: Reduced support vector machines - Lee, Mangasarian |

109 | On the Convergence of the Decomposition Method for Support Vector Machines
- Lin
- 2001
Citation Context: ...t consider ɛ-SVR as different from C-SVR, so the same decomposition method is applied. 3.4 Convergence of the Decomposition Method. The convergence of decomposition methods was first studied in (Chang et al., 2000), but the algorithms discussed there do not coincide with existing implementations. In this section we will discuss only convergence results related to the specific decomposition method in Section 3.2. Fr...

109 | R.: A note on Platt’s probabilistic outputs for support vector machines
- Lin, Lin, et al.
- 2007
Citation Context: ...Figure 1: Contour plot of heart_scale included in the LIBSVM package. For classification, we first estimated pairwise class probabilities r_ij ≈ p(y = i | y = i or j, x) using an improved implementation (Lin et al., 2003) of (Platt, 2000): r_ij ≈ 1 / (1 + e^{A f̂ + B}) (8.1), where A and B are estimated by minimizing the negative log-likelihood function using known training data and their decision values f̂. Labels and dec...

93 | Maltparser: A language-independent system for data-driven dependency parsing - Nivre, Hall, et al. |

69 | Single-layer learning revisited: A stepwise procedure for building and training a neural network
- Knerr, Personnaz, et al.
- 1990
Citation Context: ...{ alpha[i] = C_i; alpha[j] = C_i - diff; } } else { if (alpha[j] > C_j) /* in region 2 */ { alpha[j] = C_j; alpha[i] = C_j + diff; } } 6 Multi-class classification. We use the “one-against-one” approach (Knerr et al., 1990) in which k(k − 1)/2 classifiers are constructed and each one trains data from two different classes. The first use of this strategy on SVM was in (Friedman, 1996; Kreßel, 1999). For training data fr...
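The "one-against-one" scheme the excerpt describes — k(k − 1)/2 pairwise classifiers plus majority voting — can be sketched as below. The binary learner here is a trivial nearest-centroid linear rule standing in for a binary SVM, purely to keep the example self-contained; all names are made up.

```python
from itertools import combinations

def fit_pair(X, y, ci, cj):
    """Linear rule separating classes ci (positive side) and cj (negative side)."""
    pos = [x for x, lab in zip(X, y) if lab == ci]
    neg = [x for x, lab in zip(X, y) if lab == cj]
    cp = [sum(v) / len(pos) for v in zip(*pos)]       # centroid of class ci
    cn = [sum(v) / len(neg) for v in zip(*neg)]       # centroid of class cj
    w = [a - b for a, b in zip(cp, cn)]               # normal of the midplane
    mid = [(a + b) / 2 for a, b in zip(cp, cn)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def ovo_fit(X, y):
    """Train one binary model per unordered class pair."""
    classes = sorted(set(y))
    return {(ci, cj): fit_pair(X, y, ci, cj)
            for ci, cj in combinations(classes, 2)}

def ovo_predict(models, x):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = {}
    for (ci, cj), (w, b) in models.items():
        winner = ci if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else cj
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)                  # ties broken arbitrarily
```

Each binary problem sees only the data of its two classes, which is why, as the excerpt notes, training k(k − 1)/2 such models is often cheaper than it sounds.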

58 | Building Support Vector Machines with Reduced Classifier Complexity - Keerthi, Chapelle, et al. |

58 | A study on sigmoid kernels for svm and the training of non-psd kernels by smo-type methods
- Lin, Lin
- 2003
Citation Context: ...refore, the quadratic problem (3.1) is not convex, so it may have several local minima. Using a slightly modified Algorithm 3.1, LIBSVM still guarantees convergence to a local minimum. More details are in (Lin and Lin, 2003). In addition, the discussion here is about asymptotic convergence. We investigate the computational complexity in Section 4.3. 3.5 The Decomposition Method for ν-SVC and ν-SVR. Both ν-SVC and ν-SVR c...

56 | Pairwise classification and support vector machines
- Kreßel
- 1999
Citation Context: ...t-one” approach (Knerr et al., 1990) in which k(k − 1)/2 classifiers are constructed and each one trains data from two different classes. The first use of this strategy on SVM was in (Friedman, 1996; Kreßel, 1999). For training data from the ith and the jth classes, we solve the following two-class classification problem: min_{w^ij, b^ij, ξ^ij} (1/2) (w^ij)^T w^ij + C Σ_t (ξ^ij)_t subject to (w^ij)^T φ(x_t) + b^i...

47 | Convergence of a generalized SMO algorithm for SVM classifier design
- Keerthi, Gilbert
Citation Context: ...t algorithms discussed there do not coincide with existing implementations. In this section we will discuss only convergence results related to the specific decomposition method in Section 3.2. From (Keerthi and Gilbert, 2002) we have Theorem 3.2: given any ɛ > 0, after a finite number of iterations (3.13) will be satisfied. This theorem establishes the so-called “finite termination” property, so we are sure that after fini...

41 | Support vector machine reference manual
- Saunders, Stitson, et al.
- 1998
Citation Context: ...e Q_ij is in general not zero. In LIBSVM, we consider the decomposition method to conquer this difficulty. Some work on this method includes, for example, (Osuna et al., 1997b; Joachims, 1998; Platt, 1998; Saunders et al., 1998). Algorithm 3.1 (Decomposition method) 1. Given a number q ≤ l as the size of the working set, find α^1 as the initial solution and set k = 1. 2. If α^k is an optimal solution of (2.2), stop. Otherwise, ...

33 | Training ν-support vector classifiers: Theory and algorithms
- Chang, Lin
- 2001
Citation Context: ...rs. The parameter ν ∈ (0, 1] is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Details of the algorithm implemented in LIBSVM can be found in (Chang and Lin, 2001). Given training vectors x_i ∈ R^n, i = 1, ..., l, in two classes, and a vector y ∈ R^l such that y_i ∈ {1, −1}, the primal form considered is: min_{w,b,ξ,ρ} (1/2) w^T w − νρ + (1/l) Σ_{i=1}^{l} ξ_i subject to y_i(w^T φ(x_i) ...

27 | A geometric interpretation of ν-SVM classifiers
- Crisp, Burges
- 2000
Citation Context: ...subject to y_i(w^T φ(x_i) + b) ≥ ρ − ξ_i, ξ_i ≥ 0, i = 1, ..., l, ρ ≥ 0. The dual is: min_α (1/2) α^T Qα subject to 0 ≤ α_i ≤ 1/l, i = 1, ..., l, e^T α ≥ ν, y^T α = 0, (2.3) with decision function sgn(Σ_{i=1}^{l} y_i α_i (K(x_i, x) + b)). In (Crisp and Burges, 2000; Chang and Lin, 2001), it has been shown that e^T α ≥ ν can be replaced by e^T α = ν. With this property, in LIBSVM, we solve a scaled version of (2.3): min_α (1/2) α^T Qα subject to 0 ≤ α_i ≤ 1, i = 1, ...

20 | A study on SMO-type decomposition methods for support vector machines
- Chen, Fan, et al.
Citation Context: ...)_i. (3.12) 2. Return B = {i, j}. Details of how we choose this working set are in (Fan et al., 2005, Section II). 3.3 Convergence of the Decomposition Method. See (Fan et al., 2005, Section III) or (Chen et al., 2006) for a detailed discussion of the convergence of Algorithm 1. 3.4 The Decomposition Method for ν-SVC and ν-SVR. Both ν-SVC and ν-SVR can be considered as the following general form: ...

19 | A formal analysis of stopping criteria of decomposition methods for support vector machines - Lin |

17 | Polynomial-time decomposition algorithms for support vector machines
- Hush, Scovel
- 2003
Citation Context: ...Note that if shrinking is incorporated, l will gradually decrease during iterations. Unfortunately, so far we do not know much about the complexity of the number of iterations. An earlier work is (Hush and Scovel, 2003). However, its result applies only to the decomposition methods discussed in (Chang et al., 2000), not to LIBSVM or other existing software. 5 Multi-class classification. We use the “one-against-one” appr...

16 | Asymptotic convergence of an SMO algorithm without any assumptions - Lin - 2002 |

14 | Linear convergence of a decomposition method for support vector machines
- Lin
- 2001
Citation Context: ...rameter ν ∈ (0, 1] is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Details of the algorithm implemented in LIBSVM can be found in (Chang and Lin, 2001). Given training vectors x_i ∈ R^n, i = 1, ..., l, in two classes, and a vector y ∈ R^l such that y_i ∈ {1, −1}, the primal form considered is: min_{w,b,ξ,ρ} (1/2) w^T w − νρ + (1/l) Σ_{i=1}^{l} ξ_i subject to y_i(w^T φ(x_i) ...

8 | Training ν-support vector regression: Theory and algorithms
- Chang, Lin
Citation Context: ...∇f(α)_i − ∇f(α)_j (i.e., larger KKT violations). This was first proposed in (Keerthi and Gilbert, 2002). Some details for classification and regression can be seen in (Chang and Lin, 2001, Section 4) and (Chang and Lin, 2002), respectively. The stopping criterion will be described in Section 3.7. 3.6 Analytical Solutions. Now (3.3) is a simple problem with only two variables: min_{α_i, α_j} (1/2) [α_i α_j] [Q_ii Q_ij; Q_ij Q_jj] [α_i; α_j] ...

8 | Svm-optimization and steepest-descent line search - List, Simon - 2009 |

8 | On the convergence of a modified version of SVMlight algorithm - Palagi, Sciandrone |

7 | Simple probabilistic predictions for support vector regression
- Lin, Weng
- 2004
Citation Context: ...ture the shape of the ζ_i's. Thus, we propose to model ζ_i by zero-mean Gaussian and Laplace distributions, or equivalently, to model the conditional distribution of y given f̂(x) by a Gaussian or Laplace with mean f̂(x). (Lin and Weng, 2004) discussed a method to judge whether a Laplace or Gaussian distribution should be used. Moreover, they experimentally show that in all cases they have tried, Laplace is better. Thus, here we cons...
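The Laplace-versus-Gaussian comparison the excerpt mentions can be illustrated by fitting both zero-mean models to a residual sample by maximum likelihood: the Laplace scale MLE is the mean absolute residual and the Gaussian variance MLE the mean squared residual. The function name and sample data below are made up for illustration.

```python
import math

def compare_residual_models(residuals):
    """Return (Laplace, Gaussian) log-likelihoods under zero-mean MLE fits."""
    n = len(residuals)
    b = sum(abs(r) for r in residuals) / n       # Laplace scale MLE
    var = sum(r * r for r in residuals) / n      # Gaussian variance MLE
    ll_laplace = sum(-math.log(2 * b) - abs(r) / b for r in residuals)
    ll_gauss = sum(-0.5 * math.log(2 * math.pi * var) - r * r / (2 * var)
                   for r in residuals)
    return ll_laplace, ll_gauss
```

On heavy-tailed residuals (many small errors plus a few large ones) the Laplace log-likelihood comes out higher, matching the experimental finding quoted above.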

6 | Maximum-gain working set selection for support vector machines - Glasmachers, Igel |

6 | General polynomial time decomposition algorithms - List, Simon |

6 | Fast and scalable local kernel machines - Segata, Blanzieri |

5 | A note on the decomposition methods for support vector regression
- Liao, Lin, et al.
Citation Context: ...Theorem 4.1), it has been shown that if we use only the two variables selected by the method in Section 3.2, the property α_i α_i* = 0, i = 1, ..., l, still holds throughout all iterations. Then in (Liao et al., 2002) we show that even if we expand the working set and sub-problem to have four variables, in most decomposition iterations only the two originally selected variables are changed. In other words, no ma...

