## SVM Optimization: Inverse Dependence on Training Set Size

Citations: 52 (13 self)

### BibTeX

```
@MISC{Srebro_svmoptimization:,
  author = {Shai Shalev-Shwartz and Nathan Srebro},
  title  = {SVM Optimization: Inverse Dependence on Training Set Size},
  year   = {}
}
```

### Abstract

We discuss how the runtime of SVM optimization should decrease as the size of the training data increases. We present theoretical and empirical results demonstrating that a simple subgradient descent approach indeed exhibits such behavior, at least for linear kernels.
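For reference, the objective the abstract alludes to is the standard regularized (hinge-loss) SVM objective; in the usual notation (standard SVM notation assumed here, not quoted from this page), the empirical objective over m training examples is:

```latex
\hat f_\lambda(w) \;=\; \frac{\lambda}{2}\,\lVert w\rVert^2
  \;+\; \frac{1}{m}\sum_{i=1}^{m} \max\bigl(0,\; 1 - y_i \langle w, x_i\rangle\bigr)
```

with $f_\lambda(w)$ the corresponding expectation over the data distribution. The paper's question is how the time needed to reach a given value of $f_\lambda$ scales as m grows.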

### Citations

1011 | Fast training of support vector machines using sequential minimal optimization - Platt - 1998

Citation Context: ...tions also theoretically increasing with m. To avoid a cubic dependence on m, many modern SVM solvers use “decomposition techniques”: only a subset of the dual variables is updated at each iteration (Platt, 1998; Joachims, 1998). It is possible to establish linear convergence for specific decomposition methods (e.g. Lin, 2002). However, a careful examination of this analysis reveals that the number of iterat...

468 | Making large-scale support vector machine learning practical - Joachims - 1999

Citation Context: ...eoretically increasing with m. To avoid a cubic dependence on m, many modern SVM solvers use “decomposition techniques”: only a subset of the dual variables is updated at each iteration (Platt, 1998; Joachims, 1998). It is possible to establish linear convergence for specific decomposition methods (e.g. Lin, 2002). However, a careful examination of this analysis reveals that the number of iterations before the ...

319 | Training linear svms in linear time - Joachims - 2006

Citation Context: ...ith a more moderate scaling on the data set size were presented. The flip side is that these approaches typically have much worse dependence on the optimization accuracy. A recent example is SVM-Perf (Joachims, 2006), an optimization method that uses a cutting-planes approach for training linear SVMs. Smola et al. (2008) showed that SVM-Perf can find a solution with accuracy ɛ in time O(md/(λɛ)). Although SVM-Pe...

280 | Pegasos: Primal estimated sub-gradient solver for svm - Shalev-Shwartz, Singer, et al. - 2011

Citation Context: ...le data? In Section 5, we present both a theoretical analysis and a thorough empirical study demonstrating that, at least for linear kernels, the runtime of the subgradient descent optimizer PEGASOS (Shalev-Shwartz et al., 2007) does indeed decrease as more data is made available. 2. Background We briefly introduce the SVM setting and the notation used in this paper, and survey the standard runtime analysis of several optim...
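The PEGASOS solver referenced above is a stochastic subgradient method for the regularized hinge-loss SVM objective. The following is a minimal sketch, not the authors' implementation: the toy data, function names, and default parameters are illustrative, and the optional projection step of the published algorithm is omitted.

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=1000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Approximately minimizes
        f(w) = (lam/2)*||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <w, x_i>).
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)          # pick one training example uniformly at random
        eta = 1.0 / (lam * t)        # step-size schedule 1/(lambda * t)
        if y[i] * (X[i] @ w) < 1:    # margin violated: hinge term contributes a subgradient
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                        # margin satisfied: only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# Tiny linearly separable toy problem (made up for illustration).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = pegasos(X, y)
print(np.sign(X @ w))  # → [ 1.  1. -1. -1.]
```

Note the m-independence of each iteration: one step touches a single example, which is what allows the overall runtime to stay flat (or shrink) as the training set grows.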

257 | Rademacher and Gaussian complexities: Risk bounds and structural results - Bartlett, Mendelson

133 | The tradeoffs of large scale learning - Bottou, Bousquet - 2007

46 | Large scale online learning - Bottou, LeCun - 2003

Citation Context: ...set as a function of runtime (#iterations), for various training set sizes. The insert is a cartoon depicting a hypothetical situation discussed in the text. ...descent for unregularized linear learning (Bottou & LeCun, 2004), this is not the case here. Unfortunately we are not aware of an efficient one-pass optimizer for SVMs. 6. Discussion We suggest here a new way of studying and understanding the runtime of training:...

40 | Bundle methods for machine learning - Smola, Vishwanathan, et al. - 2007

Citation Context: ...Õ(d‖w0‖⁴/ɛ⁴), matching that in the analysis of dual decomposition methods above. It should be noted that SVM-Perf’s runtime has been reported to have only a logarithmic dependence on 1/ɛacc in practice (Smola et al., 2008). If that were the case, the runtime guarantee would drop to Õ(d‖w0‖⁴/ɛ³), perhaps explaining the faster runtime of SVM-Perf on large data sets in practice. As for the stochastic gradient optimi...

19 | A formal analysis of stopping criteria of decomposition methods for support vector machines - Lin

Citation Context: ...on techniques”: only a subset of the dual variables is updated at each iteration (Platt, 1998; Joachims, 1998). It is possible to establish linear convergence for specific decomposition methods (e.g. Lin, 2002). However, a careful examination of this analysis reveals that the number of iterations before the linearly convergent stage can grow as m². In fact, Bottou and Lin (2007) argue that any method tha...

17 | Support vector machine solvers - Bottou, Lin - 2007

2 | Fast convergence rates for excess regularized risk with application to SVM. http://ttic.uchicago.edu/~karthik/con.pdf - Sridharan, Shalev-Shwartz, et al. - 2008

Citation Context: ...d by the empirical degradation: For all w with ‖w‖² ≤ 2/λ (a larger norm would yield a worse SVM objective than w = 0, and so can be disqualified), with probability at least 1 − δ over the training set (Sridharan, 2008):

f_λ(w) − f_λ(w∗) ≤ 2[f̂_λ(w) − f̂_λ(w∗)]+ + O(log(1/δ)/(λm))   (2)

where [z]+ = max(z, 0). Recalling that w̃ is an ɛacc-accurate minimizer of f̂_λ(w), we have: f_λ(w̃) − f_λ(w∗) ≤ 2ɛacc + O(log(1/δ)/(λm))...
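The step from bound (2) to the final inequality in this excerpt uses one link worth spelling out: since w̃ is an ɛacc-accurate minimizer of the empirical objective f̂_λ, and w∗ can be no better than the empirical minimum (standard notation assumed, following the excerpt),

```latex
\hat f_\lambda(\tilde w) - \hat f_\lambda(w^*)
  \;\le\; \hat f_\lambda(\tilde w) - \min_{w}\hat f_\lambda(w)
  \;\le\; \epsilon_{\mathrm{acc}},
\qquad\text{because}\quad \hat f_\lambda(w^*) \;\ge\; \min_{w}\hat f_\lambda(w).
```

Substituting into the right-hand side of (2), and using that [z]+ ≤ ɛacc whenever z ≤ ɛacc (as ɛacc ≥ 0), gives f_λ(w̃) − f_λ(w∗) ≤ 2ɛacc + O(log(1/δ)/(λm)).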