Pegasos: Primal Estimated sub-gradient solver for SVM

by Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter
Citations: 542 (20 self)

BibTeX

@MISC{Shalev-Shwartz_pegasos:primal,
    author = {Shai Shalev-Shwartz and Yoram Singer and Nathan Srebro and Andrew Cotter},
    title = {Pegasos: Primal Estimated sub-gradient solver for SVM},
    year = {}
}


Abstract

We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λɛ)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the run-time does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
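The update rule behind these guarantees is short enough to sketch. Below is a minimal Python sketch of the stochastic sub-gradient step for the linear-kernel case, assuming the standard regularized hinge-loss objective f(w) = (λ/2)‖w‖² + (1/m) Σᵢ max(0, 1 − yᵢ⟨w, xᵢ⟩). The function name and the optional projection step follow the paper's presentation, but this is an illustration under those assumptions, not the authors' reference implementation.

    import numpy as np

    def pegasos(X, y, lam, T, seed=0):
        # Stochastic sub-gradient descent on the regularized hinge loss:
        #   f(w) = (lam/2) * ||w||^2 + (1/m) * sum_i max(0, 1 - y[i] * <w, X[i]>)
        # X: (m, d) feature matrix; y: (m,) labels in {-1, +1}.
        rng = np.random.default_rng(seed)
        m, d = X.shape
        w = np.zeros(d)
        for t in range(1, T + 1):
            i = rng.integers(m)            # each iteration touches one example
            eta = 1.0 / (lam * t)          # decreasing step size 1/(lambda * t)
            if y[i] * X[i].dot(w) < 1:     # margin violated: hinge term contributes
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:                          # only the regularizer contributes
                w = (1.0 - eta * lam) * w
            # Optional projection onto the ball of radius 1/sqrt(lam),
            # used in the paper's analysis.
            norm = np.linalg.norm(w)
            if norm > 1.0 / np.sqrt(lam):
                w *= 1.0 / (np.sqrt(lam) * norm)
        return w

Because each step costs O(d) for examples with at most d non-zero features, running Õ(1/(λɛ)) iterations yields the Õ(d/(λɛ)) total run-time quoted above, independent of the training-set size m.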

Keyphrases

sub-gradient solver, large text classification problem, total run-time, large datasets, non-zero feature, regularization parameter, optimization problem cast, linear kernel, non-linear kernel, svms require, svm solver, single training example, training set, support vector machine, primal objective function, previous analysis, effective stochastic sub-gradient descent algorithm, previous svm learning method, order-of-magnitude speedup, stochastic gradient descent method
