Pegasos: Primal estimated sub-gradient solver for SVM (2007)

Download Links

  • [www.cs.huji.ac.il]
  • [eprints.pascal-network.org]
  • [www.robots.ox.ac.uk]
  • [ttic.uchicago.edu]
  • [www.magicbroom.info]
  • [imls.engr.oregonstate.edu]
  • [www.machinelearning.org]
  • [alliance.seas.upenn.edu]
  • [www.cs.cmu.edu]

  • Other Repositories/Bibliography: DBLP
by Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro
Venue: ICML
Citations: 542 (20 self)

BibTeX

@INPROCEEDINGS{Shalev-Shwartz07pegasos:primal,
    author = {Shai Shalev-Shwartz and Yoram Singer and Nathan Srebro},
    title = {Pegasos: Primal estimated sub-gradient solver for SVM},
    booktitle = {Proceedings of the 24th International Conference on Machine Learning (ICML)},
    year = {2007},
    pages = {807--814}
}

Abstract

We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). Our method alternates between stochastic gradient descent steps and projection steps. We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε). In contrast, previous analyses of stochastic gradient descent methods require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function. We demonstrate the efficiency and applicability of our approach by conducting experiments on large text classification problems, comparing our solver to existing state-of-the-art SVM solvers. For example, it takes less than 5 seconds for our solver to converge when solving a text classification problem from Reuters Corpus Volume 1 (RCV1) with 800,000 training examples.
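
The loop the abstract describes is short enough to sketch. The following Python fragment is an illustrative implementation of the basic linear-kernel Pegasos update, not the authors' reference code: at each iteration it draws one training example, takes a sub-gradient step on the regularized hinge loss with step size 1/(λt), and then projects w onto the ball of radius 1/√λ. The function name and parameters (pegasos, lam, T) are placeholders introduced here for illustration.

import numpy as np

def pegasos(X, y, lam, T, seed=0):
    # Sketch of the Pegasos loop for the primal SVM objective
    #   f(w) = (lam / 2) * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <w, x_i>)
    # X: (m, d) array of examples; y: (m,) array of labels in {-1, +1}.
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)          # draw one training example uniformly at random
        eta = 1.0 / (lam * t)        # step size 1/(lambda * t)
        if y[i] * (X[i] @ w) < 1.0:  # hinge loss active: sub-gradient is lam*w - y_i*x_i
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                        # hinge loss inactive: sub-gradient is lam*w
            w = (1.0 - eta * lam) * w
        radius = 1.0 / np.sqrt(lam)  # projection step onto the ball of radius 1/sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w

# Toy usage: a small linearly separable 2-D problem.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = pegasos(X, y, lam=0.1, T=1000)

Because each iteration touches a single example, the per-step cost is proportional to the number of non-zero features d, which is where the Õ(d/(λε)) total run-time quoted above comes from.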

Keyphrases

sub-gradient solver, svm solver, previous analysis, stochastic gradient descent method, large text classification problem, total run-time, optimization problem cast, non-zero feature, linear kernel, large datasets, regularization parameter, primal objective function, projection step, effective iterative algorithm, training example, text classification problem, training set, support vector machine, reuters corpus volume, state-of-the-art svm solver, non-linear kernel, stochastic gradient descent step
