
CiteSeerX

Results 1 - 10 of 1,009

A scaled conjugate gradient algorithm for fast supervised learning

by Martin F. Møller - Neural Networks, 1993
"... A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural netwo ..."
Abstract - Cited by 451 (0 self) - Add to MetaCart
A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural
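The "second order information" the abstract mentions enters through a directional curvature estimate rather than an explicit Hessian. A minimal sketch of that device (the names grad_fn and sigma, and the demo problem, are illustrative, not from the paper):

    import numpy as np

    def curvature_along(grad_fn, w, p, sigma=1e-4):
        # SCG-style second order information: approximate H @ p with two
        # gradient evaluations, s = (E'(w + sigma*p) - E'(w)) / sigma,
        # so the Hessian is never formed and no line search is needed.
        s = (grad_fn(w + sigma * p) - grad_fn(w)) / sigma
        return p @ s                      # approximates p.T @ H @ p

    def scaled_step_size(g, p, pHp, lam):
        # Moller's scaled step: the raw CG step -(p@g)/(p@H@p) is damped
        # by lam*||p||^2, with lam raised when the local quadratic model
        # proves untrustworthy and lowered when it fits well.
        return -(p @ g) / (pHp + lam * (p @ p))

    # demo on a quadratic E(w) = 0.5 * w @ A @ w, whose Hessian is A
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    grad = lambda w: A @ w
    w0, p0 = np.array([1.0, 1.0]), np.array([1.0, 0.0])
    print(curvature_along(grad, w0, p0), p0 @ A @ p0)   # both ~3.0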

Pegasos: Primal Estimated sub-gradient solver for SVM

by Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter
"... We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract - Cited by 542 (20 self) - Add to MetaCart
single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total
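The update is simple enough to sketch in full. Below is a minimal Python rendering of the algorithm as the paper describes it (step size 1/(λt), one random example per iteration, optional projection onto the ball of radius 1/√λ); the hyperparameter values are illustrative:

    import numpy as np

    def pegasos(X, y, lam=0.1, T=10_000, seed=0):
        # X: (n, d) examples, y: (n,) labels in {-1, +1}.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for t in range(1, T + 1):
            i = rng.integers(n)              # one training example per step
            eta = 1.0 / (lam * t)            # the paper's step size schedule
            margin = y[i] * (X[i] @ w)
            w = (1 - eta * lam) * w          # sub-gradient of the l2 term
            if margin < 1:                   # hinge sub-gradient if violated
                w += eta * y[i] * X[i]
            r = 1 / np.sqrt(lam)             # optional projection step
            nrm = np.linalg.norm(w)
            if nrm > r:
                w *= r / nrm
        return w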

Atomic Decomposition by Basis Pursuit

by Scott Shaobing Chen, David L. Donoho, Michael A. Saunders, 1995
"... The Time-Frequency and Time-Scale communities have recently developed a large number of overcomplete waveform dictionaries -- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for d ..."
Abstract - Cited by 2728 (61 self) - Add to MetaCart
successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
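For orientation, the problem being solved is min ‖x‖₁ subject to Ax = b, which becomes a linear program after splitting x into positive and negative parts. The sketch below hands that LP to SciPy's stock solver; it does not reproduce the paper's primal-dual log-barrier/conjugate-gradient implementation, and the demo data are illustrative:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        # Solve min ||x||_1 s.t. Ax = b as an LP: write x = u - v with
        # u, v >= 0, so the objective is sum(u) + sum(v).
        n, d = A.shape
        c = np.ones(2 * d)
        A_eq = np.hstack([A, -A])              # A(u - v) = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * d))
        u, v = res.x[:d], res.x[d:]
        return u - v

    # recover a sparse vector from an underdetermined system
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 60))
    x_true = np.zeros(60); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
    x_hat = basis_pursuit(A, A @ x_true)
    print(np.round(x_hat[[3, 17, 42]], 3))     # ~[1.0, -2.0, 0.5]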

Large-scale machine learning with stochastic gradient descent

by Léon Bottou - in COMPSTAT, 2010
"... Abstract. During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs ..."
Abstract - Cited by 163 (1 self) - Add to MetaCart
for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems
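The algorithm at the center of the argument fits in a few lines: one cheap gradient per example instead of one full pass per update. A generic sketch (the least-squares loss and all hyperparameters are illustrative choices, not from the paper):

    import numpy as np

    def sgd_linear(X, y, lr=0.01, epochs=5, seed=0):
        # Plain stochastic gradient descent for least-squares regression.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):
                err = X[i] @ w - y[i]        # residual on a single example
                w -= lr * err * X[i]         # gradient of 0.5 * err**2
        return w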

Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent

by William W. Hager, Hongchao Zhang - ACM Trans. Math. Softw., 2006
"... Recently, a new nonlinear conjugate gradient scheme was developed which satisfies the descent condition gT kdk ≤ − 7 8 ‖gk‖2 and which is globally convergent whenever the line search fulfills the Wolfe conditions. This article studies the convergence behavior of the algorithm; extensive numerical t ..."
Abstract - Cited by 25 (4 self) - Add to MetaCart
Recently, a new nonlinear conjugate gradient scheme was developed which satisfies the descent condition gT kdk ≤ − 7 8 ‖gk‖2 and which is globally convergent whenever the line search fulfills the Wolfe conditions. This article studies the convergence behavior of the algorithm; extensive numerical
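The descent condition in the snippet is a property of the Hager-Zhang direction itself, so it can be checked numerically at every iterate. A sketch on the Rosenbrock function, with a plain backtracking line search standing in for the paper's approximate Wolfe search (constants illustrative; assumes dᵀy ≠ 0):

    import numpy as np

    def rosenbrock(x):
        return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

    def rosenbrock_grad(x):
        return np.array([
            -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
            200 * (x[1] - x[0] ** 2),
        ])

    def hz_direction(g_new, g_old, d):
        # Hager-Zhang beta; the resulting direction provably satisfies
        # g.d <= -(7/8)||g||^2 whenever d @ y != 0.
        y = g_new - g_old
        dy = d @ y
        beta = (y - 2 * d * (y @ y) / dy) @ g_new / dy
        return -g_new + beta * d

    x = np.array([-1.2, 1.0])
    g = rosenbrock_grad(x)
    d = -g                           # first direction: steepest descent
    for k in range(500):
        t = 1.0                      # backtracking Armijo line search
        while rosenbrock(x + t * d) > rosenbrock(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        x_new = x + t * d
        g_new = rosenbrock_grad(x_new)
        d = hz_direction(g_new, g, d)
        # verify the guaranteed-descent property at every iterate
        assert g_new @ d <= -0.875 * (g_new @ g_new) + 1e-10
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-6:
            break
    print(k, x)                      # x should approach the minimizer (1, 1)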

An interior-point method for large-scale l1-regularized logistic regression

by Kwangmoo Koh, Seung-Jean Kim, Stephen Boyd - Journal of Machine Learning Research, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract - Cited by 290 (9 self) - Add to MetaCart
or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute
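To see the problem itself (not the paper's solver) in action, an off-the-shelf ℓ1-regularized logistic regression reproduces the sparsity-inducing behaviour; scikit-learn's liblinear solver stands in for the interior-point method, and the data shapes and C value are illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2000))   # far more features than examples
    w_true = np.zeros(2000); w_true[:10] = 1.0
    y = (X @ w_true + 0.1 * rng.standard_normal(500) > 0).astype(int)

    # C is the inverse of the regularization weight lambda
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X, y)
    print("nonzero coefficients:", np.count_nonzero(clf.coef_))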

Optimization Schemes for Neural Networks

by T. T. Jervis, W. J. Fitzgerald - Cambridge Univ. Eng. Dept., U.K., Tech. Rep. CUED/F-INFENG/TR, 1993
"... Training neural networks need not be a slow, computationally expensive process. The reason it is seen as such might be the traditional emphasis on gradient descent for optimization. Conjugate gradient descent is an efficient optimization scheme for the weights of neural networks. This work includes ..."
Abstract - Add to MetaCart
. The calculation is exact and computationally cheap. The report is in the nature of a tutorial. Gradient descent is reviewed and the backpropagation algorithm, used to find the gradients, is derived. Then a number of alternative optimization strategies are described: ffl Conjugate gradient descent ffl Scaled
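In the spirit of the report, here is a toy network whose flattened weight vector is handed to a conjugate gradient routine instead of plain gradient descent. SciPy's minimize(method="CG") stands in for the report's implementation, and the 2-3-1 tanh network on XOR is an illustrative choice; the report derives exact gradients via backpropagation, whereas this sketch lets SciPy difference them numerically to stay short:

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 0.])

    def loss(w):
        # 2-3-1 tanh network, weights passed as one flat vector of 13
        W1, b1 = w[:6].reshape(2, 3), w[6:9]
        W2, b2 = w[9:12], w[12]
        h = np.tanh(X @ W1 + b1)
        return 0.5 * np.sum((h @ W2 + b2 - y) ** 2)

    w0 = np.random.default_rng(0).standard_normal(13)
    res = minimize(loss, w0, method="CG")   # conjugate gradient optimizer
    print(res.fun)                          # near zero if training succeeds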

A Comparison of Algorithms for Maximum Entropy Parameter Estimation

by Robert Malouf
"... A comparison of algorithms for maximum entropy parameter estimation Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of class ..."
Abstract - Cited by 290 (2 self) - Add to MetaCart
parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison
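All of the compared methods consume the same quantity: observed feature counts minus the model's expected counts, the gradient of the conditional log-likelihood. A sketch of one gradient-ascent step for a softmax-parameterized ME classifier (function name and learning rate are illustrative):

    import numpy as np

    def maxent_gradient_step(W, X, y, lr=0.1):
        # W: (d, k) weights, X: (n, d) features, y: (n,) class indices.
        scores = X @ W
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)             # model p(y|x)
        Y = np.zeros_like(P)
        Y[np.arange(len(y)), y] = 1.0                 # one-hot observed labels
        grad = X.T @ (Y - P)                          # observed - expected counts
        return W + lr * grad / len(y)                 # ascent on log-likelihood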

Conjugate Directions for Stochastic Gradient Descent

by Nicol N. Schraudolph, Thore Graepel - in J. R. Dorronsoro (ed.), ICANN, 2002
"... The method of conjugate gradients provides a very eective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) sett ..."
Abstract - Cited by 6 (2 self) - Add to MetaCart
The method of conjugate gradients provides a very eective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online
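The tension the abstract starts from is easy to exhibit: the sketch below naively recycles the deterministic Polak-Ribière formula on successive stochastic gradients, the direct transplant that the "standard form" does not support (noise makes beta, and hence the directions, unreliable). This is NOT the paper's method, which builds on fast curvature estimates:

    import numpy as np

    def pr_stochastic(grad_fn, w, steps=100, lr=0.01, seed=0):
        # grad_fn(w, rng) is assumed to return a noisy minibatch gradient.
        rng = np.random.default_rng(seed)
        g_old = grad_fn(w, rng)
        d = -g_old
        for _ in range(steps):
            w = w + lr * d
            g = grad_fn(w, rng)
            # Polak-Ribiere+ beta, computed from noisy gradients
            beta = max(0.0, g @ (g - g_old) / (g_old @ g_old))
            d = -g + beta * d
            g_old = g
        return w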

Particle Swarm Weight Initialization In Multi-Layer Perceptron Artificial Neural Networks

by Frans van den Bergh, 1999
"... Many training algorithms (like gradient descent, for example) use random initial weights. These algorithms are rather sensitive to their starting position in the error space, which is represented by their initial weights. This paper shows that the training performance can be improved significantly b ..."
Abstract - Add to MetaCart
this type of network is the training phase, which can be error prone and slow, due to its non-linear nature. Many powerful optimization algorithms have been devised, most of which have been based on the simple gradient descent algorithm. Examples of these include the conjugate gradient descent, scaled
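A generic particle swarm optimizer is enough to illustrate the proposed initialization: run PSO on the network's loss over flattened weight vectors, then hand the swarm's best position to gradient-based training as its starting point. The constants and names below are illustrative, not taken from the paper:

    import numpy as np

    def pso_init(loss, dim, n_particles=20, iters=50, seed=0):
        # Standard PSO with inertia, cognitive, and social terms; the
        # returned global best seeds an ordinary training run.
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1, 1, (n_particles, dim))   # particle positions
        v = np.zeros_like(x)                         # particle velocities
        pbest = x.copy()
        pbest_f = np.array([loss(p) for p in x])
        gbest = pbest[pbest_f.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, 1))
            v = 0.7 * v + 1.4 * r1 * (pbest - x) + 1.4 * r2 * (gbest - x)
            x = x + v
            f = np.array([loss(p) for p in x])
            better = f < pbest_f
            pbest[better], pbest_f[better] = x[better], f[better]
            gbest = pbest[pbest_f.argmin()].copy()
        return gbest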