Results 1-10 of 1,009
A scaled conjugate gradient algorithm for fast supervised learning
 Neural Networks, 1993
 Cited by 451 (0 self)
 "... A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural network ..."
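Several entries in this listing build on the classical conjugate gradient iteration. As a point of reference, here is a minimal linear conjugate gradient loop for a quadratic objective — an illustrative sketch only, not Møller's SCG, which additionally scales its steps with a Levenberg-Marquardt-style term:

```python
import numpy as np

def cg_quadratic(A, b, x0, iters=10):
    """Linear conjugate gradient for f(x) = 0.5 x^T A x - b^T x, A SPD.

    Successive search directions are A-conjugate, so the exact minimizer
    is reached in at most n steps, versus many more iterations for plain
    gradient descent on ill-conditioned problems.
    """
    x = x0.astype(float)
    r = b - A @ x                         # residual = negative gradient
    d = r.copy()
    for _ in range(iters):
        if np.linalg.norm(r) < 1e-12:
            break
        alpha = (r @ r) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        d = r_new + beta * d
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg_quadratic(A, b, np.zeros(2))       # x solves A x = b
```

On this 2x2 system the loop terminates after two conjugate steps, as the theory predicts for an exact line search.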
Pegasos: Primal Estimated subgradient solver for SVM
 Cited by 542 (20 self)
 "... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total ..."
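The per-example subgradient step the Pegasos abstract describes can be sketched as follows; this is a minimal illustration of the primal update with step size 1/(λt), not the authors' code, and the toy data is invented for the example:

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=2000, seed=0):
    """Minimal Pegasos-style solver for a linear SVM.

    Each iteration draws one example and takes a subgradient step on the
    regularized hinge loss, with the 1/(lam * t) step size from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)
        if y[i] * (X[i] @ w) < 1:                    # margin violated: hinge active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                        # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# Toy linearly separable data: label is the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-2.0, 0.3], [-1.0, -1.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = pegasos(X, y)
```

Note that each step touches a single row of X, which is exactly why the iteration count — not the dataset size — dominates the runtime for a linear kernel.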
ATOMIC DECOMPOSITION BY BASIS PURSUIT
 1995
 Cited by 2728 (61 self)
 "... The Time-Frequency and Time-Scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for d ... successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver."
Large-scale machine learning with stochastic gradient descent
 In COMPSTAT, 2010
 Cited by 163 (1 self)
 "... During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems ..."
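As a point of reference for the abstract above, a bare stochastic gradient descent loop for least-squares regression looks like the following — a generic sketch, not Bottou's code, with an invented noiseless toy problem:

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.05, epochs=200, seed=0):
    """Plain SGD for min_w (1/2) * mean((X w - y)^2).

    Each step uses the gradient of a single example, so the per-step cost
    is independent of the dataset size -- the property that makes SGD
    attractive in the large-scale regime discussed above.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = X[i] @ w - y[i]        # residual on one example
            w -= lr * err * X[i]         # single-example gradient step
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
true_w = np.array([2.0, -1.0])
y = X @ true_w                           # noiseless targets
w = sgd_least_squares(X, y)
```

With noiseless targets every per-example gradient vanishes at the solution, so even a constant step size recovers `true_w` here; with noisy data one would decay the learning rate instead.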
Algorithm 851: CG DESCENT, a conjugate gradient method with guaranteed descent
 ACM Trans. Math. Softw., 2006
 Cited by 25 (4 self)
 "... Recently, a new nonlinear conjugate gradient scheme was developed which satisfies the descent condition g_k^T d_k ≤ −(7/8)‖g_k‖² and which is globally convergent whenever the line search fulfills the Wolfe conditions. This article studies the convergence behavior of the algorithm; extensive numerical t ..."
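The sufficient-descent condition quoted above, g_k^T d_k ≤ −(7/8)‖g_k‖², is easy to check numerically for any candidate search direction. A small sketch (an illustrative checker only, not the CG_DESCENT implementation):

```python
import numpy as np

def satisfies_sufficient_descent(g, d, c=7.0 / 8.0):
    """Check the sufficient-descent condition g^T d <= -c * ||g||^2.

    g is the current gradient, d the proposed search direction; the
    condition guarantees d is a descent direction bounded away from
    orthogonality with -g.
    """
    return float(g @ d) <= -c * float(g @ g)

g = np.array([1.0, -2.0, 0.5])
assert satisfies_sufficient_descent(g, -g)            # steepest descent: g^T d = -||g||^2
assert not satisfies_sufficient_descent(g, -0.5 * g)  # too shallow: only -(1/2)||g||^2
```

The steepest-descent direction satisfies the bound with c = 1, so the 7/8 constant leaves room for conjugate-gradient directions that are not exactly −g.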
An interior-point method for large-scale ℓ1-regularized logistic regression
 Journal of Machine Learning Research, 2007
 Cited by 290 (9 self)
 "... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute ..."
Optimization Schemes for Neural Networks
 Cambridge Univ. Eng. Dept., U.K., Tech. Rep. CUED/F-INFENG/TR, 1993
 "... Training neural networks need not be a slow, computationally expensive process. The reason it is seen as such might be the traditional emphasis on gradient descent for optimization. Conjugate gradient descent is an efficient optimization scheme for the weights of neural networks. This work includes ... The calculation is exact and computationally cheap. The report is in the nature of a tutorial. Gradient descent is reviewed and the backpropagation algorithm, used to find the gradients, is derived. Then a number of alternative optimization strategies are described: conjugate gradient descent, scaled ..."
A Comparison of Algorithms for Maximum Entropy Parameter Estimation
 Cited by 290 (2 self)
 "... Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of class ... parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison ..."
Conjugate Directions for Stochastic Gradient Descent
 In Dorronsoro (2002)
 Cited by 6 (2 self)
 "... The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) sett ..."
Particle Swarm Weight Initialization in Multi-Layer Perceptron Artificial Neural Networks
 Proceedings of the 1991 International Conference on Artificial Neural Networks, ICANN91, 1999
 "... Many training algorithms (like gradient descent, for example) use random initial weights. These algorithms are rather sensitive to their starting position in the error space, which is represented by their initial weights. This paper shows that the training performance can be improved significantly b ... this type of network is the training phase, which can be error prone and slow, due to its nonlinear nature. Many powerful optimization algorithms have been devised, most of which have been based on the simple gradient descent algorithm. Examples of these include the conjugate gradient descent, scaled ..."