Results 1 - 10 of 1,009
A scaled conjugate gradient algorithm for fast supervised learning
- NEURAL NETWORKS, 1993
"... A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural netwo ..."
Cited by 451 (0 self)
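The paper's SCG algorithm avoids the explicit line search of standard conjugate gradient by scaling the step size in a Levenberg-Marquardt-like fashion. A full SCG implementation is longer; as a minimal sketch of the underlying idea (training network weights with a conjugate-gradient optimizer instead of plain gradient descent), the snippet below fits a tiny one-hidden-layer network with SciPy's generic nonlinear CG routine. The toy data and architecture are assumptions for illustration, not the paper's experiments.

```python
# Minimal sketch (not the paper's SCG): train a tiny one-hidden-layer network
# with SciPy's nonlinear conjugate gradient optimizer instead of plain
# gradient descent. Data and architecture are illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like toy target

n_in, n_hid = 2, 8

def unpack(w):
    i = 0
    W1 = w[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = w[i:i + n_hid]; i += n_hid
    W2 = w[i:i + n_hid]; i += n_hid
    b2 = w[i]
    return W1, b1, W2, b2

def loss(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    z = np.clip(h @ W2 + b2, -30, 30)        # clip logits for numerical safety
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

w0 = rng.normal(scale=0.1, size=n_in * n_hid + n_hid + n_hid + 1)
res = minimize(loss, w0, method="CG")        # nonlinear conjugate gradient
print("final cross-entropy:", res.fun)
```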
Pegasos: Primal Estimated sub-gradient solver for SVM
"... We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Abstract
-
Cited by 542 (20 self)
- Add to MetaCart
single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total
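The abstract describes the essentials of Pegasos: at each iteration pick one training example, take a sub-gradient step on the regularized hinge loss with step size 1/(λt), and optionally project onto a ball of radius 1/√λ. The sketch below follows that recipe; the toy data, λ, and iteration count are illustrative assumptions, and refinements such as mini-batching from the full paper are omitted.

```python
# Minimal sketch of a Pegasos-style stochastic sub-gradient SVM solver.
# The hinge-loss update and 1/(lambda*t) step size follow the description
# above; toy data, lambda, and T are illustrative assumptions.
import numpy as np

def pegasos(X, y, lam=0.1, T=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)                      # step size 1/(lambda*t)
        if y[i] * (X[i] @ w) < 1:                  # hinge loss is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
        norm = np.linalg.norm(w)                   # optional projection step
        if norm > 1.0 / np.sqrt(lam):
            w *= (1.0 / np.sqrt(lam)) / norm
    return w

# toy linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) + np.array([[2.0, 2.0]]) * rng.choice([-1, 1], size=(500, 1))
y = np.sign(X[:, 0] + X[:, 1])
w = pegasos(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

The 1/(λt) schedule is what drives the Õ(1/ɛ) iteration bound quoted in the abstract and the linear dependence on 1/λ.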
Atomic Decomposition by Basis Pursuit
, 1995
"... The Time-Frequency and Time-Scale communities have recently developed a large number of overcomplete waveform dictionaries -- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for d ..."
Cited by 2728 (61 self)
successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
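Basis pursuit decomposes a signal s over an overcomplete dictionary Φ by solving min ‖α‖₁ subject to Φα = s, which becomes a linear program after splitting α = u − v with u, v ≥ 0. The sketch below sets up that LP for a small random dictionary and hands it to SciPy's linprog; the paper solves the LP with a primal-dual logarithmic barrier interior-point method and a conjugate-gradient solver, whereas here the backend is left to SciPy. The dictionary, sparsity pattern, and sizes are toy assumptions.

```python
# Minimal basis-pursuit sketch: min ||a||_1  subject to  Phi @ a == s,
# recast as a linear program via a = u - v with u, v >= 0. The paper uses an
# interior-point LP solver; here SciPy's linprog backend is used instead.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 60                            # overcomplete: more atoms than samples
Phi = rng.normal(size=(n, m))
a_true = np.zeros(m)
a_true[[3, 17, 42]] = [1.5, -2.0, 0.7]   # sparse ground-truth coefficients
s = Phi @ a_true

# variables z = [u; v], objective sum(u) + sum(v), constraint Phi @ (u - v) = s
c = np.ones(2 * m)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=s, bounds=[(0, None)] * (2 * m), method="highs")
a_hat = res.x[:m] - res.x[m:]
print("recovered support:", np.nonzero(np.abs(a_hat) > 1e-6)[0])
```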
Large-scale machine learning with stochastic gradient descent
- in COMPSTAT, 2010
"... Abstract. During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs ..."
Cited by 163 (1 self)
for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Seemingly unlikely optimization algorithms such as stochastic gradient descent show remarkable performance for large-scale problems
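The point drawn from the abstract is that in the large-scale regime the optimizer's cost per unit of test error matters more than its per-iteration convergence rate, which is why a cheap per-example method such as SGD can win. As a minimal sketch of such a solver, the snippet below runs per-example updates for ℓ2-regularized logistic regression with a common decaying step-size schedule; the data and hyperparameters are illustrative assumptions, not the paper's.

```python
# Minimal stochastic gradient descent sketch for regularized logistic
# regression: one example per update and a decaying step size. Toy data,
# the eta0/(1 + eta0*lam*t) schedule, and hyperparameters are assumptions.
import numpy as np

def sgd_logreg(X, y, lam=1e-3, eta0=1.0, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)     # decaying step size
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            grad = (p - y[i]) * X[i] + lam * w      # per-example gradient
            w -= eta * grad
    return w

rng = np.random.default_rng(2)
X = np.hstack([rng.normal(size=(1000, 3)), np.ones((1000, 1))])  # bias column
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(float)
w = sgd_logreg(X, y)
print("training accuracy:", np.mean((X @ w > 0) == (y == 1)))
```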
Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent
- ACM Trans. Math. Softw., 2006
"... Recently, a new nonlinear conjugate gradient scheme was developed which satisfies the descent condition gT kdk ≤ − 7 8 ‖gk‖2 and which is globally convergent whenever the line search fulfills the Wolfe conditions. This article studies the convergence behavior of the algorithm; extensive numerical t ..."
Cited by 25 (4 self)
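CG_DESCENT pairs a particular conjugate-gradient update with a Wolfe line search so that every search direction satisfies the descent condition above. The sketch below is a simplified stand-in, not Algorithm 851 itself: a generic nonlinear CG loop using a Polak-Ribière+ β and SciPy's Wolfe line search, tested on the Rosenbrock function.

```python
# Minimal nonlinear conjugate gradient sketch with a Wolfe line search
# (SciPy's line_search) and a Polak-Ribiere+ beta -- a simplified stand-in
# for the paper's CG_DESCENT update, not the published Algorithm 851.
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                              # initial direction
    for _ in range(max_iter):
        alpha = line_search(f, grad, x, d, gfk=g)[0]    # Wolfe conditions
        if alpha is None:                               # line search failed; restart
            alpha, d = 1e-4, -g
        x_new = x + alpha * d
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # Polak-Ribiere+ formula
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

x_star = nonlinear_cg(rosen, rosen_der, np.zeros(5))
print("minimizer:", np.round(x_star, 4))   # Rosenbrock's minimizer is all ones
```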
An interior-point method for large-scale l1-regularized logistic regression
- Journal of Machine Learning Research, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract
-
Cited by 290 (9 self)
- Add to MetaCart
or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute
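The underlying problem is minimizing the average logistic loss plus λ‖w‖₁. The paper's solver is a specialized interior-point method (optionally using preconditioned conjugate gradients for the Newton systems); reproducing it here would be long, so the sketch below illustrates the same objective with a much simpler proximal-gradient (ISTA) loop. Problem sizes, step size, and λ are toy assumptions.

```python
# The paper's solver is an interior-point method; this sketch only illustrates
# the same l1-regularized logistic regression objective, solved instead with
# proximal gradient descent (ISTA) and soft-thresholding.
import numpy as np

def l1_logreg_ista(X, y, lam=0.05, step=0.1, iters=2000):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n                  # gradient of the logistic loss
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # prox of lam*||w||_1
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 50))
w_true = np.zeros(50); w_true[:5] = [2, -2, 1.5, -1, 1]           # sparse truth
y = (1.0 / (1.0 + np.exp(-X @ w_true)) > rng.uniform(size=400)).astype(float)
w = l1_logreg_ista(X, y)
print("nonzero coefficients:", np.sum(np.abs(w) > 1e-3))
```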
Optimization Schemes for Neural Networks
- Cambridge Univ. Eng. Dept., U.K., Tech. Rep. CUED/F-INFENG/TR, 1993
"... Training neural networks need not be a slow, computationally expensive process. The reason it is seen as such might be the traditional emphasis on gradient descent for optimization. Conjugate gradient descent is an efficient optimization scheme for the weights of neural networks. This work includes ..."
The calculation is exact and computationally cheap. The report is in the nature of a tutorial. Gradient descent is reviewed and the backpropagation algorithm, used to find the gradients, is derived. Then a number of alternative optimization strategies are described: conjugate gradient descent, scaled
A Comparison of Algorithms for Maximum Entropy Parameter Estimation
"... A comparison of algorithms for maximum entropy parameter estimation Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of class ..."
Cited by 290 (2 self)
parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the widely used iterative scaling algorithms perform quite poorly in comparison
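In the simplest case, a conditional maxent model with one weight per (class, feature) pair is multinomial logistic regression, so any of the compared optimizers can be driven by the same negative log-likelihood. The sketch below fits a tiny model with a variable-metric method (SciPy's L-BFGS-B), one of the method families the paper compares; the three-class toy data set and the Gaussian prior strength are assumptions.

```python
# Sketch of fitting a small conditional maxent (multinomial logistic) model
# with a variable-metric optimizer (L-BFGS via SciPy). Toy data and the
# Gaussian-prior strength are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
K, D, N = 3, 4, 300
X = rng.normal(size=(N, D))
W_true = rng.normal(size=(K, D))
y = (X @ W_true.T).argmax(axis=1)                 # labels from a known model

def neg_log_likelihood(w_flat):
    W = w_flat.reshape(K, D)
    scores = X @ W.T                              # (N, K) class scores
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the exponentials
    log_z = np.log(np.exp(scores).sum(axis=1))    # log partition function
    nll = -(scores[np.arange(N), y] - log_z).sum()
    return nll + 0.1 * (w_flat @ w_flat)          # small Gaussian prior

res = minimize(neg_log_likelihood, np.zeros(K * D), method="L-BFGS-B")
print("converged:", res.success, " final objective:", round(res.fun, 2))
```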
Conjugate Directions for Stochastic Gradient Descent
- In Dorronsoro (2002)
"... The method of conjugate gradients provides a very eective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) sett ..."
Cited by 6 (2 self)
Particle Swarm Weight Initialization In Multi-Layer Perceptron Artificial Neural Networks
- Proceedings of the 1991 International Conference on Artificial Neural Networks, ICANN-91, 1999
"... Many training algorithms (like gradient descent, for example) use random initial weights. These algorithms are rather sensitive to their starting position in the error space, which is represented by their initial weights. This paper shows that the training performance can be improved significantly b ..."
this type of network is the training phase, which can be error-prone and slow due to its non-linear nature. Many powerful optimization algorithms have been devised, most of which are based on the simple gradient descent algorithm. Examples include conjugate gradient descent, scaled
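The idea is to replace purely random initial weights with a starting point found by a particle swarm: each particle is a full weight vector, particles move under attraction to their own best and the swarm's best positions, and the best weights found seed subsequent gradient-based training. A minimal sketch under those assumptions follows; swarm size, coefficients, and the toy task are illustrative, not the paper's setup.

```python
# Minimal sketch of particle-swarm weight initialization: a small PSO searches
# for a low-loss starting weight vector for a tiny one-hidden-layer network;
# gradient-based training would then start from the best particle found.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)        # XOR-like toy target
n_in, n_hid = 2, 5
dim = n_in * n_hid + n_hid + n_hid + 1           # total number of weights

def loss(w):
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    b1 = w[n_in * n_hid:n_in * n_hid + n_hid]
    W2 = w[n_in * n_hid + n_hid:-1]
    b2 = w[-1]
    h = np.tanh(X @ W1 + b1)
    z = np.clip(h @ W2 + b2, -30, 30)            # clip logits for safety
    p = 1.0 / (1.0 + np.exp(-z))
    return np.mean((p - y) ** 2)

n_particles, iters = 30, 100
w_c, c1, c2 = 0.7, 1.5, 1.5                      # inertia and attraction weights
pos = rng.normal(scale=0.5, size=(n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_val = np.array([loss(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.uniform(size=(2, n_particles, dim))
    vel = w_c * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([loss(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best initial-weight loss found by PSO:", round(pbest_val.min(), 4))
```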