First and Second-Order Methods for Learning: between Steepest Descent and Newton's Method
- Neural Computation, 1992
Cited by 108 (6 self)
Abstract
On-line first-order backpropagation is sufficiently fast and effective for many large-scale classification problems, but for very high-precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
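To make the first-order versus second-order contrast concrete, here is a minimal Python/NumPy sketch (an illustration, not code from the paper) on an assumed ill-conditioned quadratic $E(w) = \frac{1}{2} w^\top A w - b^\top w$. A steepest-descent step uses only $\nabla E$; a Newton step also uses the Hessian $A$ and reaches the minimizer of a quadratic in one step. The matrix, vector, and learning rate are arbitrary choices.

```python
import numpy as np

# Assumed test problem: E(w) = 0.5 * w^T A w - b^T w, condition number 10.
A = np.array([[10.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])


def grad_E(w):
    """Gradient of the quadratic: A w - b."""
    return A @ w - b


def steepest_descent_step(w, lr):
    """First-order update: step against the gradient with a fixed learning rate."""
    return w - lr * grad_E(w)


def newton_step(w, hessian):
    """Second-order update: solve H d = grad E(w) and step by -d."""
    return w - np.linalg.solve(hessian, grad_E(w))


w_star = np.linalg.solve(A, b)  # exact minimizer, here [0.1, 1.0]

# lr must stay below 2 / lambda_max = 0.2 for stability, so the low-curvature
# direction (eigenvalue 1) contracts by only 1 - 0.1 * 1 = 0.9 per step.
w_gd = np.zeros(2)
for _ in range(50):
    w_gd = steepest_descent_step(w_gd, lr=0.1)

w_newton = newton_step(np.zeros(2), A)  # exact after one step on a quadratic
```

After 50 steps the first-order iterate is still about $0.9^{50} \approx 0.005$ away from the minimizer along the flat direction, while the Newton step is exact; the price is forming and solving with the Hessian at every step.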
Efficient Back Prop
1996
Cited by 93 (16 self)
Abstract
[Figure: learning machine with input $X_0, X_1, \dots, X_p$, parameters $w$, output $Y_0, Y_1, \dots, Y_p$, desired output $D_0, D_1, \dots, D_p$, and error $E_0, E_1, \dots, E_p$ fed to a cost function] AT&T Laboratories (c)
COMPUTING THE GRADIENT WITH BACKPROPAGATION: a module $A$ with output $O = A(I_1, I_2)$ maps the output gradient $dO$ to the input gradients $dI_1 = dO\,\partial A/\partial I_1$ and $dI_2 = dO\,\partial A/\partial I_2$.
- The learning machine is composed of modules (e.g. layers).
- Each module can do two things: (1) compute its outputs from its inputs (FPROP); (2) compute gradient vectors at its inputs from gradient vectors at its outputs (BPROP).
AN INTERESTING SPECIAL CASE: MULTILAYER NETWORKS: each layer multiplies its input by a weight matrix ($WX$) and applies $F()$ (sigmoids + biases); the parameters $w$ are the weights and biases, and the cost is the mean square error $\frac{1}{2}\lVert D - Y \rVert^2$ ...
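The FPROP/BPROP module interface described on the slide can be sketched in Python/NumPy as follows. This is an illustration, not code from the source: the class and function names (`Linear`, `Sigmoid`, `HalfSquaredError`, `fprop_all`, `bprop_all`) are hypothetical. The network mirrors the multilayer special case, with two layers that each compute $WX$ plus biases followed by a sigmoid, and a half squared-error cost.

```python
import numpy as np


class Linear:
    """Module Y = W X + c with parameters W (weight matrix) and c (biases)."""

    def __init__(self, W, c):
        self.W, self.c = W, c

    def fprop(self, x):
        self.x = x                      # cache the input for BPROP
        return self.W @ x + self.c

    def bprop(self, dy):
        self.dW = np.outer(dy, self.x)  # gradient w.r.t. the weights
        self.dc = dy.copy()             # gradient w.r.t. the biases
        return self.W.T @ dy            # gradient w.r.t. the input


class Sigmoid:
    """Parameter-free module applying the logistic function elementwise."""

    def fprop(self, x):
        self.y = 1.0 / (1.0 + np.exp(-x))
        return self.y

    def bprop(self, dy):
        return dy * self.y * (1.0 - self.y)


class HalfSquaredError:
    """Cost module E = 0.5 * ||D - Y||^2."""

    def fprop(self, y, d):
        self.diff = y - d
        return 0.5 * float(self.diff @ self.diff)

    def bprop(self):
        return self.diff                # dE/dY


def fprop_all(modules, x):
    for m in modules:
        x = m.fprop(x)
    return x


def bprop_all(modules, dout):
    for m in reversed(modules):         # visit the modules in reverse order
        dout = m.bprop(dout)
    return dout


rng = np.random.default_rng(0)
net = [Linear(rng.normal(size=(3, 2)), np.zeros(3)), Sigmoid(),
       Linear(rng.normal(size=(1, 3)), np.zeros(1)), Sigmoid()]
cost = HalfSquaredError()
x, d = np.array([0.5, -1.0]), np.array([0.2])
E = cost.fprop(fprop_all(net, x), d)
bprop_all(net, cost.bprop())
```

Because every module only needs to know how to map output gradients to input gradients, the same two loops handle any stack of modules; a finite-difference check on a single weight is an easy way to validate each new module.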
Neural Network Toolbox For Use with Matlab
1993
Cited by 82 (0 self)
Abstract
this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.
Local Gain Adaptation in Stochastic Gradient Descent
- In Proc. Intl. Conf. Artificial Neural Networks, 1999
Cited by 42 (9 self)
Abstract
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally only slightly more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
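For reference, here is a minimal Python/NumPy sketch of the kind of correlation-based gain adaptation the abstract takes as its starting point: each weight has its own gain, multiplied by `up` when successive gradient components agree in sign and by `down` when they disagree. It is not the algorithm developed in the paper; the function name `gain_adaptive_sgd` and the constants `eta0`, `up`, `down` are illustrative assumptions.

```python
import numpy as np


def gain_adaptive_sgd(grad_fn, w0, steps, eta0=0.01, up=1.2, down=0.5):
    """Per-weight gains adapted from the sign of successive gradient products.

    grad_fn(w, t) returns a (possibly stochastic) gradient at step t.
    """
    w = np.asarray(w0, dtype=float).copy()
    gains = np.full_like(w, eta0)
    prev_g = np.zeros_like(w)
    for t in range(steps):
        g = grad_fn(w, t)
        corr = g * prev_g                 # > 0: successive gradients agree
        gains = np.where(corr > 0, gains * up,
                         np.where(corr < 0, gains * down, gains))
        w = w - gains * g
        prev_g = g
    return w, gains
```

The rule sees only the sign of the product of two successive gradients, so with stochastic gradients it responds to sampling noise as well as to curvature.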
Improving the Rprop Learning Algorithm
- Proceedings of the Second International Symposium on Neural Computation (NC 2000), 2000
Cited by 35 (7 self)
Abstract
The Rprop algorithm proposed by Riedmiller and Braun is one of the best-performing first-order learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.
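For orientation, here is a minimal Python/NumPy sketch of a sign-based Rprop update without weight-backtracking, close to the variant commonly called iRprop-. Each weight keeps its own step size, which grows by `eta_plus` while the gradient sign is stable and shrinks by `eta_minus` after a sign change; after a sign change the stored gradient is set to zero, so no step is taken and the next iteration does not adapt. The constants are commonly used defaults, not values taken from the paper, and the paper should be consulted for the exact modifications it proposes.

```python
import numpy as np


def rprop_minus(grad_fn, w0, steps, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                delta_min=1e-6, delta_max=50.0):
    """Sign-based Rprop without weight-backtracking (iRprop- style sketch)."""
    w = np.asarray(w0, dtype=float).copy()
    delta = np.full_like(w, delta0)      # individual step size per weight
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        s = g * prev_g
        # Same sign: enlarge the step; sign change: shrink it.
        delta = np.where(s > 0, np.minimum(delta * eta_plus, delta_max), delta)
        delta = np.where(s < 0, np.maximum(delta * eta_minus, delta_min), delta)
        # After a sign change, treat the gradient as zero: no step now,
        # and no step-size adaptation on the next iteration.
        g = np.where(s < 0, 0.0, g)
        w = w - np.sign(g) * delta
        prev_g = g
    return w
```

Only the sign of each gradient component enters the update, which is why Rprop is insensitive to the gradient's magnitude.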
Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods
1999
Cited by 19 (13 self)
Abstract
This article focuses on gradient-based backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learning-rate adaptation is based on descent techniques and estimates of the local Lipschitz constant that are obtained without additional error function and gradient evaluations. The proposed algorithms improve the backpropagation training in terms of both convergence rate and convergence characteristics, such as stable learning and robustness to oscillations. Simulations are conducted to compare and evaluate the convergence behavior of these gradient-based training algorithms with several popular training methods.
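A minimal Python/NumPy sketch of the general idea, under assumptions of our own rather than the authors' exact algorithm. The local Lipschitz constant is estimated from stored quantities as $\Lambda_k = \lVert \nabla E(w^k) - \nabla E(w^{k-1}) \rVert / \lVert w^k - w^{k-1} \rVert$, so no extra error or gradient evaluations are needed. The trial common learning rate is $\eta_k = 1/(2\Lambda_k)$, and the Goldstein/Armijo condition $E(w^k - \eta_k \nabla E(w^k)) - E(w^k) \le -\sigma \eta_k \lVert \nabla E(w^k) \rVert^2$ is enforced by halving $\eta_k$. The function name and the constants `eta0` and `sigma` are illustrative.

```python
import numpy as np


def lipschitz_armijo_descent(E, grad_E, w0, steps, eta0=0.01, sigma=1e-4):
    """Gradient descent with a common learning rate from a local Lipschitz estimate."""
    w_prev = np.asarray(w0, dtype=float).copy()
    g_prev = grad_E(w_prev)
    w = w_prev - eta0 * g_prev           # bootstrap step with a small fixed rate
    for _ in range(steps):
        g = grad_E(w)
        dw = np.linalg.norm(w - w_prev)
        if dw == 0.0 or not np.any(g):
            break                        # no movement, or an exact stationary point
        # Local Lipschitz estimate from stored gradients: no extra evaluations.
        lam = np.linalg.norm(g - g_prev) / dw
        eta = 1.0 / (2.0 * lam) if lam > 0.0 else eta0
        # Goldstein/Armijo sufficient-decrease test, halving eta on failure.
        E_w, gg = E(w), float(g @ g)
        for _ in range(60):
            if E(w - eta * g) - E_w <= -sigma * eta * gg:
                break
            eta *= 0.5
        w_prev, g_prev = w, g
        w = w - eta * g
    return w
```

The Lipschitz estimate supplies a trial step scaled to the local curvature, and the Armijo test only has to correct it when the estimate is too optimistic.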
The Parallel Transfer of Task Knowledge Using Dynamic Learning Rates Based on a Measure of Relatedness
- Connection Science Special Issue: Transfer in Inductive Systems, 1996
Cited by 17 (3 self)
Abstract
With a distinction made between two forms of task knowledge transfer, representational and functional, ηMTL, a modified version of the MTL method of functional (parallel) transfer, is introduced. The ηMTL method employs a separate learning rate, $\eta_k$, for each task output node $k$. $\eta_k$ varies as a function of a measure of relatedness, $R_k$, between the $k$th task and the primary task of interest. Results of experiments demonstrate the ability of ηMTL to dynamically select the most related source task(s) for the functional transfer of prior domain knowledge. The ηMTL method of learning is nearly equivalent to standard MTL when all parallel tasks are sufficiently related to the primary task, and is similar to single-task learning when none of the parallel tasks are related to the primary task.
1 Introduction The concepts and results presented here represent current work from our research into systems of artificial neural networks which use prior task knowledge to decrease the training t...
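One simple way to give each task output node its own learning rate (written $\eta_k$ here) is sketched below in Python/NumPy, under assumptions of our own. The network has one shared tanh hidden layer and linear task outputs, and each output's error signal is multiplied by its $\eta_k$ before being backpropagated, so less related tasks move both their own output weights and the shared hidden weights less. The mapping $\eta_k = \eta_0 R_k$ with $R_k \in [0, 1]$ is a hypothetical placeholder, not the relatedness function used in the paper, and `eta_mtl_step` is a hypothetical name.

```python
import numpy as np


def eta_mtl_step(W1, W2, x, targets, relatedness, eta0=0.1):
    """One per-example update of an MTL net with a learning rate per task output.

    W1: (hidden, inputs) shared hidden-layer weights.
    W2: (tasks, hidden) task-output weights, one row per task output node k.
    relatedness: R_k in [0, 1] for each output (R = 1 for the primary task).
    """
    h = np.tanh(W1 @ x)                          # shared hidden representation
    y = W2 @ h                                   # one linear output per task
    err = y - targets                            # dE_k/dy_k for E_k = 0.5 (y_k - t_k)^2
    eta = eta0 * np.asarray(relatedness, dtype=float)  # hypothetical R_k -> eta_k map
    scaled = eta * err                           # each output error times its own rate
    step_W2 = np.outer(scaled, h)
    delta_h = (W2.T @ scaled) * (1.0 - h ** 2)   # backpropagate through tanh
    step_W1 = np.outer(delta_h, x)
    return W1 - step_W1, W2 - step_W2
```

With $R_k = 1$ for every task this reduces to standard MTL backpropagation with rate $\eta_0$; with $R_k = 0$ for every secondary task only the primary task trains the network. These are the two limiting cases the abstract describes.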
A Class of Gradient Unconstrained Minimization Algorithms With Adaptive Stepsize
1999
Cited by 13 (11 self)
Abstract
In this paper the development, convergence theory and numerical testing of a class of gradient unconstrained minimization algorithms with adaptive stepsize are presented. The proposed class comprises four algorithms: the first two incorporate techniques for the adaptation of a common stepsize for all coordinate directions and the other two allow an individual adaptive stepsize along each coordinate direction. All the algorithms are computationally efficient and possess interesting convergence properties utilizing estimates of the Lipschitz constant that are obtained without additional function or gradient evaluations. The algorithms have been implemented and tested on some well-known test cases as well as on real-life artificial neural network applications and the results have been very satisfactory.
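To illustrate the difference between a common stepsize and an individual stepsize per coordinate direction, here is a minimal Python/NumPy sketch of the per-coordinate case under our own assumptions. Each coordinate $i$ gets its own Lipschitz estimate $\Lambda_i = |g_i^k - g_i^{k-1}| / |w_i^k - w_i^{k-1}|$ from the two most recent iterates (no extra function or gradient evaluations) and the stepsize $\eta_i = 1/(2\Lambda_i)$, capped at `eta_max`. It omits any line search or safeguard the paper's algorithms may use; the function name and constants are hypothetical.

```python
import numpy as np


def coordinatewise_lipschitz_descent(grad_E, w0, steps, eta0=0.01, eta_max=1.0):
    """Gradient descent with an individual stepsize per coordinate direction."""
    w_prev = np.asarray(w0, dtype=float).copy()
    g_prev = grad_E(w_prev)
    eta = np.full_like(w_prev, eta0)
    w = w_prev - eta * g_prev            # bootstrap step with a small fixed stepsize
    for _ in range(steps):
        g = grad_E(w)
        dw = np.abs(w - w_prev)
        dg = np.abs(g - g_prev)
        with np.errstate(divide="ignore", invalid="ignore"):
            # Per-coordinate Lipschitz estimates from the last two iterates.
            lam = np.where(dw > 0.0, dg / dw, 0.0)
            new_eta = np.minimum(1.0 / (2.0 * lam), eta_max)
        # Coordinates without a usable estimate keep their previous stepsize.
        eta = np.where(lam > 0.0, new_eta, eta)
        w_prev, g_prev = w, g
        w = w - eta * g
    return w
```

On a separable quadratic the per-coordinate estimates recover each curvature exactly, so every coordinate contracts by a factor of $1/2$ per step regardless of conditioning, whereas a single common stepsize would be limited by the largest curvature.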

