Efficient BackProp
, 1998
"... . The convergence of backpropagation learning is analyzed so as to explain common phenomenon observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers expl ..."
Abstract

Cited by 209 (31 self)
 Add to MetaCart
The convergence of backpropagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most "classical" second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.
1 Introduction
Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science. Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number ...
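One of the best-known tricks from this paper is preprocessing the inputs so that each variable has zero mean and unit variance before training. A minimal NumPy sketch of that step (the function name is mine; the paper also discusses decorrelating the inputs, which is omitted here):

```python
import numpy as np

def standardize_inputs(X):
    """Shift each input variable to zero mean and scale it to unit
    variance, column by column (the function name is illustrative)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # leave constant features unscaled
    return (X - mu) / sigma

# Three samples, two input variables on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xs = standardize_inputs(X)
```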
First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method
 Neural Computation
, 1992
"... Online first order backpropagation is sufficiently fast and effective for many largescale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first and secondorder optimization methods for learning in feedforward neura ..."
Abstract

Cited by 174 (7 self)
 Add to MetaCart
On-line first-order backpropagation is sufficiently fast and effective for many large-scale classification problems, but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
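The contrast the review draws can be seen on a toy quadratic, where a fixed-step first-order method needs many iterations while Newton's method, using the exact Hessian, reaches the minimizer in a single step. A generic sketch of that comparison (the example problem and values are mine, not taken from the paper):

```python
import numpy as np

# Minimize f(x) = 0.5 x^T A x - b^T x, whose gradient is A x - b.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

# Steepest descent: a fixed learning rate and many small first-order steps.
x_sd = np.zeros(2)
eta = 0.1
for _ in range(200):
    x_sd = x_sd - eta * grad(x_sd)

# Newton's method: uses curvature (here the exact Hessian A), so on a
# quadratic a single step x - A^{-1} grad(x) lands on the minimizer.
x0 = np.zeros(2)
x_newton = x0 - np.linalg.solve(A, grad(x0))
```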
Local Gain Adaptation in Stochastic Gradient Descent
 In Proc. Intl. Conf. Artificial Neural Networks
, 1999
"... Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. Th ..."
Abstract

Cited by 70 (12 self)
 Add to MetaCart
(Show Context)
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
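The correlation-based heuristic the abstract refers to can be sketched as follows: each parameter keeps its own gain, multiplied up when successive gradients agree in sign and down when they disagree. This is a simplified illustration of the baseline idea being critiqued, not the paper's algorithm (all names and constants are mine):

```python
import numpy as np

def sgd_with_local_gains(grad_fn, w, steps=100, meta=0.05, init_gain=0.1):
    """Gradient descent with one adaptive gain (learning rate) per
    parameter: the gain grows when consecutive gradients point the
    same way and shrinks when they conflict. A simplified sketch of
    the correlation heuristic, not the paper's method."""
    gains = np.full_like(w, init_gain)
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        gains *= np.exp(meta * np.sign(g * prev_g))  # correlation-driven update
        w = w - gains * g
        prev_g = g
    return w

# Toy quadratic with minimum at [2, -1]; grad_fn returns the gradient at w.
target = np.array([2.0, -1.0])
w_opt = sgd_with_local_gains(lambda w: w - target, np.zeros(2))
```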
Improving the Rprop Learning Algorithm
 In Proceedings of the Second International Symposium on Neural Computation (NC 2000)
, 2000
"... The Rprop algorithm proposed by Riedmiller and Braun is one of the best performing firstorder learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks a ..."
Abstract

Cited by 57 (7 self)
 Add to MetaCart
The Rprop algorithm proposed by Riedmiller and Braun is one of the best performing first-order learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.
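For reference, the basic Rprop update that the paper builds on adapts an individual step size per weight from the sign of successive partial derivatives, ignoring the gradient's magnitude. A sketch of the simple variant without weight backtracking, using the commonly cited default parameters (not necessarily the settings of this paper):

```python
import numpy as np

def rprop_minus(grad_fn, w, steps=100, eta_plus=1.2, eta_minus=0.5,
                delta_init=0.1, delta_max=50.0, delta_min=1e-6):
    """Basic Rprop without weight backtracking: each weight's step size
    grows while its partial derivative keeps the same sign and shrinks
    when the sign flips; only the sign of the gradient is used."""
    delta = np.full_like(w, delta_init)
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        sign_change = g * prev_g
        delta = np.where(sign_change > 0,
                         np.minimum(delta * eta_plus, delta_max), delta)
        delta = np.where(sign_change < 0,
                         np.maximum(delta * eta_minus, delta_min), delta)
        w = w - np.sign(g) * delta
        prev_g = g
    return w

# Toy quadratic bowl with minimum at [3, -2]
target = np.array([3.0, -2.0])
w_opt = rprop_minus(lambda w: w - target, np.zeros(2))
```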
Improving the convergence of the backpropagation algorithm using learning rate adaptation methods
 Neural Computation
, 1999
"... ..."
The Parallel Transfer of Task Knowledge Using Dynamic Learning Rates Based on a Measure of Relatedness
 Connection Science Special Issue: Transfer in Inductive Systems
, 1996
"... With a distinction made between two forms of task knowledge transfer, representational and functional, jMTL, a modified version of the MTL method of functional (parallel) transfer, is introduced. The jMTL method employs a separate learning rate, j k , for each task output node k. j k varies as a f ..."
Abstract

Cited by 36 (8 self)
 Add to MetaCart
With a distinction made between two forms of task knowledge transfer, representational and functional, ηMTL, a modified version of the MTL method of functional (parallel) transfer, is introduced. The ηMTL method employs a separate learning rate, η_k, for each task output node k. η_k varies as a function of a measure of relatedness, R_k, between the kth task and the primary task of interest. Results of experiments demonstrate the ability of ηMTL to dynamically select the most related source task(s) for the functional transfer of prior domain knowledge. The ηMTL method of learning is nearly equivalent to standard MTL when all parallel tasks are sufficiently related to the primary task, and is similar to single-task learning when none of the parallel tasks are related to the primary task.
1 Introduction
The concepts and results presented here represent current work from our research into systems of artificial neural networks which use prior task knowledge to decrease the training t...
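The abstract does not give the actual functional form of η_k(R_k); as a purely hypothetical illustration, even a linear scaling reproduces the two limiting behaviours it describes (standard MTL when all R_k are high, something like single-task learning when they are near zero):

```python
# Hypothetical sketch: scale a base learning rate by a relatedness
# measure R_k in [0, 1] (the form of eta_k here is assumed, not the paper's).
base_eta = 0.1
relatedness = {"primary": 1.0, "related_task": 0.8, "unrelated_task": 0.05}
eta_k = {task: base_eta * R for task, R in relatedness.items()}
# R_k = 1 everywhere recovers standard MTL; R_k = 0 for all parallel
# tasks effectively reduces training to the primary task alone.
```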
A neural network approach to offline signature verification using directional PDF
 Pattern Recognition
, 1996
"... AbstraetA neural network approach is proposed to build the first stage of an Automatic Handwritten Signature Verification System. The directional Probability Density Function was used as a global shape factor and its discriminating power was enhanced by reducing its cardinality via filtering. Vari ..."
Abstract

Cited by 29 (6 self)
 Add to MetaCart
A neural network approach is proposed to build the first stage of an Automatic Handwritten Signature Verification System. The directional Probability Density Function was used as a global shape factor and its discriminating power was enhanced by reducing its cardinality via filtering. Various experimental protocols were used to implement the backpropagation network (BPN) classifier. A comparison, on the same database and with the same decision rule, shows that the BPN classifier is clearly better than the threshold classifier and compares favourably with the k-Nearest-Neighbour classifier.
Keywords: Pattern recognition; Classifiers; Neural networks; Backpropagation; Automatic signature verification; Directional probability density function
A Class of Gradient Unconstrained Minimization Algorithms With Adaptive Stepsize
, 1999
"... In this paper the development, convergence theory and numerical testing of a class of gradient unconstrained minimization algorithms with adaptive stepsize are presented. The proposed class comprises four algorithms: the first two incorporate techniques for the adaptation of a common stepsize for al ..."
Abstract

Cited by 28 (15 self)
 Add to MetaCart
In this paper the development, convergence theory and numerical testing of a class of gradient unconstrained minimization algorithms with adaptive stepsize are presented. The proposed class comprises four algorithms: the first two incorporate techniques for the adaptation of a common stepsize for all coordinate directions and the other two allow an individual adaptive stepsize along each coordinate direction. All the algorithms are computationally efficient and possess interesting convergence properties utilizing estimates of the Lipschitz constant that are obtained without additional function or gradient evaluations. The algorithms have been implemented and tested on some well-known test cases as well as on real-life artificial neural network applications and the results have been very satisfactory.
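One common way to estimate a local Lipschitz constant without extra function or gradient evaluations is from successive iterates and gradients, L ≈ ||g_k − g_{k−1}|| / ||x_k − x_{k−1}||. The sketch below uses that estimate with stepsize 1/(2L); it is a generic illustration in the spirit of the abstract, not the paper's four algorithms:

```python
import numpy as np

def adaptive_stepsize_descent(grad_fn, x, steps=50, lam0=0.1):
    """Gradient descent whose stepsize is adapted from a local Lipschitz
    estimate built from already-computed iterates and gradients (no extra
    function or gradient evaluations). A generic sketch, not the paper's
    exact scheme."""
    g_prev = grad_fn(x)
    x_prev = x.copy()
    x = x - lam0 * g_prev  # one initial step with a default stepsize
    for _ in range(steps):
        g = grad_fn(x)
        L = np.linalg.norm(g - g_prev) / max(np.linalg.norm(x - x_prev), 1e-12)
        lam = 1.0 / (2.0 * L) if L > 0 else lam0
        x_prev, g_prev = x.copy(), g
        x = x - lam * g
    return x

# Quadratic f(x) = 0.5 ||x - c||^2: gradient x - c, Lipschitz constant 1
c = np.array([1.0, -2.0])
x_opt = adaptive_stepsize_descent(lambda x: x - c, np.zeros(2))
```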