Results 1  10
of
20
First and SecondOrder Methods for Learning: between Steepest Descent and Newton's Method
 Neural Computation
, 1992
"... Online first order backpropagation is sufficiently fast and effective for many largescale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first and secondorder optimization methods for learning in feedforward neura ..."
Abstract

Cited by 126 (6 self)
 Add to MetaCart
Online first order backpropagation is sufficiently fast and effective for many largescale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first and secondorder optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
Efficient BackProp
, 1998
"... . The convergence of backpropagation learning is analyzed so as to explain common phenomenon observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers expl ..."
Abstract

Cited by 125 (24 self)
 Add to MetaCart
. The convergence of backpropagation learning is analyzed so as to explain common phenomenon observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that secondorder optimization methods are advantageous for neural net training. It is shown that most "classical" secondorder methods are impractical for large neural networks. A few methods are proposed that do not have these limitations. 1 Introduction Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science. Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number ...
Local Gain Adaptation in Stochastic Gradient Descent
 In Proc. Intl. Conf. Artificial Neural Networks
, 1999
"... Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The res ..."
Abstract

Cited by 58 (13 self)
 Add to MetaCart
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with noni. i.d. sampling of the input space.
Comparison of Optimized Backpropagation Algorithms
 Proc. of ESANN'93, Brussels
, 1993
"... Backpropagation is one of the most famous training algorithms for multilayer perceptrons. Unfortunately it can be very slow for practical applications. Over the last years many improvement strategies have been developed to speed up backpropagation. It's very difficult to compare these different tech ..."
Abstract

Cited by 39 (1 self)
 Add to MetaCart
Backpropagation is one of the most famous training algorithms for multilayer perceptrons. Unfortunately it can be very slow for practical applications. Over the last years many improvement strategies have been developed to speed up backpropagation. It's very difficult to compare these different techniques, because most of them have been tested on various specific data sets. Most of the reported results are based on some kind of tiny and artificial training sets like XOR, encoder or decoder. It's very doubtful if these results hold for more complicate practical application. In this report an overview of many different speedup techniques is given. All of them were assessed by a very hard practical classification task, which consists of a big medical data set. As you will see many of these optimized algorithms fail in learning the data set. 1 Introduction This report is intended to summarize our experience using many different speedup techniques for the backpropagation algorithm. We have...
Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods
, 1999
"... This article focuses on gradientbased backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learningrate adaptation is based on descent techniques and estima ..."
Abstract

Cited by 27 (15 self)
 Add to MetaCart
This article focuses on gradientbased backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learningrate adaptation is based on descent techniques and estimates of the local Lipschitz constant that are obtained without additional error function and gradient evaluations. The proposed algorithms improve the backpropagation training in terms of both convergence rate and convergence characteristics, such as stable learning and robustness to oscillations. Simulations are conducted to compare and evaluate the convergence behavior of these gradientbased training algorithms with several popular training methods.
Transferring Previously Learned BackPropagation Neural Networks To New Learning Tasks
, 1993
"... ..."
Nonmonotone Methods for Backpropagation Training with Adaptive Learning Rate
 In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN’99). Washington D.C
, 1999
"... In this paper, we present nonmonotone methods for feedforward neural network training, i.e. training methods in which error function values are allowed to increase at some iterations. More specifically, at each epoch we impose that the current error function value must satisfy an Armijotype criteri ..."
Abstract

Cited by 13 (9 self)
 Add to MetaCart
In this paper, we present nonmonotone methods for feedforward neural network training, i.e. training methods in which error function values are allowed to increase at some iterations. More specifically, at each epoch we impose that the current error function value must satisfy an Armijotype criterion, with respect to the maximum error function value of M previous epochs. A strategy to dynamically adapt M is suggested and two training algorithms with adaptive learning rates that successfully employ the above mentioned acceptability criterion are proposed. Experimental results show that the nonmonotone learning strategy improves the convergence speed and the success rate of the methods considered.
Efficient Training of FeedForward Neural Networks
, 1997
"... : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.2 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.2.1 Motivation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.3 Optimization strategy : : : : : : : : : : : : ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.2 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.2.1 Motivation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.3 Optimization strategy : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 62 A.4 The Backpropagation algorithm : : : : : : : : : : : : : : : : : : : : : : : : 63 A.5 Conjugate direction methods : : : : : : : : : : : : : : : : : : : : : : : : : : 63 A.5.1 Conjugate gradients : : : : : : : : : : : : : : : : : : : : : : : : : : 65 A.5.2 The CGL algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : 67 A.5.3 The BFGS algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : 67 A.6 The SCG algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 67 A.7 Test results : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 70 A.7.1 Comparison metric : : : : : : : : : : : : : : : : : : : : : : : :...
Deterministic Nonmonotone Strategies for Effective Training of Multilayer Perceptrons
"... In this paper, we present deterministic nonmonotone learning strategies for multilayer perceptrons (MLPs), i.e., deterministic training algorithms in which error function values are allowed to increase at some epochs. To this end, we argue that the current error function value must satisfy a nonmono ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
In this paper, we present deterministic nonmonotone learning strategies for multilayer perceptrons (MLPs), i.e., deterministic training algorithms in which error function values are allowed to increase at some epochs. To this end, we argue that the current error function value must satisfy a nonmonotone criterion with respect to the maximum error function value of the previous epochs, and we propose a subprocedure to dynamically compute . The nonmonotone strategy can be incorporated in any batch training algorithm and provides fast, stable, and reliable learning. Experimental results in different classes of problems show that this approach improves the convergence speed and success percentage of firstorder training algorithms and alleviates the need for finetuning problemdepended heuristic parameters.
Globally Convergent Algorithms With Local Learning Rates
 IEEE Tr. Neural Networks
"... In this paper, a new generalized theoretical result is presented that underpins the development of globally convergent firstorder batch training algorithms which employ local learning rates. This result allows us to equip algorithms of this class with a strategy for adapting the overall direction o ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
In this paper, a new generalized theoretical result is presented that underpins the development of globally convergent firstorder batch training algorithms which employ local learning rates. This result allows us to equip algorithms of this class with a strategy for adapting the overall direction of search to a descent one. In this way, a decrease of the batcherror measure at each training iteration is ensured, and convergence of the sequence of weight iterates to a local minimizer of the batch error function is obtained from remote initial weights. The effectiveness of the theoretical result is illustrated in three application examples by comparing two wellknown training algorithms with local learning rates to their globally convergent modifications.