Results 1-10 of 47
On The Problem Of Local Minima In Backpropagation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1992
Abstract

Cited by 82 (17 self)
Supervised learning in Multi-Layered Neural Networks (MLNs) has recently been approached by means of the well-known Backpropagation algorithm. This is a gradient method which can get stuck in local minima, as simple examples show. In this paper, some conditions on the network architecture and the learning environment are proposed which ensure the convergence of the Backpropagation algorithm. In particular, it is proven that convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that MLNs exceed perceptrons in generalization to new examples. Index Terms: Multi-Layered Networks, learning environment, Backpropagation, pattern recognition, linearly separable classes. I. Introduction. Supervised learning in Multi-Layered Networks can be accomplished thanks to Backpropagation (BP) [19, 25, 31]. Its application to several different subjects [25], and, particularly, to pattern recognition [3, 6, 8, 20, 27, 29], has bee...
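The local-minimum behaviour this abstract refers to is easy to reproduce on a toy problem. The sketch below uses a hypothetical one-parameter error surface (not taken from the paper): plain gradient descent converges to whichever basin the initial weight happens to lie in.

```python
import numpy as np

def E(w):
    # Toy nonconvex "error surface": global minimum near w = -1.06,
    # spurious local minimum near w = +0.93.
    return (w**2 - 1)**2 + 0.5 * w

def dE(w):
    return 4 * w * (w**2 - 1) + 0.5

def gradient_descent(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * dE(w)
    return w

w_bad = gradient_descent(1.5)    # trapped in the shallow local minimum
w_good = gradient_descent(-1.5)  # reaches the global minimum
print(w_bad, w_good)
```

Both runs satisfy the first-order condition dE(w) = 0, yet E(w_bad) > E(w_good); this is exactly the failure mode the paper's conditions on architecture and learning environment are designed to rule out.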
Fast Exact Multiplication by the Hessian
 Neural Computation
, 1994
Abstract

Cited by 79 (4 self)
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr)f(w + rv)_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
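The R-operator construction can be written out concretely for a single logistic neuron with squared error (a minimal sketch under that assumption; the paper develops the general network case). Applying R{.} to the hand-coded gradient equations yields the exact product Hv, which the snippet verifies against a central finite difference of the gradient:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad(w, X, y):
    # gradient of E = sum_i (sigmoid(w . x_i) - y_i)^2
    z = sigmoid(X @ w)
    return X.T @ (2.0 * (z - y) * z * (1 - z))

def hessian_vector(w, v, X, y):
    # Pearlmutter's trick: Hv = R{grad}, with R{f(w)} = d/dr f(w + r v)|_{r=0}
    a = X @ w
    z = sigmoid(a)
    s = z * (1 - z)            # sigma'(a)
    Ra = X @ v                 # R{a} = v . x
    Rz = s * Ra                # R{z} = sigma'(a) R{a}
    # R{2 (z - y) s}, using sigma''(a) = s (1 - 2 z)
    Rcoef = 2.0 * Rz * s + 2.0 * (z - y) * s * (1 - 2 * z) * Ra
    return X.T @ Rcoef         # exact Hv, cost comparable to one gradient

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=5)
v = rng.normal(size=5)

Hv = hessian_vector(w, v, X, y)
eps = 1e-5
Hv_fd = (grad(w + eps * v, X, y) - grad(w - eps * v, X, y)) / (2 * eps)
```

Note the finite difference is used here only as a check; the R-operator result is exact and does not suffer from step-size trade-offs.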
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
 Neural Computation
, 2002
Abstract

Cited by 48 (14 self)
We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), in fact implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
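The core idea, that curvature-matrix-vector products are cheap even when the curvature matrix itself is not, is visible already in the Gauss-Newton case: for squared error G = J^T J, and Gv costs two matrix-vector products instead of an n-by-n matrix build. The sketch below is illustrative only, with an explicit Jacobian; the paper computes such products without ever forming J.

```python
import numpy as np

def gauss_newton_vec(J, v):
    # G v = J^T (J v): two O(m n) matvecs; the n-by-n matrix G is never formed
    return J.T @ (J @ v)

rng = np.random.default_rng(1)
J = rng.normal(size=(50, 8))   # Jacobian of residuals w.r.t. parameters (assumed given)
v = rng.normal(size=8)
Gv = gauss_newton_vec(J, v)
```

The associativity choice J^T (J v) versus (J^T J) v is the whole trick at this scale: the former never materializes the curvature matrix.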
Control of a Nonholonomic Mobile Robot Using Neural Networks
 IEEE Transactions on Neural Networks
, 1998
Abstract

Cited by 48 (0 self)
A control structure that makes possible the integration of a kinematic controller and a neural network (NN) computed-torque controller for nonholonomic mobile robots is presented. A combined kinematic/torque control law is developed using backstepping, and stability is guaranteed by Lyapunov theory. This control algorithm can be applied to the three basic nonholonomic navigation problems: tracking a reference trajectory, path following, and stabilization about a desired posture. Moreover, the NN controller proposed in this work can deal with unmodeled bounded disturbances and/or unstructured unmodeled dynamics in the vehicle. On-line NN weight tuning algorithms that do not require off-line learning, yet guarantee small tracking errors and bounded control signals, are utilized. Index Terms: Backstepping control, Lyapunov stability, mobile robots, neural networks, nonholonomic systems.
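The kinematic half of such a scheme can be sketched with the widely used unicycle tracking law v = v_r cos(e_th) + k_x e_x, w = w_r + v_r (k_y e_y + k_th sin(e_th)), whose stability follows from a standard Lyapunov argument. The gains and scenario below are illustrative assumptions, not the paper's, and the NN torque loop is omitted:

```python
import numpy as np

def track(dt=0.01, T=10.0, kx=1.0, ky=2.0, kth=3.0):
    vr, wr = 1.0, 0.0                 # reference: straight line along the x-axis
    x, y, th = 0.0, 1.0, 0.0          # robot starts 1 m off the path
    xr, yr, thr = 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        # tracking error expressed in the robot frame
        ex = np.cos(th) * (xr - x) + np.sin(th) * (yr - y)
        ey = -np.sin(th) * (xr - x) + np.cos(th) * (yr - y)
        eth = thr - th
        # kinematic (backstepping) control law
        v = vr * np.cos(eth) + kx * ex
        w = wr + vr * (ky * ey + kth * np.sin(eth))
        # Euler integration of unicycle and reference kinematics
        x += dt * v * np.cos(th); y += dt * v * np.sin(th); th += dt * w
        xr += dt * vr * np.cos(thr); yr += dt * vr * np.sin(thr); thr += dt * wr
    return np.hypot(xr - x, yr - y)   # final distance to the reference

final_err = track()
```

In the full scheme the velocities (v, w) produced here become the reference for the NN computed-torque loop, which compensates the unmodeled vehicle dynamics.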
Fast Training Algorithms For Multi-Layer Neural Nets
, 1993
Abstract

Cited by 32 (0 self)
Training a multi-layer neural net by backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. This paper describes an algorithm which is much faster than backpropagation and for which it is not necessary to specify the number of hidden units in advance. The relationship with other fast pattern recognition algorithms, such as algorithms based on k-d trees, is mentioned. The algorithm has been implemented and tested on artificial problems, such as the parity problem, and on real problems arising in speech recognition. Experimental results, including training times and recognition accuracy, are given. Generally, the algorithm achieves accuracy as good as or better than nets trained using backpropagation, and the training process is much faster. Accuracy is comparable to that of the "nearest neighbour" algorithm, which is slower and requires more storage space. Comments: Only the abstract is given here. The full paper ap...
Sofge, editors. Handbook of Intelligent Control
, 1992
Abstract

Cited by 18 (0 self)
This book is an outgrowth of discussions that got started in at least three workshops sponsored by the National Science Foundation (NSF): a workshop on neurocontrol and aerospace applications held in October 1990, under joint sponsorship from McDonnell Douglas and the NSF programs in Dynamic Systems and Control and Neuroengineering; a workshop on intelligent control held in October 1990, under joint sponsorship from NSF and the Electric Power Research Institute, to scope out plans for a major new joint initiative in intelligent control involving a number of NSF programs; and a workshop on neural networks in chemical processing, held at NSF in January-February 1991, sponsored by the NSF program in Chemical Reaction Processes. The goal of this book is to provide an authoritative source for two kinds of information: (1) fundamental new designs, at the cutting edge of true intelligent control, as well as opportunities for future research to improve on these designs; (2) important real-world applications, including test problems that constitute a challenge to the entire control community. Included in this book are a series of realistic test problems, worked out through lengthy discussions between NASA, NeuroDyne, NSF, McDonnell Douglas, and Honeywell, which are more than just benchmarks for evaluating intelligent control designs. Anyone who contributes to solving these problems may well be playing a crucial role in making possible the future development of hypersonic vehicles and subsequently the ...
Model Selection: Beyond the Bayesian/Frequentist Divide
Abstract

Cited by 12 (2 self)
The principle of parsimony, also known as "Ockham's razor," has inspired many theories of model selection. Yet such theories, all making arguments in favor of parsimony, are based on very different premises and have developed distinct methodologies to derive algorithms. We have organized challenges and edited a special issue of JMLR and several conference proceedings around the theme of model selection. In this editorial, we revisit the problem of avoiding overfitting in light of the latest results. We note the remarkable convergence, in some approaches, of theories as different as Bayesian theory, Minimum Description Length, the bias/variance tradeoff, Structural Risk Minimization, and regularization. We also present new and interesting examples of the complementarity of theories leading to hybrid algorithms, neither frequentist nor Bayesian, or perhaps both frequentist and Bayesian!
A Neural Network Training Algorithm Utilizing Multiple Sets of Linear Equations
Abstract

Cited by 12 (5 self)
A fast algorithm is presented for the training of multilayer perceptron neural networks, which uses separate error functions for each hidden unit and solves multiple sets of linear equations. The algorithm builds upon two previously described techniques. In each training iteration, output weight optimization (OWO) solves linear equations to optimize the output weights, which are those connecting to output-layer net functions. The method of hidden weight optimization (HWO) develops desired hidden unit net signals from delta functions. The resulting hidden unit error functions are minimized with respect to the hidden weights, which are those feeding into hidden unit net functions. An algorithm is described for calculating the learning factor for hidden weights. We show that the combined technique, OWO-HWO, is superior in terms of convergence to standard OWO-BP (output weight optimization-backpropagation), which uses OWO to update output weights and backpropagation to update hidden weights. We also...
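The OWO step in particular reduces to ordinary linear least squares once the hidden activations are fixed. The sketch below illustrates just that step, under assumed tanh hidden units and random data (the full OWO-HWO iteration alternates this with the hidden-weight update, which is not shown):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))            # inputs
T = rng.normal(size=(100, 2))            # targets
W_hid = rng.normal(size=(3, 8))          # hidden weights, held fixed for this step
H = np.tanh(X @ W_hid)                   # hidden unit outputs (tanh assumed)
Ha = np.hstack([H, np.ones((100, 1))])   # append a bias column

# OWO: the output weights minimizing ||Ha W - T||^2 solve a linear system
W_out, *_ = np.linalg.lstsq(Ha, T, rcond=None)
err = np.linalg.norm(Ha @ W_out - T)
```

Because the output layer is linear in its weights, this step finds the global optimum of the output weights in one solve, which is where the speed advantage over pure backpropagation of the output layer comes from.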
Large-Scale Nonlinear Constrained Optimization: A Current Survey
, 1994
Abstract

Cited by 9 (0 self)
Much progress has been made in constrained nonlinear optimization in the past ten years, but most large-scale problems still represent a considerable obstacle. In this survey paper we will attempt to give an overview of the current approaches, including interior and exterior methods and algorithms based upon trust regions and line searches. In addition, the importance of software, numerical linear algebra, and testing will be addressed. We will try to explain why the difficulties arise, how attempts are being made to overcome them, and some of the problems that still remain. Although there will be some emphasis on the LANCELOT and CUTE projects, the intention is to give a broad picture of the state-of-the-art.
Efficient algorithm for training neural networks with one hidden layer
 In Proc. IJCNN
, 1999
Abstract

Cited by 7 (1 self)
An efficient second-order algorithm for training feedforward neural networks is presented. The algorithm has a convergence rate similar to that of the Levenberg-Marquardt (LM) method, while being less computationally intensive and requiring less memory. This is especially important for large neural networks, where the LM algorithm becomes impractical. The algorithm was verified with several examples.
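For reference, the Levenberg-Marquardt baseline that the authors compare against can be sketched on a toy least-squares fit. The model, data, and damping schedule below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def lm_fit(x, y, p0, n_iter=50, mu=1e-2):
    # Fit f(x; a, b) = a * exp(b x) by Levenberg-Marquardt on r = f - y
    p = np.array(p0, float)
    for _ in range(n_iter):
        a, b = p
        r = a * np.exp(b * x) - y
        J = np.column_stack([np.exp(b * x),            # df/da
                             a * x * np.exp(b * x)])   # df/db
        # damped Gauss-Newton step: (J^T J + mu I) dp = -J^T r
        dp = np.linalg.solve(J.T @ J + mu * np.eye(2), -(J.T @ r))
        a2, b2 = p + dp
        if np.sum((a2 * np.exp(b2 * x) - y) ** 2) < np.sum(r ** 2):
            p, mu = p + dp, mu * 0.5   # accept step, trust the model more
        else:
            mu *= 2.0                  # reject step, increase damping
    return p

x = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(1.5 * x)              # noiseless data from a = 2, b = 1.5
p = lm_fit(x, y, [1.0, 0.5])
```

The per-iteration cost is dominated by forming and factoring J^T J + mu I; it is precisely this cost and storage that second-order methods like the one in the paper aim to reduce for large networks.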