Results 1 - 10 of 33
On The Problem Of Local Minima In Backpropagation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1992
Cited by 72 (17 self)
Abstract: Supervised Learning in Multi-Layered Neural Networks (MLNs) has recently been proposed through the well-known Backpropagation algorithm. This is a gradient method which can get stuck in local minima, as simple examples can show. In this paper, some conditions on the network architecture and the learning environment are proposed which ensure the convergence of the Backpropagation algorithm. In particular, it is proven that convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that MLNs exceed perceptrons in generalization to new examples. Index Terms: Multi-Layered Networks, learning environment, Backpropagation, pattern recognition, linearly separable classes. I. Introduction: Supervised learning in Multi-Layered Networks can be accomplished thanks to Backpropagation (BP) ([19, 25, 31]). Its application to several different subjects [25], and, particularly, to pattern recognition ([3, 6, 8, 20, 27, 29]), has bee...
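The observation that a gradient method "can get stuck in local minima, as simple examples can show" is easy to demonstrate; a minimal sketch (the quartic function and learning rate below are illustrative choices, not taken from the paper):

```python
# Gradient descent on f(w) = w^4 - 3w^2 + w, which has a global minimum
# near w = -1.30 and a shallower local minimum near w = 1.13.
def f(w):
    return w**4 - 3 * w**2 + w

def df(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w, lr=0.01, steps=5000):
    for _ in range(steps):
        w -= lr * df(w)
    return w

w_good = gradient_descent(-0.5)   # reaches the global minimum near -1.30
w_stuck = gradient_descent(1.5)   # gets stuck in the local minimum near 1.13
```

The final point depends entirely on the initial weight, which is the behavior the paper's architectural and environmental conditions are designed to rule out.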
Fast Exact Multiplication by the Hessian
 Neural Computation
, 1994
Cited by 70 (4 self)
Abstract: Just storing the Hessian H (the matrix of second derivatives d^2 E / dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr) f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{.} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
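The identity R{grad_w} = Hv underlying the paper can be checked numerically. The sketch below uses an assumed logistic-regression error and approximates the directional derivative of the gradient by central differences; the paper's own procedure is exact, obtained by applying R{.} to the backpropagation equations rather than differencing.

```python
import numpy as np

# Assumed setup: cross-entropy error E(w) for logistic regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.integers(0, 2, size=20)
w = rng.normal(size=5)
v = rng.normal(size=5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w):
    # Gradient of the cross-entropy error: X^T (sigmoid(Xw) - y)
    return X.T @ (sigmoid(X @ w) - y)

# Analytic Hessian-vector product: H = X^T diag(s(1-s)) X, so
# H v = X^T (s(1-s) * (X v)) -- computed without ever forming H.
s = sigmoid(X @ w)
Hv_exact = X.T @ (s * (1 - s) * (X @ v))

# Numerical R-operator: (d/dr) grad(w + r v) at r = 0, by central differences.
r = 1e-5
Hv_approx = (grad(w + r * v) - grad(w - r * v)) / (2 * r)

assert np.allclose(Hv_exact, Hv_approx, atol=1e-5)
```

Both routes touch only gradient-sized quantities, which is the point: Hv costs about as much as a gradient evaluation.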
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
 Neural Computation
, 2002
Cited by 38 (14 self)
Abstract: We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), in fact implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
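A minimal sketch of one such O(n) curvature matrix-vector product, assuming a simple single-layer least-squares model (not the paper's general formulation): the Gauss-Newton product Gv = J^T (J v) is formed as two matrix-vector passes, without building G explicitly.

```python
import numpy as np

# Assumed model: least-squares error E(w) = 0.5 ||tanh(Xw) - y||^2.
# Its Gauss-Newton matrix is G = J^T J with J = diag(1 - m^2) X,
# where m = tanh(Xw).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
y = rng.normal(size=30)
w = rng.normal(size=8)
v = rng.normal(size=8)

m = np.tanh(X @ w)
d = 1 - m**2             # derivative of tanh at the net inputs

u = d * (X @ v)          # J v  ("forward" product)
Gv_fast = X.T @ (d * u)  # J^T (J v)  ("backward" product)

# Check against the explicitly assembled Gauss-Newton matrix.
J = d[:, None] * X
G = J.T @ J
assert np.allclose(Gv_fast, G @ v)
```

The fast path never materializes the n-by-n matrix G, which is what makes second-order steps affordable in linear time per iteration.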
Fast Training Algorithms For Multi-Layer Neural Nets
, 1993
Cited by 29 (0 self)
Abstract: Training a multi-layer neural net by backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. This paper describes an algorithm which is much faster than backpropagation and for which it is not necessary to specify the number of hidden units in advance. The relationship with other fast pattern recognition algorithms, such as algorithms based on k-d trees, is mentioned. The algorithm has been implemented and tested on artificial problems such as the parity problem and on real problems arising in speech recognition. Experimental results, including training times and recognition accuracy, are given. Generally, the algorithm achieves accuracy as good as or better than nets trained using backpropagation, and the training process is much faster. Accuracy is comparable to that of the "nearest neighbour" algorithm, which is slower and requires more storage space. Comments: Only the Abstract is given here. The full paper ap...
White and Sofge, editors. Handbook of Intelligent Control
, 1992
Cited by 18 (0 self)
Abstract: This book is an outgrowth of discussions that got started in at least three workshops sponsored by the National Science Foundation (NSF): a workshop on neurocontrol and aerospace applications held in October 1990, under joint sponsorship from McDonnell Douglas and the NSF programs in Dynamic Systems and Control and Neuroengineering; a workshop on intelligent control held in October 1990, under joint sponsorship from NSF and the Electric Power Research Institute, to scope out plans for a major new joint initiative in intelligent control involving a number of NSF programs; and a workshop on neural networks in chemical processing, held at NSF in January-February 1991, sponsored by the NSF program in Chemical Reaction Processes. The goal of this book is to provide an authoritative source for two kinds of information: (1) fundamental new designs, at the cutting edge of true intelligent control, as well as opportunities for future research to improve on these designs; (2) important real-world applications, including test problems that constitute a challenge to the entire control community. Included in this book are a series of realistic test problems, worked out through lengthy discussions between NASA, NeuroDyne, NSF, McDonnell Douglas, and Honeywell, which are more than just benchmarks for evaluating intelligent control designs. Anyone who contributes to solving these problems may well be playing a crucial role in making possible the future development of hypersonic vehicles and subsequently the...
Large-Scale Nonlinear Constrained Optimization: A Current Survey
, 1994
Cited by 9 (0 self)
Abstract: Much progress has been made in constrained nonlinear optimization in the past ten years, but most large-scale problems still represent a considerable obstacle. In this survey paper we attempt to give an overview of the current approaches, including interior and exterior methods and algorithms based upon trust regions and line searches. In addition, the importance of software, numerical linear algebra and testing is addressed. We try to explain why the difficulties arise, how attempts are being made to overcome them, and some of the problems that still remain. Although there is some emphasis on the LANCELOT and CUTE projects, the intention is to give a broad picture of the state-of-the-art.
1 IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA; 2 Parallel Algorithms Team, CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France; 3 Central Computing Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England ...
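As a small illustration of one building block mentioned here, a backtracking (Armijo) line search can be sketched as follows; the test function and constants are arbitrary illustrative choices, not taken from the survey.

```python
import numpy as np

def backtracking(f, grad_f, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the sufficient-decrease (Armijo) condition
    f(x + alpha p) <= f(x) + c * alpha * grad_f(x).p holds."""
    fx, g = f(x), grad_f(x)
    while f(x + alpha * p) > fx + c * alpha * (g @ p):
        alpha *= rho
    return alpha

# Example: one steepest-descent step on the Rosenbrock function.
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad_f(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

x = np.array([-1.2, 1.0])
p = -grad_f(x)                      # descent direction
alpha = backtracking(f, grad_f, x, p)
assert f(x + alpha * p) < f(x)      # sufficient decrease achieved
```

Trust-region methods, the other family the survey discusses, take the complementary view: fix the step length bound first and choose the direction afterwards.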
A Neural Network Training Algorithm Utilizing Multiple Sets of Linear Equations
Cited by 7 (2 self)
Abstract: A fast algorithm is presented for the training of multi-layer perceptron neural networks, which uses separate error functions for each hidden unit and solves multiple sets of linear equations. The algorithm builds upon two previously described techniques. In each training iteration, output weight optimization (OWO) solves linear equations to optimize output weights, which are those connecting to output layer net functions. The method of hidden weight optimization (HWO) develops desired hidden unit net signals from delta functions. The resulting hidden unit error functions are minimized with respect to hidden weights, which are those feeding into hidden unit net functions. An algorithm is described for calculating the learning factor for hidden weights. We show that the combined technique, OWO-HWO, is superior in terms of convergence to standard OWO-BP (output weight optimization-backpropagation), which uses OWO to update output weights and backpropagation to update hidden weights. We also...
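With hidden weights held fixed, the OWO step reduces the output weights to a linear least-squares problem; a minimal sketch under assumed shapes and activations (all names and sizes here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))           # inputs
T = rng.normal(size=(50, 3))           # target outputs
W_hidden = rng.normal(size=(4, 10))    # hidden weights, held fixed

H = np.tanh(X @ W_hidden)              # hidden unit activations
Ha = np.hstack([H, np.ones((50, 1))])  # append a bias column

# The output layer is linear in its weights, so the optimal output
# weights minimize ||Ha W_out - T||_F: a linear least-squares solve.
W_out, *_ = np.linalg.lstsq(Ha, T, rcond=None)

# Since the output error is convex in W_out, any perturbation of the
# solution cannot lower it.
err = np.linalg.norm(Ha @ W_out - T)
err_perturbed = np.linalg.norm(Ha @ (W_out + 0.01) - T)
assert err <= err_perturbed
```

This is why OWO converges so quickly on the output layer: one linear solve replaces many gradient steps.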
Implementation and Comparison of Growing Neural Gas, Growing Cell Structures and Fuzzy ARTMAP
, 1997
Model Selection: Beyond the Bayesian/Frequentist Divide
Cited by 6 (0 self)
Abstract: The principle of parsimony, also known as "Ockham's razor", has inspired many theories of model selection. Yet such theories, all making arguments in favor of parsimony, are based on very different premises and have developed distinct methodologies to derive algorithms. We have organized challenges and edited a special issue of JMLR and several conference proceedings around the theme of model selection. In this editorial, we revisit the problem of avoiding overfitting in light of the latest results. We note the remarkable convergence, in some approaches, of theories as different as Bayesian theory, Minimum Description Length, the bias/variance tradeoff, Structural Risk Minimization, and regularization. We also present new and interesting examples of the complementarity of theories leading to hybrid algorithms, neither frequentist nor Bayesian, or perhaps both frequentist and Bayesian!
Classification-Based Objective Functions
 Machine Learning
, 2007
Cited by 5 (2 self)
Abstract: Backpropagation, like most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an intuitive approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using an implicit search of common error minimization heuristics, such as sum-squared error (SSE) and cross-entropy (CE). CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns, from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
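A simplified sketch of this kind of error signal, as one possible interpretation for illustration (not the paper's exact CB1 definition): error is produced only on misclassified patterns, and only from "culprit" output nodes whose activation ties or beats the target class's output.

```python
import numpy as np

rng = np.random.default_rng(3)
outputs = rng.uniform(size=(6, 4))    # network outputs, 4 classes
targets = rng.integers(0, 4, size=6)  # true class indices

errors = np.zeros_like(outputs)
for i, t in enumerate(targets):
    if np.argmax(outputs[i]) != t:    # misclassified pattern only
        target_act = outputs[i, t]
        culprits = outputs[i] >= target_act
        culprits[t] = False
        # Push culprit outputs down toward the target's activation...
        errors[i, culprits] = outputs[i, culprits] - target_act
        # ...and push the target output up past the strongest culprit.
        errors[i, t] = target_act - outputs[i].max()

# Correctly classified patterns contribute no error at all,
# which is what discourages weight saturation and overfitting.
correct = np.argmax(outputs, axis=1) == targets
assert np.all(errors[correct] == 0)
```

Contrast this with SSE or CE, which keep producing gradient even on patterns the network already classifies correctly.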