On The Problem Of Local Minima In Backpropagation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992
Cited by 60 (16 self)
Supervised Learning in Multi-Layered Neural Networks (MLNs) has been recently proposed through the well-known Backpropagation algorithm. This is a gradient method which can get stuck in local minima, as simple examples can show. In this paper, some conditions on the network architecture and the learning environment are proposed which ensure the convergence of the Backpropagation algorithm. It is proven in particular that the convergence holds if the classes are linearly-separable. In this case, the experience gained in several experiments shows that MLNs exceed perceptrons in generalization to new examples.
Index Terms: Multi-Layered Networks, learning environment, Backpropagation, pattern recognition, linearly-separable classes.
I. Introduction
Supervised learning in Multi-Layered Networks can be accomplished thanks to Backpropagation (BP) ([19, 25, 31]). Its application to several different subjects [25], and, particularly, to pattern recognition ([3, 6, 8, 20, 27, 29]), has bee...
Fast Exact Multiplication by the Hessian
Neural Computation, 1994
Cited by 54 (3 self)
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr)f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
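The R{} operator described in this abstract can be illustrated on a very small model. The following is a minimal sketch, assuming a single sigmoid unit with squared error E = (1/2)(y - t)^2; this model, the function names, and the sample values are illustrative assumptions and are not taken from the paper. Applying R{} to each line of the gradient computation yields Hv without forming H, and a central finite difference of the gradient serves as an independent check.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad(w, x, t):
    """Gradient of E = 0.5 * (sigmoid(w.x) - t)^2 with respect to w."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    y = sigmoid(a)
    delta = (y - t) * y * (1.0 - y)          # dE/da
    return [delta * xi for xi in x]


def hessian_vector_product(w, v, x, t):
    """Hv obtained by applying R{f} = d/dr f(w + r v) at r = 0 to grad()."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    y = sigmoid(a)
    s = y * (1.0 - y)                        # sigmoid derivative
    r_a = sum(vi * xi for vi, xi in zip(v, x))   # R{a}, since R{w} = v
    r_y = s * r_a                            # R{y}
    r_s = r_y * (1.0 - 2.0 * y)              # R{y (1 - y)}
    r_delta = r_y * s + (y - t) * r_s        # product rule on delta
    return [r_delta * xi for xi in x]        # R{grad} = Hv (x is constant)


def finite_difference_hv(w, v, x, t, eps=1e-5):
    """Numerical check: (grad(w + eps v) - grad(w - eps v)) / (2 eps)."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad(w_plus, x, t)
    g_minus = grad(w_minus, x, t)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


# Hypothetical sample values chosen only for the demonstration.
w = [0.3, -0.2, 0.5]
v = [0.1, 0.4, -0.7]
x = [1.0, 2.0, -1.0]
t = 1.0
hv_exact = hessian_vector_product(w, v, x, t)
hv_numeric = finite_difference_hv(w, v, x, t)
```

The exact product costs about as much as one extra gradient pass, which matches the cost claim in the abstract; the finite-difference version needs two gradient evaluations and is only approximate.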
Fast Training Algorithms For Multi-Layer Neural Nets
1993
Cited by 25 (0 self)
Training a multilayer neural net by back-propagation is slow and requires arbitrary choices regarding the number of hidden units and layers. This paper describes an algorithm which is much faster than back-propagation and for which it is not necessary to specify the number of hidden units in advance. The relationship with other fast pattern recognition algorithms, such as algorithms based on k-d trees, is mentioned. The algorithm has been implemented and tested on artificial problems such as the parity problem and on real problems arising in speech recognition. Experimental results, including training times and recognition accuracy, are given. Generally, the algorithm achieves accuracy as good as or better than nets trained using back-propagation, and the training process is much faster than back-propagation. Accuracy is comparable to that for the "nearest neighbour" algorithm, which is slower and requires more storage space.
Comments: Only the Abstract is given here. The full paper ap...
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
Neural Computation, 2002
Cited by 25 (11 self)
We propose a generic method for iteratively approximating various second-order gradient steps -- Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient -- in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), in fact implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
Sofge, editors. Handbook of intelligent control
1992
Cited by 13 (0 self)
This book is an outgrowth of discussions that got started in at least three workshops sponsored by the National Science Foundation (NSF):
- A workshop on neurocontrol and aerospace applications held in October 1990, under joint sponsorship from McDonnell Douglas and the NSF programs in Dynamic Systems and Control and Neuroengineering.
- A workshop on intelligent control held in October 1990, under joint sponsorship from NSF and the Electric Power Research Institute, to scope out plans for a major new joint initiative in intelligent control involving a number of NSF programs.
- A workshop on neural networks in chemical processing, held at NSF in January-February 1991, sponsored by the NSF program in Chemical Reaction Processes.
The goal of this book is to provide an authoritative source for two kinds of information: (1) fundamental new designs, at the cutting edge of true intelligent control, as well as opportunities for future research to improve on these designs; (2) important real-world applications, including test problems that constitute a challenge to the entire control community. Included in this book are a series of realistic test problems, worked out through lengthy discussions between NASA, NeuroDyne, NSF, McDonnell Douglas, and Honeywell, which are more than just benchmarks for evaluating intelligent control designs. Anyone who contributes to solving these problems may well be playing a crucial role in making possible the future development of hypersonic vehicles and subsequently the
Large-Scale Nonlinear Constrained Optimization: A Current Survey
1994
Cited by 7 (0 self)
Much progress has been made in constrained nonlinear optimization in the past ten years, but most large-scale problems still represent a considerable obstacle. In this survey paper we will attempt to give an overview of the current approaches, including interior and exterior methods and algorithms based upon trust regions and line searches. In addition, the importance of software, numerical linear algebra and testing will be addressed. We will try to explain why the difficulties arise, how attempts are being made to overcome them and some of the problems that still remain. Although there will be some emphasis on the LANCELOT and CUTE projects, the intention is to give a broad picture of the state-of-the-art.
1 IBM T.J. Watson Research Center, P.O.Box 218, Yorktown Heights, NY 10598, USA
2 Parallel Algorithms Team, CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France
3 Central Computing Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England ...
Classification-Based Objective Functions
Machine Learning, 2007
Cited by 5 (2 self)
Backpropagation, similar to most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an intuitive approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using an implicit search of common error minimization heuristics, such as sum-squared-error (SSE) and cross-entropy (CE). CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
A Neural Network Training Algorithm Utilizing Multiple Sets of Linear Equations
Cited by 5 (0 self)
A fast algorithm is presented for the training of multilayer perceptron neural networks, which uses separate error functions for each hidden unit and solves multiple sets of linear equations. The algorithm builds upon two previously described techniques. In each training iteration, output weight optimization (OWO) solves linear equations to optimize output weights, which are those connecting to output layer net functions. The method of hidden weight optimization (HWO) develops desired hidden unit net signals from delta functions. The resulting hidden unit error functions are minimized with respect to hidden weights, which are those feeding into hidden unit net functions. An algorithm is described for calculating the learning factor for hidden weights. We show that the combined technique, OWO-HWO, is superior in terms of convergence to standard OWO-BP (output weight optimization-backpropagation), which uses OWO to update output weights and backpropagation to update hidden weights. We also...
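The idea that output weights can be found by solving linear equations can be sketched as an ordinary least-squares problem. The example below is only an illustration under stated assumptions: it treats a single linear output unit, forms the normal equations R w = c from fixed basis values (a bias term plus hidden-unit outputs), and solves them by Gaussian elimination. The function names, the data, and the use of normal equations are assumptions made for this sketch, not the paper's exact formulation of OWO.

```python
def solve_linear(a_matrix, b_vector):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(a_matrix)
    m = [row[:] + [bi] for row, bi in zip(a_matrix, b_vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def output_weights(basis, targets):
    """Least-squares weights for one linear output unit.

    basis[p] holds the inputs to the output unit for pattern p
    (assumed here: a constant 1.0 for the bias, then hidden outputs).
    """
    n = len(basis[0])
    # Autocorrelation matrix R and cross-correlation vector c.
    r = [[sum(row[i] * row[j] for row in basis) for j in range(n)]
         for i in range(n)]
    c = [sum(row[i] * t for row, t in zip(basis, targets)) for i in range(n)]
    return solve_linear(r, c)


# Hypothetical data: targets generated by known weights [2, 3, -1].
basis = [[1.0, 0.1, 0.9], [1.0, 0.5, 0.2], [1.0, 0.8, 0.7], [1.0, 0.3, 0.4]]
targets = [2.0 + 3.0 * h1 - 1.0 * h2 for _, h1, h2 in basis]
weights = output_weights(basis, targets)
```

Because the output unit is linear in its weights, this step has a closed-form solution once the hidden outputs are fixed, which is why it can replace gradient descent for the output layer.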
A Survey of Current Techniques for Reinforcement Learning
Report LiTH-ISY-I-1391, Computer Vision Laboratory, S-581 83 Linköping, 1992
Cited by 2 (2 self)
This survey considers response generating systems that improve their behaviour using reinforcement learning. The difference between unsupervised learning, supervised learning, and reinforcement learning is described. Two general problems concerning learning systems are presented: the credit assignment problem and the problem of perceptual aliasing. Notations and some general issues concerning reinforcement learning systems are presented. Reinforcement learning systems are further divided into two main classes: memory mapping and projective mapping systems. Each of these classes is described and some examples are presented. Some other approaches are mentioned that do not fit into the two main classes. Finally some issues not covered by the surveyed articles are discussed, and some comments on the subject are made.
Contents
1 Introduction
1.1 The Credit-Assignment Problem
1.1.1 Temporal Difference Methods
1.1.2 Dynami...
Ersoy: A Parallel Implementation of Backpropagation Neural Network on MasPar MP-1
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’95), 1995
Cited by 1 (0 self)
In this paper, we explore the parallel implementation of the backpropagation algorithm with and without hidden layers on the MasPar MP-1. This implementation is based on a SIMD architecture, and uses a backpropagation model. Our implementation uses weight batching rather than the on-line updating of the weights used by most serial and parallel implementations of backpropagation. This method results in a smoother convergence to a solution which is comparable to that of the popular method. In contrast to various systolic array implementations of the backpropagation algorithm, which are data driven and exploit pipelined parallelism, we have developed a true SIMD algorithm which is control driven and exploits two types of parallelism inherent in backpropagation feedforward, layered neural networks, namely architectural parallelism and data parallelism. Most importantly, the processing time is reduced both theoretically and experimentally by the order of 3000 for a network with 7-100-10 input-hidden-output neurons and 1188 training. With this algorithm we have achieved speeds of up to 70 Million Connections per Second (MCS) as throughput of a network stage and over 180 Million Connection UPdates per Second (MCUPS) for training the above network. This is the fastest performance of the standard backpropagation algorithm reported to date.
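The distinction this abstract draws between weight batching and on-line updating can be shown on a toy problem. The sketch below assumes a single linear unit trained with the delta rule; the function names, learning rate, and data are hypothetical and say nothing about the MasPar implementation itself. In the batched version, every pattern's error is computed against the same weights, so the per-pattern work is independent and could in principle be done in parallel before one combined update.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_epoch(w, patterns, targets, lr):
    """Accumulate the weight change over all patterns, then update once."""
    total = [0.0] * len(w)
    for x, t in zip(patterns, targets):
        err = t - predict(w, x)              # every error uses the same w
        for i, xi in enumerate(x):
            total[i] += err * xi
    return [wi + lr * gi for wi, gi in zip(w, total)]


def online_epoch(w, patterns, targets, lr):
    """Update the weights immediately after each pattern."""
    w = list(w)
    for x, t in zip(patterns, targets):
        err = t - predict(w, x)              # error uses the latest w
        for i, xi in enumerate(x):
            w[i] += lr * err * xi
    return w


# Hypothetical toy data consistent with the weights [1, 2].
patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 2.0, 3.0]

w_batch = batch_epoch([0.0, 0.0], patterns, targets, 0.1)
w_online = online_epoch([0.0, 0.0], patterns, targets, 0.1)

# Repeating either scheme drives the weights toward [1, 2] on this data.
w_batch_final = [0.0, 0.0]
w_online_final = [0.0, 0.0]
for _ in range(500):
    w_batch_final = batch_epoch(w_batch_final, patterns, targets, 0.1)
    w_online_final = online_epoch(w_online_final, patterns, targets, 0.1)
```

After one epoch the two schemes give different weights, because the on-line version lets earlier patterns influence the errors of later ones; on this small consistent data set both converge to the same solution.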

