Results 1 - 8 of 8
Connectionist Learning Procedures
Artificial Intelligence, 1989
Cited by 339 (6 self)
A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
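The update rule described in this abstract (compute the derivative of a global error with respect to each connection strength, then move the strength in the direction that lowers the error) can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: it assumes a single linear unit with a squared-error measure, and the names `train_linear_unit`, `lr`, and `epochs` are hypothetical.

```python
def train_linear_unit(samples, lr=0.1, epochs=2000):
    """Gradient descent on mean squared error for one linear unit y = w*x + b.

    Assumption: a single unit with no hidden layer, chosen only to show
    the "derivative, then downhill step" idea from the abstract.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, target in samples:
            err = (w * x + b) - target
            grad_w += err * x  # dE/dw for E = 0.5 * sum(err ** 2)
            grad_b += err      # dE/db
        # Adjust each strength in the direction that decreases the error.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data generated from y = 2x + 1 (illustrative values only).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_linear_unit(samples)
```

With a hidden layer, the same step applies to every connection; the derivatives are then obtained by propagating the error backwards through the network.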
A scaled conjugate gradient algorithm for fast supervised learning
Neural Networks, 1993
Cited by 300 (0 self)
A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second-order information from the neural network but requires only O(N) memory usage, where N is the number of weights in the network. The performance of SCG is benchmarked against the performance of the standard backpropagation algorithm (BP) [13], the conjugate gradient backpropagation (CGB) [6] and the one-step Broyden-Fletcher-Goldfarb-Shanno memoryless quasi-Newton algorithm (BFGS) [1]. SCG yields a speedup of at least an order of magnitude relative to BP. The speedup depends on the convergence criterion, i.e., the greater the demanded reduction in error, the greater the speedup. SCG is fully automated, including no user-dependent parameters, and avoids a time-consuming line search, which CGB and BFGS use in each iteration in order to determine an appropriate step size.
Incorporating problem-dependent structural information in the architecture of a neural network often lowers the overall complexity. The smaller the complexity of the neural network relative to the problem domain, the greater the possibility that the weight space contains long ravines characterized by sharp curvature. While BP is inefficient on these ravine phenomena, it is shown that SCG handles them effectively.
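The ravine behaviour mentioned above can be illustrated on a toy quadratic error surface. The sketch below is not SCG itself: it is the classical linear conjugate gradient method, where the step size comes from the curvature along the search direction rather than from a line search, compared with fixed-step steepest descent. The matrix `A`, the vector `b`, and all function names are assumptions made for the illustration.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, w, iters):
    """Minimise 0.5 * w^T A w - b^T w for symmetric positive-definite A."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, w))]  # negative gradient
    d = list(r)                                       # first search direction
    for _ in range(iters):
        rr = dot(r, r)
        if rr < 1e-24:  # gradient is numerically zero
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)  # step size from curvature along d
        w = [wi + alpha * di for wi, di in zip(w, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]  # next conjugate direction
    return w


def steepest_descent(A, b, w, lr, iters):
    """Fixed-step gradient descent on the same quadratic."""
    for _ in range(iters):
        g = [ai - bi for ai, bi in zip(matvec(A, w), b)]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# A "ravine": curvature 100 in one direction, 1 in the other; minimum at [1, 1].
A = [[100.0, 0.0], [0.0, 1.0]]
b = [100.0, 1.0]
w_cg = conjugate_gradient(A, b, [0.0, 0.0], iters=2)
w_sd = steepest_descent(A, b, [0.0, 0.0], lr=0.01, iters=2)
```

After two iterations the conjugate gradient iterate sits at the minimum up to rounding, while steepest descent has barely moved along the shallow direction of the ravine. Only a handful of length-N vectors are stored, which is the O(N) memory property the abstract refers to.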
Successes And Failures Of Backpropagation: A Theoretical Investigation
Progress in Neural Networks, Ablex Publishing
Cited by 13 (3 self)
Introduction. Backpropagation is probably the most widely applied neural network learning algorithm. Backprop's popularity is related to its ability to deal with complex multidimensional mappings. In the words of Werbos [56] the algorithm goes "beyond regression". Backprop's theory is related to many disciplines and has been developed by several different research groups. As pointed out by le Cun [38], to some extent, the basic elements of the theory can be traced back to the famous book of Bryson and Ho [9]. A more explicit statement of the algorithm has been proposed by Werbos [56], Parker [43], le Cun [36], and members of the PDP group [44]. Although many researchers have contributed in different ways to the development and proposition of different aspects of Backprop, there is no question that Rumelhart and the PDP group have the credit for the current high diffusion of the algorithm. As Widrow points out in [57], what is actually new with Backprop is the adoption of "squashing ...
Efficient Training of Feed-Forward Neural Networks
1997
Cited by 12 (0 self)
Contents (excerpt):
A.2 Introduction
  A.2.1 Motivation
A.3 Optimization strategy
A.4 The Backpropagation algorithm
A.5 Conjugate direction methods
  A.5.1 Conjugate gradients
  A.5.2 The CGL algorithm
  A.5.3 The BFGS algorithm
A.6 The SCG algorithm
A.7 Test results
  A.7.1 Comparison metric ...
Generalization by Neural Networks
IEEE Trans. on Knowledge and Data Eng., 1992
Cited by 11 (2 self)
Neural networks have traditionally been applied to recognition problems, and most learning algorithms are tailored to those problems. We discuss the requirements of learning for generalization, where the traditional methods based on gradient descent have limited success. We present a new stochastic learning algorithm based on simulated annealing in weight space. We verify the convergence properties and feasibility of the algorithm. We also describe an implementation of the algorithm and validation experiments. 1. Introduction. Neural networks are being applied to a wide variety of applications, from speech generation [1] to handwriting recognition [2]. The last decade has seen great advances in the design of neural networks for a class of problems called recognition problems, and in the design of learning algorithms [35, 57]. The learning of weights for a neural network for many recognition problems is no longer a difficult task. However, designing a neural network for generalization problem is ...
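The abstract does not give the details of its annealing algorithm, so the following is only a generic sketch of simulated annealing in weight space: perturb the weights at random, always accept improvements, and accept a worse candidate with a probability that shrinks as the temperature falls. The single linear unit, the geometric cooling schedule, and the names `anneal`, `t0`, and `cooling` are all assumptions for illustration.

```python
import math
import random


def error(weights, samples):
    """Sum of squared errors of a linear unit y = w0 * x + w1."""
    w0, w1 = weights
    return sum((w0 * x + w1 - target) ** 2 for x, target in samples)


def anneal(samples, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Generic Metropolis-style annealing over the weights (illustrative)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    w = [0.0, 0.0]
    e = error(w, samples)
    best_w, best_e = list(w), e
    t = t0
    for _ in range(steps):
        # Propose a small random move in weight space.
        cand = [wi + rng.gauss(0.0, 0.1) for wi in w]
        ce = error(cand, samples)
        # Always accept improvements; accept worse moves with
        # probability exp(-(increase in error) / temperature).
        if ce <= e or rng.random() < math.exp((e - ce) / t):
            w, e = cand, ce
            if e < best_e:
                best_w, best_e = list(w), e
        t *= cooling  # geometric cooling schedule (an assumption)
    return best_w, best_e


# Toy data generated from y = 2x + 1 (illustrative values only).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
initial_error = error([0.0, 0.0], samples)
best_w, best_e = anneal(samples)
```

Unlike gradient descent, this search needs no derivatives and can leave a poor region of weight space while the temperature is high, at the cost of many more error evaluations.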
Semidynamic Point Sets for Polynomial-Time Learning
Many problems currently addressed by using the so-called "Artificial Neural Network" systems can be viewed as "learning" problems in which the task is to perform a correct classification of a set of examples and induce a general rule from them. These problems often use the "learning-by-example paradigm", so they deal with databases in which information is coded as points in a multidimensional space.
A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning
1991
Abstract. A supervised learning algorithm (Scaled Conjugate Gradient, SCG) is introduced. The performance of SCG is benchmarked against that of the standard back propagation algorithm (BP) (Rumelhart, Hinton, & Williams, 1986), the conjugate gradient algorithm with line search (CGL) (Johansson, Dowla, & Goodman, 1990) and the one-step Broyden-Fletcher-Goldfarb-Shanno memoryless quasi-Newton algorithm (BFGS) (Battiti, 1990). SCG is fully automated, includes no critical user-dependent parameters, and avoids a time-consuming line search, which CGL and BFGS use in each iteration in order to determine an appropriate step size. Experiments show that SCG is considerably faster than BP, CGL, and BFGS. Keywords: Feedforward neural network, Supervised learning, Optimization, Conjugate gradient algorithms. 1.1. Motivation
A Parallel Network that Learns to Play Backgammon
Artificial Intelligence
A class of connectionist networks is described that has learned to play backgammon at an intermediate-to-advanced level. The networks were trained by backpropagation learning on a large set of sample positions evaluated by a human expert. In actual match play against humans and conventional computer programs, the networks have demonstrated substantial ability to generalize on the basis of expert knowledge of the game. This is possibly the most complex domain yet studied with connectionist learning. New techniques were needed to overcome problems due to the scale and complexity of the task. These include techniques for intelligent design of training set examples and efficient coding schemes, and procedures for escaping from local minima. We suggest how these techniques might be used in applications of network learning to general large-scale, difficult "real-world" problem domains.