## An empirical study of learning speed in back-propagation networks (1988)

Citations: 241 (0 self)

### BibTeX

@TECHREPORT{Fahlman88anempirical,
  author = {Scott E. Fahlman},
  title = {An empirical study of learning speed in back-propagation networks},
  institution = {},
  year = {1988}
}

### Abstract

Most connectionist or "neural network" learning systems use some form of the back-propagation algorithm. However, back-propagation learning is too slow for many applications, and it scales up poorly as tasks become larger and more complex. The factors governing learning speed are poorly understood. I have begun a systematic, empirical study of learning speed in backprop-like algorithms, measured against a variety of benchmark problems. The goal is twofold: to develop faster learning algorithms and to contribute to the development of a methodology that will be of value in future studies of this kind. This paper is a progress report describing the results obtained during the first six months of this study. To date I have looked only at a limited set of benchmark problems, but the results on these are encouraging: I have developed a new learning algorithm that is faster than standard backprop by an order of magnitude or more and that appears to scale up very well as the problem size increases.
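The "new learning algorithm" the abstract refers to is quickprop. Its core idea is to treat the error curve for each weight as an independent parabola and jump toward that parabola's minimum, using only the current and previous slope. The sketch below is a simplified single-weight step with illustrative names; the full method also includes a gradient-descent term, weight decay, and other safeguards not shown here.

```python
def quickprop_step(grad, prev_grad, prev_step, epsilon=0.5, mu=1.75):
    """One simplified quickprop weight update.

    Fits a parabola through the current and previous slope of the
    error curve for this weight and steps toward its minimum.
    `mu` caps how much the step may grow relative to the previous
    step; a plain gradient step bootstraps the process.
    """
    if prev_step == 0.0:
        return -epsilon * grad            # no history yet: gradient descent
    denom = prev_grad - grad
    if denom == 0.0:
        return mu * prev_step             # flat slope difference: max growth
    step = (grad / denom) * prev_step     # parabola-minimum extrapolation
    # limit growth to mu times the previous step, preserving direction
    if abs(step) > mu * abs(prev_step):
        step = mu * abs(prev_step) * (1.0 if step > 0 else -1.0)
    return step
```

In the paper's full algorithm this update is applied to every weight after each epoch; the constants shown (`epsilon`, `mu`) are placeholders, not the paper's reported settings.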

### Citations

3058 | Learning Internal Representations by Error Propagation
- Rumelhart, Hinton, et al.
- 1986
Citation Context: ...erview of this area and [10], chapters 1-8, for a detailed treatment. When I refer to "standard back-propagation" in this paper, I mean the back-propagation algorithm with momentum, as described in [9]. The greatest single obstacle to the widespread use of connectionist learning networks in real-world applications is the slow speed at which the current algorithms learn. At present, the fastest lear...
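Since the excerpt defines "standard back-propagation" as backprop with momentum, the per-weight update it refers to can be sketched as follows (the parameter names `epsilon` and `alpha` are the conventional ones, not taken from the excerpt):

```python
def backprop_momentum_step(grad, prev_delta, epsilon=0.1, alpha=0.9):
    """Standard backprop update with momentum:
    delta(t) = -epsilon * dE/dw + alpha * delta(t-1).

    `grad` is the current error derivative for this weight;
    `prev_delta` is the weight change applied on the previous step.
    """
    return -epsilon * grad + alpha * prev_delta
```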

575 | Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences
- Werbos
- 1974
Citation Context: ...ions is the slow speed at which the current algorithms learn. At present, the fastest learning algorithm for most purposes is the algorithm that is generally known as "back-propagation" or "backprop" [6, 7, 9, 18]. The back-propagation learning algorithm runs faster than earlier learning methods, but it is still much slower than we would like. Even on relatively simple problems, standard back-propagation often...

366 | Increased Rates of Convergence Through Learning Rate Adaptation
- Jacobs
- 1988
Citation Context: ...her trials may take an anomalously long time; mixing these long trials into an average may give a distorted picture of the data. How should such results be reported? One option, used by Robert Jacobs [5], is simply to report the failures in one column and the average of the successful trials in another. The problem with this is that it becomes hard to choose between a learning method with fewer failu...
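Jacobs' method, discussed elsewhere in the paper, keeps a separate learning rate for each weight, raising it additively while successive gradients agree in sign and cutting it multiplicatively when the sign flips. A minimal sketch of that rule; the constants `kappa` and `phi` here are illustrative, not Jacobs' reported values:

```python
def adapt_rate(rate, grad, prev_grad, kappa=0.01, phi=0.5):
    """Per-weight learning-rate adaptation in the style of Jacobs:
    speed up while the gradient keeps its sign, back off sharply
    when it reverses (a sign of overshooting a minimum)."""
    if grad * prev_grad > 0:
        return rate + kappa     # consistent direction: increase additively
    if grad * prev_grad < 0:
        return rate * phi       # sign flip: decrease multiplicatively
    return rate                 # one gradient is zero: leave unchanged
```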

85 | Experiments on learning by back-propagation
- Plaut, Nowlan, et al.
- 1986
Citation Context: ...ese trials. Changing r by a large amount, either up or down, led to greatly increased learning times. A value of 1.0 or 2.0 seemed as good as any other and better than most. Plaut, Nowlan, and Hinton [12] present some analysis suggesting that it may be beneficial to use different values of e for different weights in the network. Specifically, they suggest that the e value used in tuning each weight sh...
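The excerpt breaks off mid-sentence, but the suggestion commonly attributed to Plaut, Nowlan, and Hinton is to divide the step size for each weight by the fan-in of the unit receiving that weight. A one-line sketch under that assumption:

```python
def per_weight_epsilon(global_epsilon, fan_in):
    """Scale the step size for each weight by the fan-in of its
    destination unit, so units with many incoming weights take
    proportionally smaller steps per weight."""
    return global_epsilon / fan_in
```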

60 | Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization
- Watrous
- 1987
Citation Context: ...or some more sophisticated optimization technique. Unfortunately, it requires a very costly global computation to derive the true second derivative, so some approximation is used. Parker [8], Watrous [17], and Becker and LeCun [1] have all been active in this area. Watrous has implemented two such algorithms and tried them on the XOR problem. He claims some improvement over back-propagation, but it do...

53 | Une procédure d'apprentissage pour réseau à seuil asymétrique [A learning procedure for an asymmetric threshold network]
- LeCun
- 1985
Citation Context: ...ions is the slow speed at which the current algorithms learn. At present, the fastest learning algorithm for most purposes is the algorithm that is generally known as "back-propagation" or "backprop" [6, 7, 9, 18]. The back-propagation learning algorithm runs faster than earlier learning methods, but it is still much slower than we would like. Even on relatively simple problems, standard back-propagation often...

51 | Scaling relationships in back-propagation learning: dependence on training set size, Complex Systems 1
- Tesauro
- 1987
Citation Context: ...nit, and the problem is to train the weights in this network so that the output unit will turn on if one or the other of the inputs is on, but not both. Some researchers, notably Tesauro and Janssens [16], generalize this to the N-input parity problem: the output is to be on if an odd number of inputs are on. The XOR/parity problem looms large in the history and theory of connectionist models (see [11...
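The benchmark the excerpt describes is easy to state as code; the 2-input case reduces to XOR:

```python
from itertools import product

def parity(bits):
    """Target function for the N-input parity benchmark:
    1 when an odd number of inputs are on, else 0."""
    return sum(bits) % 2

# the 2-input case is exactly XOR
xor_table = {inputs: parity(inputs) for inputs in product((0, 1), repeat=2)}
```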

37 | Neural network simulation at warp speed: How we got 17 million connections per second
- Pomerleau, Gusciora, et al.
- 1988
Citation Context: ...r computers or to implement the network elements directly in VLSI chips. A number of groups are working on faster implementations, including a group at CMU that is using the 10-processor Warp machine [13]. This work is important, but even if we had a network implemented directly in hardware our slow learning algorithms would still limit the range of problems we could attack. Advances in learning algor...

36 | Optimal algorithm for adaptive networks: Second order back propagation, second order direct propagation, and second order Hebbian learning
- Parker
- 1987
Citation Context: ...ton's method or some more sophisticated optimization technique. Unfortunately, it requires a very costly global computation to derive the true second derivative, so some approximation is used. Parker [8], Watrous [17], and Becker and LeCun [1] have all been active in this area. Watrous has implemented two such algorithms and tried them on the XOR problem. He claims some improvement over back-propagat...

18 | Speech recognition with backpropagation
- Franzini
- 1987
Citation Context: ...undoff error and the potential for truncating very small values, such units may never recover. What can we do about this? One possibility, suggested by James McClelland and tested by Michael Franzini [4], is to use an error measure that goes to infinity as the sigmoid-prime function goes to zero. This is mathematically elegant, and it seemed to work fairly well, but it is hard to implement. I chose t...
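The excerpt breaks off at "I chose t..."; the fix Fahlman describes in the paper is simpler than McClelland's error measure: add a small constant (0.1 is the value reported) to the sigmoid-prime term so the error signal never vanishes when a unit saturates. A sketch, with illustrative function names:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime_offset(x, offset=0.1):
    """Sigmoid derivative plus a small constant, so a unit stuck
    near output 0 or 1 (where s * (1 - s) is nearly zero) still
    passes back a usable error signal and can recover."""
    s = sigmoid(x)
    return s * (1.0 - s) + offset
```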

17 | Successfully using peak learning rates of 10 (and greater) in back-propagation networks with the heuristic learning algorithm
- Cater
- 1987
Citation Context: ...easing it otherwise. Jacobs [5] has conducted an empirical study comparing standard backprop with momentum to a rule that dynamically adjusts a separate learning-rate parameter for each weight. Cater [2] uses a more complex heuristic for adjusting the learning rate. All of these methods improve the overall learning speed to some degree. The other kind of approach makes explicit use of the second deri...

13 | Connectionist architectures for artificial intelligence
- Fahlman, Hinton
- 1987
Citation Context: ...xpressed or implied, of these agencies or of the U.S. Government. 1. Introduction. Note: In this paper I will not attempt to review the basic ideas of connectionism or back-propagation learning. See [3] for a brief overview of this area and [10], chapters 1-8, for a detailed treatment. When I refer to "standard back-propagation" in this paper, I mean the back-propagation algorithm with momentum, a...

13 | An Improved Three-layer Back Propagation Algorithm
- Stornetta, Huberman
- 1987
Citation Context: ...ero, ranging from -0.5 to +0.5 rather than from 0.0 to 1.0. The input and output patterns, and the thresholds used to detect successful learning, are shifted down by 0.5 as well. Stornetta and Huberman [15] advocate the use of such symmetric units. The empirical results they report are sketchy, but they claim speedups ranging from 10% to 50%, depending on the problem. It occurred to me that, with symmet...
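The symmetric units the excerpt describes are just the standard sigmoid shifted down by 0.5, so activations are centered on zero. A one-line sketch:

```python
import math

def symmetric_sigmoid(x):
    """Activation for a 'symmetric' unit in the Stornetta-Huberman
    sense: the standard sigmoid shifted down by 0.5, so outputs
    range over (-0.5, +0.5) and are centered on zero."""
    return 1.0 / (1.0 + math.exp(-x)) - 0.5
```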

7 | Accelerated Learning in back-Propagation Nets
- Schmidhuber
- 1988
Citation Context: ...te, and it is unclear how well it scales up. I suspect that an analysis of Plaut's real-time adjustments would show that he is doing something very similar to what quickprop does. Juergen Schmidhuber [14] has investigated this same class of problems up to 64-6-64 using two methods: first, he used standard backprop, but he adjusted the weights after every presentation of a training example rather than...

6 | The Feasibility of Applying Numerical Optimization Techniques to Back-Propagation
- Becker, LeCun
- 1988
Citation Context: ...optimization technique. Unfortunately, it requires a very costly global computation to derive the true second derivative, so some approximation is used. Parker [8], Watrous [17], and Becker and LeCun [1] have all been active in this area. Watrous has implemented two such algorithms and tried them on the XOR problem. He claims some improvement over back-propagation, but it does not appear that his met...