## Empirical evaluation of the improved Rprop learning algorithms (2003)

### Download Links

- sci2s.ugr.es
- www.fizyka.umk.pl
- www.neuroinformatik.ruhr-uni-bochum.de
- DBLP

### Other Repositories/Bibliography

Citations: 59 (17 self)

### BibTeX

@MISC{Igel03empiricalevaluation,
  author = {Christian Igel and Michael Hüsken},
  title  = {Empirical evaluation of the improved Rprop learning algorithms},
  year   = {2003}
}


### Citations

838 | Deterministic nonperiodic flow - Lorenz - 1963
Citation Context: ...the better performance of iRprop+ compared to iRprop− is not significant most of the time. 3.4 Lorenz Time Series 3.4.1 Problem and Model Description The task is the modeling of the Lorenz attractor [10], which is defined by the three coupled differential equations ẋ(t) = σ(y(t) − x(t)), ẏ(t) = −x(t)z(t) + rx(t) − y(t), ż(t) = x(t)y(t) − bz(t) (9). For the chosen parameter values σ = 16, r = 45.92 ...
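The excerpt gives the Lorenz system and two of the paper's parameter values (σ = 16, r = 45.92; the excerpt is cut off before b, so the default below is only an illustrative choice). A minimal sketch of the right-hand side plus a Runge-Kutta integrator, where the integration scheme is our own choice, not the paper's:

```python
import numpy as np

def lorenz_deriv(state, sigma=16.0, r=45.92, b=4.0):
    """Right-hand side of the Lorenz system from the excerpt.
    sigma and r match the excerpt; the excerpt is truncated before b,
    so b=4.0 is just an illustrative default."""
    x, y, z = state
    return np.array([
        sigma * (y - x),       # x'(t) = sigma (y - x)
        -x * z + r * x - y,    # y'(t) = -x z + r x - y
        x * y - b * z,         # z'(t) = x y - b z
    ])

def integrate(state0, dt=0.01, steps=2000):
    """Classical 4th-order Runge-Kutta (the integrator is our choice)."""
    traj = np.empty((steps, 3))
    s = np.asarray(state0, dtype=float)
    for i in range(steps):
        k1 = lorenz_deriv(s)
        k2 = lorenz_deriv(s + 0.5 * dt * k1)
        k3 = lorenz_deriv(s + 0.5 * dt * k2)
        k4 = lorenz_deriv(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = s
    return traj
```

A trajectory generated this way (sampled, normalized) is the kind of chaotic time series the modeling task uses.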

644 | A direct adaptive method for faster backpropagation learning: The RPROP algorithm - Riedmiller, Braun - 1993
Citation Context: ...revert a step. This combines the strictly “global” approach, where the complete previous update for all weights is reversed if E(t) > γE(t−1) (γ = 1.0 in [22]; 1.0 < γ ≤ 1.05 in [25]), with the ideas in [18, 24]. Compared to Rprop+ only one additional variable, the previous error E(t−1), has to be stored. We refer to this modified algorithm as iRprop+ throughout the remainder of this paper. Figure 2 summ...
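The excerpt sketches when iRprop+ takes back a step. A rough reconstruction in code, under our reading of the excerpt (backtracking only for weights whose partial derivative changed sign, and only when the error increased, i.e. γ = 1.0; the variable names and `state` layout are ours):

```python
import numpy as np

def irprop_plus_step(w, grad, err, st,
                     eta_plus=1.2, eta_minus=0.5,
                     delta_min=1e-6, delta_max=50.0):
    """One iRprop+ step (sketch). st: dict with per-weight 'delta',
    'grad_prev', 'dw_prev' and the scalar previous error 'err_prev'.
    Parameter defaults are the commonly cited Rprop settings."""
    prod = st["grad_prev"] * grad
    delta = st["delta"]
    delta[prod > 0] = np.minimum(delta[prod > 0] * eta_plus, delta_max)
    delta[prod < 0] = np.maximum(delta[prod < 0] * eta_minus, delta_min)

    dw = -np.sign(grad) * delta
    changed = prod < 0
    if err > st["err_prev"]:
        # weight-backtracking: only where the sign flipped AND the error
        # went up, revert the previous update for those weights
        dw[changed] = -st["dw_prev"][changed]
    grad = grad.copy()
    grad[changed] = 0.0  # suppress a second step-size adaptation next time

    st.update(grad_prev=grad, dw_prev=dw, delta=delta, err_prev=err)
    return w + dw, st
```

As the excerpt notes, the only extra state beyond Rprop+ is the previous error value.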

415 | Individual comparisons by ranking methods - Wilcoxon - 1945
Citation Context: ...itializations were the same for all the learning algorithms. In order to analyze whether the differences between the error trajectories are significant, every 10 propagations a Wilcoxon rank sum test [26] has been performed (all statements refer to a significance level of 5%; however, most differences are highly significant). In the following, only the most important results are reported. ...
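As a self-contained illustration of the significance test mentioned above, here is a minimal Wilcoxon rank-sum test using the normal approximation. It is a sketch, not a full implementation: ties get average ranks but the variance is not tie-corrected, and there is no continuity correction.

```python
import math
import numpy as np

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test, normal approximation.
    Returns (z, p)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    pooled = np.concatenate([a, b])
    order = pooled.argsort()
    ranks = np.empty(n1 + n2)
    ranks[order] = np.arange(1, n1 + n2 + 1)
    for v in np.unique(pooled):          # average ranks over tied values
        tie = pooled == v
        ranks[tie] = ranks[tie].mean()
    w = ranks[:n1].sum()                 # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0        # mean of w under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p
```

At the paper's 5% level, `p < 0.05` between two samples of errors would count as a significant difference.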

339 | Increased rates of convergence through learning rate adaptation - Jacobs - 1988

307 | An introduction to the conjugate gradient method without the agonizing pain - Shewchuk - 1994
Citation Context: ...from Rprop−. Fahlman’s Quickprop (cf. [2, 15]), the BFGS algorithm (cf. [14]), a quasi-Newton method which iteratively estimates the inverse of the Hessian, and the conjugate gradient method (cf. [14, 20]). We employ the nonlinear CG algorithm using the Polak-Ribière method. The search direction is reset to the negative gradient direction whenever a search direction is computed that is not a descent d...
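The reset rule described above (fall back to the negative gradient whenever the computed direction is not a descent direction) can be sketched as follows. The Armijo backtracking line search, the constants, and the clipping of β at zero (PR+) are our simplifications, not details taken from the paper:

```python
import numpy as np

def nonlinear_cg(f, grad, x, iters=200):
    """Nonlinear conjugate gradient with the Polak-Ribiere beta and a
    reset to the steepest-descent direction whenever the current search
    direction is not a descent direction."""
    g = grad(x)
    d = -g
    for _ in range(iters):
        if g @ d >= 0:                 # not a descent direction: reset
            d = -g
        t = 1.0                        # Armijo backtracking (our choice)
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        x = x + t * d
        g_new = grad(x)
        if g_new @ g_new < 1e-24:      # gradient vanished: converged
            break
        beta = max(g_new @ (g_new - g) / (g @ g), 0.0)  # Polak-Ribiere (PR+)
        d = -g_new + beta * d
        g = g_new
    return x
```

On an ill-conditioned quadratic this behaves much better than plain gradient descent, which is the usual motivation for CG as a baseline.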

252 | Fast-learning variations on back-propagation: An empirical study - Fahlman - 1989
Citation Context: ...max(∆_ij · η−, ∆min); ∆w_ij^(t) = −sign(∂E^(t)/∂w_ij) · ∆_ij^(t). Figure 3: The iRprop− algorithm without weight-backtracking. The proposed algorithm differs only in one line from Rprop−. Fahlman’s Quickprop (cf. [2, 15]), the BFGS algorithm (cf. [14]), a quasi-Newton method which iteratively estimates the inverse of the Hessian, and the conjugate gradient method (cf. [14, 20]). We employ the nonlinear CG algorithm ...

156 | Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation - Hansen, Ostermeier
Citation Context: ...satisfy ⟨o_i, o_j⟩ = 0 if i ≠ j and ‖o_i‖ = 1. The parameter a controls the condition of the problem. This test function is a generalization of the artificial error surface proposed in [21]. It is used in [3, 4] for analyzing the local search properties of evolutionary algorithms. Rprop is not invariant under rotation of the coordinate system; its performance strongly depends on the choice of o_1, ..., o_n (not ...

132 | Multisurface method of pattern separation for medical diagnosis applied to breast cytology - Wolberg, Mangasarian - 1990
Citation Context: ...Figure 5: Medians of the training errors for the cancer problem. 3.2 Cancer Classification 3.2.1 Problem and Model Description The cancer1 problem [27] is also a real-world classification task: based on 9 inputs describing a tumor, the task is to classify it as either benign or malignant. The data set consists of 350 patterns. Again a 1-of-2 encodin...

127 | Efficient backprop - LeCun, Bottou, et al. - 1998
Citation Context: ...ems. An idea about the magnitude of a for neural networks comes from the analysis of the Hessian matrix. It is argued that a typical Hessian has few small, many medium, and few very large eigenvalues [9]. This leads to a ratio between the longest and the shortest axis that is much larger than 10³. As the Rprop algorithms depend only on the sign of the derivative and the ranking of the error values, ...

123 | Accelerating the Convergence of the Back-Propagation Method (Biological Cybernetics 59) - Vogl, Mangis, et al. - 1988
Citation Context: ...of the Rprop algorithm as proposed in [18] implements a general concept for improving network training termed weight-backtracking, which means retracting a previous update for some or all weights, cf. [22, 24, 25]. Whether to take back a step or not is decided by means of a heuristic. After adjusting the step-sizes according to (3), the weight updates ∆w_ij are determined. Two cases are distinguished. If the si...

101 | PROBEN1: A set of benchmarks and benchmarking rules for neural network training algorithms - Prechelt - 1994
Citation Context: ...ural network benchmark problems. Two classification tasks (the cancer1 and diabetes1 data sets, both from the UCI repository of machine learning databases, as given in the PROBEN1 benchmark collection [12]) and two regression problems (predicting the sunspots and Lorenz time series) are considered. As models, we employ feed-forward neural networks for classification and the modeling of the sunspots tim...

82 | Advanced Supervised Learning in Multi-Layer Perceptrons: From Backpropagation to Adaptive Learning Algorithms - Riedmiller - 1994
Citation Context: ...why should one care for yet another one? The learning algorithms proposed in this article are not totally new approaches, but modifications of the two established Rprop (resilient backpropagation) algorithms [17, 18, 6]. The improved versions maintain the advantageous properties of the originals: 1. Rprop as proposed by Riedmiller and Braun is (already) very fast and accurate. The reader is referred to [8, 17, 18, 1...

63 | SuperSAB: Fast adaptive backpropagation with good scaling properties - Tollenaere - 1990
Citation Context: ...∂E/∂w_ij possesses the same sign for consecutive steps, the step-size is increased, whereas if it changes sign, the step-size is decreased (the same principle is also used in other learning methods, e.g., [7, 24]). The step-sizes are bounded by the parameters ∆min and ∆max. Note that by (1) and (2) the following holds: ∂E^(t)/∂∆_ij^(t−1) = ∂E^(t)/∂w_ij^(t) · ∂w_ij^(t)/∂∆_ij^(t−1) = −sign(∂E^(t−1)/∂w_ij^(t−1)) · ∂E^(t)/∂w_ij^(t) ...
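The sign-based step-size adaptation described above, with the ∆min/∆max bounds, amounts to the following per-weight rule. This is a sketch of Rprop without weight-backtracking; the parameter defaults are the commonly cited Rprop settings, used here as assumptions rather than values from the excerpt:

```python
import numpy as np

def rprop_minus_step(w, grad, delta, grad_prev,
                     eta_plus=1.2, eta_minus=0.5,
                     delta_min=1e-6, delta_max=50.0):
    """One Rprop- step: grow the per-weight step size when the partial
    derivative keeps its sign, shrink it when the sign flips, keep it
    inside [delta_min, delta_max], then step against the gradient sign."""
    prod = grad_prev * grad
    delta = np.where(prod > 0.0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(prod < 0.0, np.maximum(delta * eta_minus, delta_min), delta)
    w = w - np.sign(grad) * delta
    return w, delta, grad          # the returned grad becomes next grad_prev
```

Note that only the sign of the derivative enters the update, which is why Rprop is insensitive to the overall scale of the error function.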

52 | A Study of Experimental Evaluations of Neural Network Learning Algorithms: Current Research Practice (Technical Report 19/94, Fakultät für Informatik, Universität) - Prechelt
Citation Context: ...hs, we have compared the new variants of Rprop with the two original ones and three widely spread optimization techniques, on three real-world neural network learning problems, plus one realistic one [13]. Basically, the outcomes of all of our experiments are comparable: iRprop+ has turned out to be superior in the initial phase of learning, whereas in some experiments BFGS performs better in a later...

50 | Convergence Properties of Evolution Strategies with the Derandomized Covariance Matrix Adaptation: The (µ/µI,λ)-CMA-ES - Hansen, Ostermeier - 1997
Citation Context: ...satisfy ⟨o_i, o_j⟩ = 0 if i ≠ j and ‖o_i‖ = 1. The parameter a controls the condition of the problem. This test function is a generalization of the artificial error surface proposed in [21]. It is used in [3, 4] for analyzing the local search properties of evolutionary algorithms. Rprop is not invariant under rotation of the coordinate system; its performance strongly depends on the choice of o_1, ..., o_n (not ...

43 | Improving the Rprop learning algorithm - Igel, Hüsken - 2000
Citation Context: ...why should one care for yet another one? The learning algorithms proposed in this article are not totally new approaches, but modifications of the two established Rprop (resilient backpropagation) algorithms [17, 18, 6]. The improved versions maintain the advantageous properties of the originals: 1. Rprop as proposed by Riedmiller and Braun is (already) very fast and accurate. The reader is referred to [8, 17, 18, 1...

42 | Speeding up backpropagation - Silva, Almeida - 1990
Citation Context: ...of the Rprop algorithm as proposed in [18] implements a general concept for improving network training termed weight-backtracking, which means retracting a previous update for some or all weights, cf. [22, 24, 25]. Whether to take back a step or not is decided by means of a heuristic. After adjusting the step-sizes according to (3), the weight updates ∆w_ij are determined. Two cases are distinguished. If the si...

39 | Comparison of Optimized Backpropagation Algorithms - Schiffmann, Joost, et al. - 1989
Citation Context: ...s [17, 18, 6]. The improved versions maintain the advantageous properties of the originals: 1. Rprop as proposed by Riedmiller and Braun is (already) very fast and accurate. The reader is referred to [8, 17, 18, 19] for comparisons of Rprop with other supervised learning techniques and to the review of learning methods in [15]. 2. The Rprop algorithms are known to be very robust with respect to their internal pa...

38 | Acceleration techniques for the back-propagation algorithm - Silva, Almeida - 1990
Citation Context: ...for 1 ≤ i, j ≤ n they satisfy ⟨o_i, o_j⟩ = 0 if i ≠ j and ‖o_i‖ = 1. The parameter a controls the condition of the problem. This test function is a generalization of the artificial error surface proposed in [21]. It is used in [3, 4] for analyzing the local search properties of evolutionary algorithms. Rprop is not invariant under rotation of the coordinate system; its performance strongly depends on the cho...
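A sketch of such a rotated test function: a random orthonormal basis o_1, ..., o_n obtained from a QR decomposition, and a quadratic whose conditioning is controlled by a. The exact axis-scaling scheme (powers of a spread over the axes) is our assumption; the excerpt only states that a controls the condition and that the o_i are orthonormal.

```python
import numpy as np

def make_rotated_ellipsoid(n, a, seed=0):
    """Build E(w) = sum_i (c_i * <o_i, w>)^2 with a random orthonormal
    basis o_1..o_n. The c_i spread from 1 to a, so the conditioning of
    the quadratic grows with a (the spacing of c_i is our choice)."""
    rng = np.random.default_rng(seed)
    o, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns: orthonormal basis
    c = a ** (np.arange(n) / max(n - 1, 1))            # axis scalings 1 .. a

    def error(w):
        proj = o.T @ w                                 # coordinates in the o-basis
        return float(np.sum((c * proj) ** 2))

    return error, o
```

Because Rprop adapts each weight coordinate independently, its behaviour on `error` changes with the random rotation `o`, which is exactly the non-invariance the excerpt points out.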

19 | Speeding up backpropagation algorithms by using cross-entropy combined with pattern normalization - Joost, Schiffmann - 1999
Citation Context: ...s [17, 18, 6]. The improved versions maintain the advantageous properties of the originals: 1. Rprop as proposed by Riedmiller and Braun is (already) very fast and accurate. The reader is referred to [8, 17, 18, 19] for comparisons of Rprop with other supervised learning techniques and to the review of learning methods in [15]. 2. The Rprop algorithms are known to be very robust with respect to their internal pa...

17 | Neural Smithing - Reed, Marks - 1999
Citation Context: ...er and Braun is (already) very fast and accurate. The reader is referred to [8, 17, 18, 19] for comparisons of Rprop with other supervised learning techniques and to the review of learning methods in [15]. 2. The Rprop algorithms are known to be very robust with respect to their internal parameters [16, 17, 18]. In addition, these parameters are comparatively intuitive and therefore easy to adjust. 3...

8 | Neuronale Netze: Optimierung durch Lernen und Evolution - Braun - 1997
Citation Context: ...ration of the Rprop+ algorithm with weight-backtracking [18] (left column) and of the Rprop− algorithm without weight-backtracking scheme [17] (right column). 2.2 Rprop without Weight-Backtracking In [1, 17] a different version of the Rprop algorithm is described. The weight-backtracking is omitted and the right-hand side of (5) is used in all cases. Hence, there is no need to store the previous weight up...

8 | An extended Elman net for modeling time series - Stagge, Sendhoff - 1997
Citation Context: ...error percentages, whereas the inset plot shows the average for iRprop+ and the BFGS algorithm. For this task, a recurrent neural network is chosen. More precisely, we use an extended Elman network [23] with a single memory layer. For all activation functions the hyperbolic tangent is employed. Hence, input and output data are normalized to lie between −0.7 and 0.7. Because the weights in recurrent ...
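For orientation, here is the plain Elman recurrence with tanh units, which the extended network in [23] builds on; the extension's additional memory connections are not reproduced, and the shapes and names below are ours:

```python
import numpy as np

def elman_forward(seq, W_in, W_rec, W_out, b_h, b_o):
    """Forward pass of a plain Elman network: the hidden layer sees the
    current input plus a copy of its previous activation (the memory
    layer). tanh everywhere, matching the excerpt, so with inputs scaled
    to [-0.7, 0.7] the outputs stay strictly inside (-1, 1)."""
    h = np.zeros(W_rec.shape[0])                  # memory layer starts at zero
    out = []
    for x in seq:
        h = np.tanh(W_in @ x + W_rec @ h + b_h)   # hidden = f(input, memory)
        out.append(np.tanh(W_out @ h + b_o))      # output unit(s)
    return np.array(out)
```

The bounded tanh outputs are the reason the excerpt normalizes the data into [−0.7, 0.7] rather than the full (−1, 1) range.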

7 | Target detection through image processing and resilient propagation algorithms (Neurocomputing, 2000) - Patnaik, Rajan
Citation Context: ...for applications where the gradient is numerically estimated or the error is noisy. 6. Rprop is easy to implement and not susceptible to numerical problems. A hardware implementation is described in [11]. In this study, strong empirical evidence is given that the new methods outperform the original ones in terms of speed. The empirical evaluation makes use of four neural network benchmark problems (t...

5 | Fitness distributions: Tools for designing efficient evolutionary computations - Igel, Chellapilla - 1999
Citation Context: ...of w and the basis vectors are the same for each Rprop, but different for each trial. During optimization, we computed characteristic features of the steps on the error surface for subsequent analysis [5]. 4.2 Results The performance of the four Rprop algorithms depends on the condition of the test function, but only for very small and unrealistic values of a (a ≲ 3) the original methods converge fast...

3 | Untersuchungen zu Konvergenz und Generalisierungsfähigkeit überwachter Lernverfahren im SNNS - Riedmiller - 1993
Citation Context: ...for each of the following settings with condition a = 10 and dimension n = 2. We fixed η− to the standard value of 0.5, and varied η+ ∈ {1.05, 1.1, 1.15, ..., 1.6}, i.e., even beyond the sensible choices [16]. In a second set of experiments, we fixed η+ to the standard value of 1.2 and varied η− ∈ {0.4, 0.45, 0.5, ..., 0.8}. In all experiments, the new algorithms clearly outperformed the original ones. To...
