## An Efficient Improvement of the RPROP Algorithm (2003)

Venue: | Proceedings of the First International Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR-03) |

Citations: | 3 - 1 self |

### BibTeX

@INPROCEEDINGS{Anastasiadis03anefficient,
  author    = {Aristoklis D. Anastasiadis and Uxbridge Ub Ph},
  title     = {An Efficient Improvement of the RPROP Algorithm},
  booktitle = {Proceedings of the First International Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR-03)},
  year      = {2003}
}

### Abstract

This paper introduces an efficient modification of the Rprop algorithm for training neural networks. The convergence of the new algorithm can be justified theoretically, and its performance is investigated empirically through simulation experiments using some pattern classification benchmarks. Numerical evidence shows that the algorithm exhibits improved learning speed in all cases, and compares favorably against the Rprop and a recently proposed modification, the iRprop.

### Citations

2732 | Learning internal representations by error propagation
- Rumelhart, Hinton, et al.
- 1986

Citation Context: ... Gradient descent is the most widely used class of algorithms for supervised learning of neural networks. The most popular training algorithm of this category is the batch Back-Propagation (BP) [1]. It is a first-order method that minimizes the error function by updating the weights w using the steepest descent method [1]: w^(t+1) = w^t − η∇E(w^t), (1) where E is the batch error measure defined...
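The steepest-descent update in equation (1) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's code; the function names and the quadratic test objective are assumptions for demonstration.

```python
import numpy as np

def batch_gradient_descent(grad_E, w0, eta=0.1, epochs=200):
    """Batch steepest descent: w^(t+1) = w^t - eta * grad E(w^t)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        w = w - eta * grad_E(w)  # one full-batch update per epoch
    return w

# Toy example: E(w) = ||w||^2, whose gradient is 2w; the minimizer is w = 0.
w_final = batch_gradient_descent(lambda w: 2.0 * w, [1.0, -2.0])
```

With a fixed global learning rate η the method converges only for sufficiently small η, which is exactly the sensitivity that Rprop's individually adapted step sizes are designed to remove.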

741 | UCI Repository of Machine Learning Databases, www.ics.uci.edu/~mlearn/MLRepository.html
- Murphy, Aha
- 1992

Citation Context: ...mpare it with the original Rprop [2] and the Improved Rprop (iRprop) proposed recently by Igel and Hüsken [3]. We have used well-studied problems from the UCI Repository of Machine Learning Databases [16], as well as problems studied extensively by other researchers, in an attempt to reduce as much as possible biases introduced by the size of the weight space and the quality of the training data. We d...

644 | A direct adaptive method for faster backpropagation learning: The RPROP algorithm
- Riedmiller, Braun
- 1993

Citation Context: ... the Resilient Backpropagation (Rprop) algorithm [2]. Recently a modification of Rprop, the so-called Improved Rprop (iRprop), has been proposed [3]. Empirical evaluations of iRprop gave good results, showing that iRprop outperforms in several cases...
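The core idea of Rprop, as cited here, is that each weight gets its own step size ∆_ij, adapted from the sign of successive partial derivatives; the gradient magnitude is ignored. The sketch below is a simplified variant in the style of iRprop− (no weight backtracking), not a reproduction of the paper's modified algorithm; η+ = 1.2, η− = 0.5 and the ∆ bounds are the standard defaults from Riedmiller and Braun.

```python
import numpy as np

def rprop_step(grad, prev_grad, delta, eta_plus=1.2, eta_minus=0.5,
               delta_max=50.0, delta_min=1e-6):
    """One sign-based Rprop update for a vector of weights.

    Same gradient sign twice -> grow the step size; sign flip -> shrink it
    and skip this update (iRprop- style). Returns (dw, delta, grad) where
    grad (possibly zeroed) becomes prev_grad for the next call."""
    sign_change = grad * prev_grad
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    grad = np.where(sign_change < 0, 0.0, grad)   # suppress update after a flip
    dw = -np.sign(grad) * delta                   # magnitude comes from delta only
    return dw, delta, grad

# Toy run on E(w) = ||w||^2 (gradient 2w), separable per coordinate.
w = np.array([1.0, -2.0])
delta = np.full_like(w, 0.1)
prev = np.zeros_like(w)
for _ in range(100):
    dw, delta, prev = rprop_step(2.0 * w, prev, delta)
    w = w + dw
```

Because only the sign of the derivative is used, the update is insensitive to gradient scaling, which is the property the paper's modification builds on.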

336 | Introduction to Matrix Computations
- Stewart
- 1974

Citation Context: ...A_π that can be partitioned into block tridiagonal form, possibly after a suitable permutation [12]. An algorithmic procedure for transforming a symmetric matrix to tridiagonal form is presented in [13]. In case the initial points are far from the neighborhood of a local minimizer, it is possible to equip algorithms with local learning rates (like the one proposed here) with a strategy for adap...

101 | PROBEN1: A set of benchmarks and benchmarking rules for neural network training algorithms
- Prechelt
- 1994

Citation Context: ...We have used four pattern classification problems from the UCI repository of machine learning databases, namely cancer1, diabetes1, thyroid1, and genes2, as described in the PROBEN1 benchmark collection [17]. In all experiments, we use the same parameters as suggested in [2], and feed-forward neural networks with sigmoid hidden and output nodes. The notation I-H-O is used to denote network architecture w...

73 | Introduction to Nonlinear Optimization
- Scales
- 1985

Citation Context: ...the gradient of the error function is available at the endpoints of an interval of uncertainty, it is necessary to evaluate function information at an interior point in order to reduce this interval [8]. This is because it is possible to decide whether the corresponding interval brackets a local minimum simply by looking at the function values between two successive epochs t and t−1 (i.e., E(t−1) and E(...

59 | Empirical evaluation of the improved Rprop learning algorithms
- Igel, Hüsken

Citation Context: ... the Resilient Backpropagation (Rprop) algorithm [2]. Recently a modification of Rprop, the so-called Improved Rprop (iRprop), has been proposed [3]. Empirical evaluations of iRprop gave good results, showing that iRprop outperforms in several cases the Quickprop and Conjugate Gradient algorithms [3]. One problem inherent with gradient descent me...

46 | Evolving neural networks
- Fogel, Fogel, et al.
- 1990

Citation Context: ...performance. This problem can be overcome through the use of global optimization. Various algorithms of this category have been employed, including simulated annealing (SA) [4], evolutionary methods [5], random methods, and deterministic searches [6]. Global optimization, however, is considered computationally expensive, which could be a significant problem, particularly for large networks [7]. In th...

30 | Iterative Methods for Solving Partial Difference Equations of Elliptic Type
- Young
- 1950

Citation Context: ...Fig. 1. Pseudocode of the modified Rprop ... Jacobi scheme, which fulfils the assumptions of Theorem 1 of [10]. A detailed description is outside the scope of this paper. Remark (Property A_π): Young [11] has discovered a class of matrices, described as having property A_π, that can be partitioned into block tridiagonal form, possibly after a suitable permutation [12]. An algorithmic procedure for trans...

27 | Improving the Convergence of the Back-propagation Algorithm Using Learning Rate Adaptation Methods
- Magoulas, Vrahatis, et al.
- 1999

Citation Context: ...tion factor that is used to update the midpoint of the considered interval. The choice of q has an influence on the number of error function evaluations required to obtain an acceptable weight vector [15]. In practice, one iteration of the bisection method along each weight direction is recommended, as exact subminimization to obtain accurate approximations of the subminimizer in each direction requir...

21 | Bisection is optimal
- Sikorski
- 1982

Citation Context: ...within the given interval [a, b], and it is a globally convergent method. Moreover, it has a great advantage since it is optimal, i.e., it possesses asymptotically the best possible rate of convergence [9]. Also, the number of iterations of the bisection method required to attain an approximate minimizer within the interval [a, b] to a predetermined accuracy ε is known beforehand...
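The property cited here, that the number of bisection iterations for accuracy ε on [a, b] is known in advance, follows from the bracket halving at each step: n = ⌈log2((b − a)/ε)⌉. A minimal sketch (function names are illustrative; the derivative-sign variant assumes the interval brackets a single local minimizer):

```python
import math

def bisection_iterations(a, b, eps):
    """Iterations needed so the bracket width (b - a) / 2**n <= eps."""
    return math.ceil(math.log2((b - a) / eps))

def bisect_min(dE, a, b, eps=1e-6):
    """Bisection on the derivative sign: dE < 0 left of the minimizer,
    dE > 0 right of it, so each test halves the bracket."""
    for _ in range(bisection_iterations(a, b, eps)):
        m = 0.5 * (a + b)
        if dE(m) > 0:
            b = m       # minimizer lies in the left half
        else:
            a = m       # minimizer lies in the right half
    return 0.5 * (a + b)

# Example: E(x) = (x - 1)^2 on [0, 3]; its derivative is 2(x - 1).
x_min = bisect_min(lambda x: 2.0 * (x - 1.0), 0.0, 3.0)
```

The fixed, precomputable iteration count is what makes one bisection step per weight direction a predictable, cheap subminimization, as the excerpt from [15] above also notes.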

11 | Deterministic global optimal FNN training algorithms
- Tang, Koehler
- 1994

Citation Context: ...through the use of global optimization. Various algorithms of this category have been employed, including simulated annealing (SA) [4], evolutionary methods [5], random methods, and deterministic searches [6]. Global optimization, however, is considered computationally expensive, which could be a significant problem, particularly for large networks [7]. In this paper, we propose to combine a quick and comp...

8 | From linear to nonlinear iterative methods
- Vrahatis, Magoulas, et al.
- 2003

Citation Context: ...training method that updates the weights according to the criteria mentioned above converges to w*. Proof: It can be shown that the proof of the above theorem follows from the proof of Theorem 1 of [10] by observing that the training method forms a nonlinear ... [the excerpt then runs into the Figure 1 pseudocode: Initialise: t = 0; set the maximum number of epochs T; calculate E(0); set q = 1; for all weights w_ij (i, j = 1, …, N) set ∆_ij(t)...]

7 | Iterative Solution Methods (Cambridge University Press)
- Axelsson
- 1994

Citation Context: ...Remark (Property A_π): Young [11] has discovered a class of matrices, described as having property A_π, that can be partitioned into block tridiagonal form, possibly after a suitable permutation [12]. An algorithmic procedure for transforming a symmetric matrix to tridiagonal form is presented in [13]. In case the initial points are far from the neighborhood of a local minimizer, it is pos...

5 | Globally Convergent Algorithms with Local Learning Rates
- Magoulas, Vrahatis
- 2002

Citation Context: ...descent one. In this way, a decrease of the function value can be ensured at each iteration, and convergence to a local minimizer of the objective function from remote initial points can be achieved [14] (see the detailed description in [14] of how this can be applied to any adaptive gradient-based algorithm with individual step sizes). Based on the above theoretical discussion, we propose in Figure ...

2 | A globally optimal annealing learning algorithm for multilayer perceptrons with applications
- Fang, Li
- 1990

Citation Context: ...they often result in poor performance. This problem can be overcome through the use of global optimization. Various algorithms of this category have been employed, including simulated annealing (SA) [4], evolutionary methods [5], random methods, and deterministic searches [6]. Global optimization, however, is considered computationally expensive, which could be a significant problem, particularly for...

1 | Simulated Annealing and Weight Decay
- Treadgold, Gedeon

Citation Context: ...methods [5], random methods, and deterministic searches [6]. Global optimization, however, is considered computationally expensive, which could be a significant problem, particularly for large networks [7]. In this paper, we propose to combine a quick and computationally cheap gradient descent algorithm, Rprop, with more ‘global’ information, like the magnitude of the network batch error, in order to im...