## Nonmonotone Methods for Backpropagation Training with Adaptive Learning Rate (1999)

Venue: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'99), Washington, D.C.

### BibTeX

@INPROCEEDINGS{Plagianako99nonmonotonemethods,
  author    = {V.P. Plagianakos and M.N. Vrahatis},
  title     = {Nonmonotone Methods for Backpropagation Training with Adaptive Learning Rate},
  booktitle = {Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'99), Washington, D.C.},
  year      = {1999},
  pages     = {1762--1767}
}


### Abstract

In this paper we present nonmonotone methods for feedforward neural network training, i.e. training methods in which error function values are allowed to increase at some iterations. More specifically, at each epoch we impose that the current error function value must satisfy an acceptability criterion with respect to the maximum error function value of the M previous epochs. A strategy to dynamically adapt M is suggested, and two training algorithms with adaptive learning rates that successfully employ the above-mentioned acceptability criterion are proposed. Experimental results show that the nonmonotone learning strategy improves the convergence speed and the success rate of the methods considered.

### Introduction

The batch training of a Feedforward Neural Network (FNN) is consistent with the theory of unconstrained optimization and can be viewed as the minimization of the function E; that is, to find a minimizer w\* = (w\*_1, w\*_2, ..., w\*_n) ∈ ℝⁿ such that:

w\* = min_{w ∈ ℝⁿ} E(w),   (1)

where E is the batch error measure defined as the sum-of-squared-differences error function over the entire training set. The widely used batch Back-Propagation (BP) [28] is a first-order neural network training algorithm which minimizes the error function using the steepest descent method [8]:

w^{k+1} = w^k − η ∇E(w^k),   (2)

where k indicates iterations (k = 0, 1, ...), the gradient vector ∇E(w^k) is usually computed by the back-propagation of the error through the layers of the FNN (see [28]), and η is a constant, heuristically chosen learning rate. Appropriate learning rates help to avoid convergence to a saddle point or a maximum. In practice a small constant learning rate is chosen (0 < η < 1) in order to secure the convergence of the BP training algorithm and to avoid oscillations in a direction where the error function is steep. It is well known that this appr...
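The steepest-descent update (2) combined with the nonmonotone acceptability criterion from the abstract can be sketched as below. This is a minimal illustration, not the authors' adaptive-learning-rate algorithms: the step-halving fallback on a rejected epoch, the fixed window size M, and the quadratic test function are all assumptions made for the sketch.

```python
import numpy as np

def nonmonotone_gd(grad_E, E, w0, lr=0.1, M=10, max_epochs=1000, tol=1e-6):
    """Gradient descent with a nonmonotone acceptability check:
    an epoch's step is accepted as long as the new error value does
    not exceed the maximum error over the last M accepted epochs."""
    w = w0.copy()
    history = [E(w)]  # error values of recent accepted epochs
    for _ in range(max_epochs):
        g = grad_E(w)
        if np.linalg.norm(g) < tol:
            break
        w_new = w - lr * g                 # steepest-descent step, Eq. (2)
        e_new = E(w_new)
        # nonmonotone criterion: compare against max of the last M errors,
        # so a moderate increase of the error is still acceptable
        if e_new <= max(history[-M:]):
            w = w_new
            history.append(e_new)
        else:
            lr *= 0.5  # rejected step: shrink the learning rate (assumed fallback)
    return w
```

For example, minimizing the quadratic error E(w) = ||w||²/2 (whose gradient is simply w) drives the weights toward zero; because the error decreases monotonically here, every step passes the acceptability check trivially, while on a noisier surface occasional error increases within the M-epoch window would also be accepted.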