## Backpropagation-Decorrelation: online recurrent learning with O(N) complexity

Citations: 31 (3 self)

### BibTeX

    @MISC{Steil_backpropagation-decorrelation:online,
      author = {Jochen J. Steil},
      title = {Backpropagation-Decorrelation: online recurrent learning with O(N) complexity},
      year = {}
    }

### Abstract

We introduce a new learning rule for fully recurrent neural networks which we call the Backpropagation-Decorrelation (BPDC) rule. It combines two important principles: one-step backpropagation of errors and the use of the temporal memory in the network dynamics by means of decorrelation of activations. The BPDC rule is derived and theoretically justified by regarding learning as a constrained optimization problem, and it applies uniformly in discrete and continuous time. It is very easy to implement and has a minimal complexity of 2N multiplications per time step in the single-output case. Nevertheless, we obtain fast tracking and excellent performance on some benchmark problems, including the Mackey-Glass time series.
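
The ingredients named in the abstract (a fixed recurrent network, a one-step backpropagated error, and a decorrelation-style normalisation, giving an O(N) update) can be illustrated with a toy sketch. This is an assumption-laden illustration, not the exact published BPDC rule: it uses a random fixed reservoir, a single linear readout trained by a normalised-LMS-style step, and arbitrary toy parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
W = rng.normal(0.0, 0.8 / np.sqrt(N), (N, N))  # fixed random recurrent weights
w_in = rng.normal(0.0, 1.0, N)                 # fixed input weights
w_out = np.zeros(N)                            # trainable readout (single output)
eta, eps = 0.5, 1e-3                           # learning rate, regulariser

x = np.zeros(N)
errs = []
for k in range(400):
    u = np.sin(0.1 * k)                 # toy input signal
    d = np.sin(0.1 * (k + 1))           # teacher: one-step prediction
    x = np.tanh(W @ x + w_in * u)       # one reservoir step
    e = w_out @ x - d                   # instantaneous output error
    # error step on the readout only, normalised by the activation power:
    # roughly 2N multiplications per time step, O(N) overall
    w_out -= eta * e * x / (x @ x + eps)
    errs.append(e * e)

print(f"mean squared error over last 50 steps: {np.mean(errs[-50:]):.4f}")
```

Only the N readout weights are adapted; the recurrent part serves purely as temporal memory, which is what keeps the per-step cost linear in N.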

### Citations

1491 | Independent Component Analysis
- Hyvarinen, Karhunen, et al.
- 2001
Context: ...orementioned ideas with the attempt to optimize information processing by means of decorrelation. While decorrelation learning rules are well known for sparse coding and blind source separation [10], [11] and have also been proposed in biological modeling [12], the combination of decorrelation with backpropagation of teacher-induced errors is not common for recurrent trajectory learning. Our new learni...

243 | Long short-term memory
- Hochreiter, Schmidhuber
- 1997
Context: ...dient, APRL suffers from fading memory like gradient algorithms from the vanishing gradient (cf. [18]) and cannot preserve information for long times like, for instance, Long Short-Term Memory networks [19]. For more discussion of the APRL strategy see [15]. D. Backpropagation-decorrelation learning: The considerations above motivate an approximation which does not try to accum...

132 | Gradient calculations for dynamic recurrent neural networks: A survey
- Pearlmutter
- 1995
Context: ...one of the main issues in recurrent learning research (see [1] for a review). In the field of gradient-based algorithms, some milestones were the reduction of the O(N^4) real-time recurrent learning [2] to O(N^3) in [3], and the introduction of backpropagation through time (BPTT) in its online version [4], which has O(N^2) but is storage-demanding. There is ongoing research to devise other efficie...

116 | Gradient-based learning algorithms for recurrent networks and their computational complexity
- Williams, Zipser
- 1995
Context: ...or their application is the known high complexity of training algorithms. Thus reduction of training complexity has always been one of the main issues in recurrent learning research (see [1] for a review). In the field of gradient-based algorithms, some milestones were the reduction of the O(N^4) real-time recurrent learning [2] to O(N^3) in [3], and the introduction of backpropagation through time...

115 | An efficient gradient-based algorithm for on-line training of recurrent network trajectories
- Williams, Peng
- 1990
Context: ...sed algorithms, some milestones were the reduction of the O(N^4) real-time recurrent learning [2] to O(N^3) in [3], and the introduction of backpropagation through time (BPTT) in its online version [4], which has O(N^2) but is storage-demanding. There is ongoing research to devise other efficient recurrent learning schemes, in particular employing regularization techniques and partially recurrent ...

58 | Adaptive nonlinear system identification with Echo State Networks
- Jaeger
- 2003
Context: ...acher forcing, and decorrelation. Applied only to the output weights, it yields a minimal O(N) complexity at very good performance. The resulting network structure resembles the “echo state networks” [7], which also use a dynamical reservoir and optimize a linear readout function, but need a prescaling of the weight matrix to obtain a suitable spectral radius and rely on a special local random connec...
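
The echo-state recipe this excerpt contrasts with BPDC (prescale the recurrent matrix to a suitable spectral radius, keep it fixed as a dynamical reservoir, fit only a linear readout) can be sketched as follows. The toy task and all parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 500
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # prescale: spectral radius 0.9
w_in = rng.normal(size=N)                        # fixed input weights

u = np.sin(0.2 * np.arange(T))                   # toy input
d = np.sin(0.2 * (np.arange(T) + 1))             # target: one-step prediction

# run the fixed reservoir and collect states
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    X[t] = x

# train only the linear readout, discarding an initial washout
w_out, *_ = np.linalg.lstsq(X[100:], d[100:], rcond=None)
mse = np.mean((X[100:] @ w_out - d[100:]) ** 2)
print(f"readout training MSE: {mse:.2e}")
```

The recurrent weights are never trained; all adaptation happens in the linear readout, which is the structural parallel the excerpt draws to BPDC's output-weight-only update.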

33 | New results on recurrent network training: unifying the algorithms and accelerating convergence. Neural Networks
- Atiya, Parlos
- 2000
Context: ...use as standard backpropagation for feedforward networks, which could attract a wider audience to the usage of recurrent networks, is still lacking. Recently two fruitful new ideas have appeared. In [6], Atiya and Parlos have derived a new O(N^2)-efficient algorithm, which is based on the idea to differentiate the error function with respect to the states in order to obtain a “virtual teacher” targ...

22 | Learning with Recurrent Neural Networks
- Hammer
- 2000
Context: ...demanding. There is ongoing research to devise other efficient recurrent learning schemes, in particular employing regularization techniques and partially recurrent networks (see the recent review in [5]). But most of the efficient existing algorithms are quite complex, and in particular the online techniques typically need proper adjustment of learning rates and time constants. A technique as simple ...

20 | The “liquid computer”: A novel strategy for real-time computing on time series
- Natschläger, Maass, et al.
- 2002
Context: ...rmulations of the recurrent dynamics, as APRL (Atiya-Parlos recurrent learning). A second, seemingly very different source of ideas has been developed in [7] under the notion “echo state network” and in [8] as “liquid state machine”. Both approaches use recurrent networks as a kind of dynamic reservoir, which stores information about the temporal behavior of the inputs and allows to learn a linear reado...

17 | A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks
- Schmidhuber
- 1992
Context: ...sues in recurrent learning research (see [1] for a review). In the field of gradient-based algorithms, some milestones were the reduction of the O(N^4) real-time recurrent learning [2] to O(N^3) in [3], and the introduction of backpropagation through time (BPTT) in its online version [4], which has O(N^2) but is storage-demanding. There is ongoing research to devise other efficient recurrent learn...

10 | The vanishing gradient problem during learning recurrent neural nets and problem solutions
- Hochreiter
- 1998
Context: ...f the instantaneous error and a momentum term, which decays to zero. Thus, though not following the gradient, APRL suffers from fading memory like gradient algorithms from the vanishing gradient (cf. [18]) and cannot preserve information for long times like, for instance, Long Short-Term Memory networks [19]. For more discussion of the APRL strategy see [15]. D. Backpropagation-decorrelation learning ...

9 | Analyzing the weight dynamics of recurrent learning algorithms, Neurocomputing
- Schiller, Steil
Context: ... ∆w = −η [(∂g/∂w)^T (∂g/∂w)]^{-1} (∂g/∂w)^T (∂g/∂x) ∆x. (9) It is worth noting that this update direction ∆w does not follow the conventional gradient direction as for real-time recurrent learning [6], [15] and therefore leads to different weight dynamics, which show a larger sensitivity to transient behavior for online learning [15]. The batch update (9) can be computed for one output and (K ≫ N) with ...

8 | Recurrent learning of input-output stable behaviour in function space: A case study with the Roessler attractor
- Steil, Ritter
- 1999
Context: ... ż2 = z1 + 0.2 z2, ż3 = 0.2 + z1 z3 − 5.7 z3, where z1(t), z3(t) are inputs and z2(t) is the reference output. This is a mildly complex trajectory-learning task in continuous time with no closed form for its solution [20]. Training proceeds on 1200 time steps with ∆t = 0.1, integrated by a 4th-order Runge-Kutta scheme starting from initial conditions [0.495, −0.116, −0.3]. The first 200 steps are disregarded for the ...
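
The training data described in this excerpt can be reproduced by integrating the Rössler system with a 4th-order Runge-Kutta scheme. The first equation is truncated in the excerpt; the sketch below assumes the standard Rössler form ż1 = −z2 − z3, which matches the quoted parameters:

```python
import numpy as np

def roessler(z):
    """Rössler vector field with the excerpt's parameters (0.2, 0.2, 5.7)."""
    z1, z2, z3 = z
    return np.array([-z2 - z3,                 # dz1/dt (assumed standard form)
                     z1 + 0.2 * z2,            # dz2/dt, as quoted
                     0.2 + z1 * z3 - 5.7 * z3])  # dz3/dt, as quoted

def rk4_step(f, z, dt):
    """One classical 4th-order Runge-Kutta step."""
    k1 = f(z)
    k2 = f(z + 0.5 * dt * k1)
    k3 = f(z + 0.5 * dt * k2)
    k4 = f(z + dt * k3)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt = 0.1
z = np.array([0.495, -0.116, -0.3])   # initial conditions from the excerpt
traj = [z]
for _ in range(1200):                 # 1200 time steps, as quoted
    z = rk4_step(roessler, z, dt)
    traj.append(z)
traj = np.array(traj)

train = traj[200:]                    # first 200 steps disregarded (washout)
print(train.shape)                    # (1001, 3): z1, z3 inputs, z2 target
```

Per the excerpt, z1(t) and z3(t) serve as network inputs and z2(t) as the reference output during training.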

7 | A Conjugate Gradient Learning Algorithm for Recurrent Neural Networks
- Chang, Mak
- 1999
Context: ...T) can be derived from this starting point. Additionally, a number of approaches using classical quadratic optimization methods to solve (6), (7) have been introduced, for instance conjugate gradients [13] and Newton methods, mostly under the term second-order learning (see [14] and the references therein). B. Virtual Teacher Forcing: To minimize (5), we follow a new approach introduced in [6]. The idea...

5 | Natural gradient learning for spatio-temporal decorrelation: recurrent network
- Choi, Amari, et al.
Context: ...two aforementioned ideas with the attempt to optimize information processing by means of decorrelation. While decorrelation learning rules are well known for sparse coding and blind source separation [10], [11] and have also been proposed in biological modeling [12], the combination of decorrelation with backpropagation of teacher-induced errors is not common for recurrent trajectory learning. Our new ...

4 | A learning rule for dynamic recruitment and decorrelation
- Körding, König
- 2000
Context: ...ion processing by means of decorrelation. While decorrelation learning rules are well known for sparse coding and blind source separation [10], [11] and have also been proposed in biological modeling [12], the combination of decorrelation with backpropagation of teacher-induced errors is not common for recurrent trajectory learning. Our new learning rule uses three important principles: (i) one-step ba...

3 | Local structural stability of recurrent networks with time-varying weights
- Steil
- 2002
Context: ...e in particular for identification and adaptive control tasks. Next steps in the investigation of this algorithm are a theoretical analysis of the network stability with the methods proposed in [20], [21]. Of further interest is also a characterization of the generalization ability with respect to properties of the reservoir, or the information transmission rate of the reservoir. We believe that the de...

2 | Analysis and comparison of algorithms for training recurrent neural networks
- Schiller
- 2003
Context: ...DC rule tend to center activations at zero (except the output neuron) and to decorrelate them. The update complexity depends on the recursive computation of C_k → C_{k+1} and adds to a total of 7N^2 + 4N [17]. We refer to this algorithm as APRL-online. We can rewrite (10) also as ∆w_ij(k+1) = (η/∆t) [C_k^{-1} f_k]_j γ_i(k+1) + (η/∆t) (C_k^{-1} C_{k-1} − I) Σ_{r=0}^{k-1} [f_r]_j γ_i(r+1) = (η/∆t) [C_k^{-1} f_k] ...

1 | On the weight dynamics of recurrent learning
- Schiller, Steil
Context: ...allows to learn a linear readout function. There is, however, an interesting connection between these ideas: for the – most common and most important – case of a single output neuron it was shown in [9] that APRL also leads to a functional decomposition of the trained networks into a fast-adapting readout layer and a slowly changing dynamic reservoir. In the reservoir, weight changes are highly coupl...

1 | Recurrent Neural Networks: Design and Applications
- Santos, Zuben
- 1999
Context: ...aches using classical quadratic optimization methods to solve (6), (7) have been introduced, for instance conjugate gradients [13] and Newton methods, mostly under the term second-order learning (see [14] and the references therein). B. Virtual Teacher Forcing: To minimize (5), we follow a new approach introduced in [6]. The idea is to use the constraint equation to compute weight changes to approach a...

1 | Attractive periodic sets in discrete-time recurrent networks (with emphasis on fixed-point stability and bifurcations in two-neuron networks)
- Tiňo, Horne, et al.
- 2001
Context: ...terms exactly sum up to 1. (Footnote: Note that any system with a common sigmoid activation function can be represented by a dynamically equivalent system with tanh as activation and appropriately scaled inputs [16].)

TABLE I: Complexity of BPDC algorithms, counting multiplications for one output neuron.

| algorithm | all weights | output weights only |
| --- | --- | --- |
| APRL batch | 3N^2 + 2N | 2N^2 + 3N |
| APRL online | 7N^2 + 4N | 6N^2 + 5N |
| B... | | |