## Time Series Prediction by Using a Connectionist Network with Internal Delay Lines (1994)

Venue: Time Series Prediction

Citations: 62 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Wan94timeseries,
  author    = {Eric Wan},
  title     = {Time Series Prediction by Using a Connectionist Network with Internal Delay Lines},
  booktitle = {Time Series Prediction},
  year      = {1994},
  pages     = {195--217},
  publisher = {Addison-Wesley}
}
```

### Abstract

A neural network architecture, which models synapses as Finite Impulse Response (FIR) linear filters, is discussed for use in time series prediction. Analysis and methodology are detailed in the context of the Santa Fe Institute Time Series Prediction Competition. Results of the competition show that the FIR network performed remarkably well on a chaotic laser intensity time series.

1 Introduction. The goal of time series prediction or forecasting can be stated succinctly as follows: given a sequence y(1), y(2), ..., y(N) up to time N, find the continuation y(N+1), y(N+2), ... The series may arise from the sampling of a continuous-time system, and be either stochastic or deterministic in origin. The standard prediction approach involves constructing an underlying model which gives rise to the observed sequence. In the oldest and most studied method, which dates back to Yule [1], a linear autoregression (AR) is fit to the data:

y(k) = Σ_{n=1}^{T} a(n) y(k−n) + e(k) = ŷ(k) + e(k) ...
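As a baseline for the neural approaches discussed, the linear AR fit of equation (1) can be sketched with ordinary least squares. This is an illustrative sketch, not the paper's code; the model order and the toy series are invented:

```python
import numpy as np

def fit_ar(y, T):
    """Least-squares fit of AR coefficients a(1..T) in
    y(k) = sum_n a(n) * y(k-n) + e(k)."""
    # Row for target y(k) holds the lagged values [y(k-1), ..., y(k-T)].
    X = np.array([[y[k - n] for n in range(1, T + 1)] for k in range(T, len(y))])
    a, *_ = np.linalg.lstsq(X, y[T:], rcond=None)
    return a

def predict_next(y, a):
    """One-step prediction y_hat(N+1) from the last T observed values."""
    return float(sum(a[n] * y[-1 - n] for n in range(len(a))))

# Toy series generated by a known AR(2) rule; a noiseless fit recovers it.
y = [0.5, -0.3]
for _ in range(30):
    y.append(1.2 * y[-1] - 0.5 * y[-2])
y = np.array(y)
a = fit_ar(y, 2)   # close to the generating coefficients [1.2, -0.5]
```

Because the toy series is noiseless, the regression is exactly consistent and the recovered coefficients match the generating rule to rounding error; on real data the residual e(k) is nonzero and the fit is a minimum-variance compromise.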

### Citations

1217 | Multilayer Feedforward Networks are Universal Approximators - Hornik, Stinchcombe, et al. - 1989

Citation Context: "... y(k−2), ..., y(k−T)], which models the series exactly (assuming no noise). The neural network thus forms an approximation to the ideal function g(·). Furthermore, it has been shown [4, 5, 6] that a feedforward neural network N with an arbitrary number of neurons is capable of approximating any uniformly continuous function. These arguments provide the basic motivation for the use of neur..."

1182 | System Identification: Theory for the User - Ljung - 1999

Citation Context: "...ighted sum of past delayed values of the input: y(k) = Σ_{n=0}^{T} w(n) x(k−n) (4). Note that this corresponds to the moving average component of a simple Autoregressive Moving Average (ARMA) model [16, 17]. The FIR filter, in fact, was one of the first basic adaptive elements ever studied [21]. From a biological perspective, the synaptic filter represents a Markov model of signal transmission correspon..."
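The FIR "synapse" in equation (4) of this excerpt is a plain tapped-delay-line (moving-average) filter. A minimal sketch, with taps and input chosen only for illustration:

```python
import numpy as np

def fir_synapse(x, w):
    """y(k) = sum_{n=0}^{T} w(n) * x(k-n), taking x(k) = 0 for k < 0.
    This is the moving-average half of an ARMA model."""
    T = len(w) - 1
    y = np.zeros(len(x))
    for k in range(len(x)):
        for n in range(min(k, T) + 1):   # only taps with a valid past sample
            y[k] += w[n] * x[k - n]
    return y

w = np.array([0.5, 0.3, 0.2])            # illustrative 2nd-order taps (T = 2)
impulse = np.array([1.0, 0.0, 0.0, 0.0])
y = fir_synapse(impulse, w)              # impulse response is the taps themselves
```

Feeding a unit impulse through recovers the taps, the defining property of an FIR filter: its response to any input dies out after T steps.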

834 | Approximation by superpositions of a sigmoidal function - Cybenko - 1989

Citation Context: "... y(k−2), ..., y(k−T)], which models the series exactly (assuming no noise). The neural network thus forms an approximation to the ideal function g(·). Furthermore, it has been shown [4, 5, 6] that a feedforward neural network N with an arbitrary number of neurons is capable of approximating any uniformly continuous function. These arguments provide the basic motivation for the use of neur..."

608 | Neural networks and the bias/variance dilemma - Geman, Bienenstock, et al. - 1992

Citation Context: "...cannot conclude that adaptation will necessarily achieve the optimum for a given structure and training sequence. Issues concerning biased estimators in the context of network learning are presented in [28]. Once the network is trained, long-term iterated prediction is achieved by taking the estimate ŷ(k) and feeding it back as input to the network: ŷ(k) = N_q[ŷ(k−1)] (16). This closed-loop syst..."
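The closed-loop iteration of equation (16) cited above, feeding each estimate back as the next input, can be sketched generically. The predictor below is a hypothetical stand-in for the trained network N, not the paper's FIR network:

```python
def iterate_prediction(history, predictor, steps, T):
    """Long-term iterated prediction: after training, each estimate
    y_hat(k) is fed back as input, so the model runs free of true data."""
    window = list(history[-T:])        # last T known values
    estimates = []
    for _ in range(steps):
        y_hat = predictor(window)      # y_hat(k) = N[window], eq. (16) style
        estimates.append(y_hat)
        window = window[1:] + [y_hat]  # slide window: drop oldest, keep estimate
    return estimates

# Hypothetical stand-in predictor: y(k) = 0.5*y(k-1) + 0.5*y(k-2).
lin = lambda w: 0.5 * w[-1] + 0.5 * w[-2]
future = iterate_prediction([2.0, 2.0], lin, steps=3, T=2)   # [2.0, 2.0, 2.0]
```

The design point the excerpt makes is visible here: once the loop closes, errors compound through `window`, which is why iterated prediction is a harsher test than one-step prediction.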

520 | Bayesian interpolation - MacKay - 1992

Citation Context: "...bmitted were scaled to much smaller values due to a misinterpretation of the performance measure, final L = 3.5.) Alternative Bayesian methods for estimating uncertainties have been suggested by MacKay [35, 36] and Skilling [37]. Additional predictions: The complete 10000-point series continuation was provided after the competition. Fig. 13 shows various iterated predictions starting at different locations ..."

504 | Beyond regression: new tools for prediction and analysis in the behavioral sciences - Werbos - 1974

Citation Context: "...arguments provide the basic motivation for the use of neural networks in time series prediction. The use of neural networks for time series prediction is not new. Previous work includes that of Werbos [7, 8], Lapedes [9], and Weigend et al. [10], to cite just a few. The connectionist entries in the SFI Competition attest to the success and significance of networks in the field. In this paper, we focus on a..."

412 | A learning algorithm for continually running fully recurrent neural networks - Williams, Zipser - 1989

Citation Context: "...e network is not fed back as input during training. Such a scheme is referred to as equation-error adaptation [25, 26]. The neural network community has more recently adopted the term teacher-forcing [27]. A simple argument for adapting in this fashion is as follows: in a stationary stochastic environment, minimizing the sum of the squared errors e(k)² corresponds to minimizing the expectation of the..."

398 | A practical Bayesian framework for back-propagation networks - MacKay - 1992

Citation Context: "...bmitted were scaled to much smaller values due to a misinterpretation of the performance measure, final L = 3.5.) Alternative Bayesian methods for estimating uncertainties have been suggested by MacKay [35, 36] and Skilling [37]. Additional predictions: The complete 10000-point series continuation was provided after the competition. Fig. 13 shows various iterated predictions starting at different locations ..."

320 | Phoneme recognition using time-delay neural networks - Waibel, Hanazawa, et al. - 1989

Citation Context: "...network all connections are modeled as FIR filters. 2.1 Alternative Representations of the FIR Topology: A similar network structure incorporating embedded time delays is the Time-Delay Neural Network [13, 14, 22]. TDNNs have recently become popular for use in phoneme classification. A TDNN is typically described as a layered network in which the outputs of a layer are buffered several time steps and then fed ..."

188 | Geometry from a Time Series - Packard, Crutchfield, et al. - 1980

Citation Context: "...y(k−T)] + e(k) (2). (Note this model is equally applicable for both scalar and vector sequences.) The use of this nonlinear autoregression can be motivated as follows. First, Takens' Theorem [2, 3] implies that for a wide class of deterministic systems, there exists a diffeomorphism (a one-to-one differentiable mapping) between a finite window of the time series [y(k−1), y(k−2), ..., y(..."
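The delay-window regression motivated by Takens' theorem in this context starts from building windows [y(k−1), ..., y(k−T)] paired with targets y(k). A sketch of that construction (window length and toy data are arbitrary):

```python
import numpy as np

def delay_embed(y, T):
    """Return (X, t): each row of X is a window [y(k-1), ..., y(k-T)],
    and t holds the corresponding target y(k)."""
    X = np.array([[y[k - n] for n in range(1, T + 1)] for k in range(T, len(y))])
    t = np.array(y[T:])
    return X, t

series = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X, t = delay_embed(series, T=2)
# X = [[20, 10], [30, 20], [40, 30]], t = [30, 40, 50]
```

These (window, target) pairs are exactly the training data a nonlinear autoregression, neural or otherwise, is fit on; Takens' theorem guarantees that for a wide class of deterministic systems a large enough T makes the mapping well defined.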

188 | Predicting the Future: A Connectionist Approach - Weigend, Huberman, et al. - 1990

Citation Context: "...for the use of neural networks in time series prediction. The use of neural networks for time series prediction is not new. Previous work includes that of Werbos [7, 8], Lapedes [9], and Weigend et al. [10], to cite just a few. The connectionist entries in the SFI Competition attest to the success and significance of networks in the field. In this paper, we focus on a method for achieving the nonlinear a..."

170 | A time-delay neural network architecture for isolated word recognition - Lang, Waibel, et al. - 1990

Citation Context: "...network all connections are modeled as FIR filters. 2.1 Alternative Representations of the FIR Topology: A similar network structure incorporating embedded time delays is the Time-Delay Neural Network [13, 14, 22]. TDNNs have recently become popular for use in phoneme classification. A TDNN is typically described as a layered network in which the outputs of a layer are buffered several time steps and then fed ..."

169 | The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems - Moody - 1992

Citation Context: "...respond directly to the number of free parameters. We are really trying to fit a function which is constrained by the network topology. Issues concerning effective degrees of freedom are discussed in [38]. One consequence of the large number of parameters can be seen in the extended long-term iterated prediction (Figure 11). Due to the excessive number of degrees of freedom, the signal eventually beco..."

165 | Nonlinear signal processing using neural networks: prediction and system modeling - Lapedes, Farber

Citation Context: "...the basic motivation for the use of neural networks in time series prediction. The use of neural networks for time series prediction is not new. Previous work includes that of Werbos [7, 8], Lapedes [9], and Weigend et al. [10], to cite just a few. The connectionist entries in the SFI Competition attest to the success and significance of networks in the field. In this paper, we focus on a method for a..."

82 | Generalization and network design strategies - LeCun - 1989

Citation Context: "...fully connected network which may attempt to analyze the scene all at once. Similar locally symmetric constraints have been motivated for use in pattern classification using "shared weight" networks [23, 24]. [Figure 5: An FIR network with 2nd-order taps for all connections is unfolded in...]"

81 | Generalization of backpropagation with application to a recurrent gas market model - Werbos - 1988

Citation Context: "...arguments provide the basic motivation for the use of neural networks in time series prediction. The use of neural networks for time series prediction is not new. Previous work includes that of Werbos [7, 8], Lapedes [9], and Weigend et al. [10], to cite just a few. The connectionist entries in the SFI Competition attest to the success and significance of networks in the field. In this paper, we focus on a..."

81 | Modular construction of time-delay neural networks for speech recognition - Waibel - 1989

Citation Context: "...network all connections are modeled as FIR filters. 2.1 Alternative Representations of the FIR Topology: A similar network structure incorporating embedded time delays is the Time-Delay Neural Network [13, 14, 22]. TDNNs have recently become popular for use in phoneme classification. A TDNN is typically described as a layered network in which the outputs of a layer are buffered several time steps and then fed ..."

78 | Time Series Analysis: Univariate and Multivariate Methods - Wei - 1990

Citation Context: "...ighted sum of past delayed values of the input: y(k) = Σ_{n=0}^{T} w(n) x(k−n) (4). Note that this corresponds to the moving average component of a simple Autoregressive Moving Average (ARMA) model [16, 17]. The FIR filter, in fact, was one of the first basic adaptive elements ever studied [21]. From a biological perspective, the synaptic filter represents a Markov model of signal transmission correspon..."

73 | On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers - Yule - 1927

Citation Context: "...nistic in origin. The standard prediction approach involves constructing an underlying model which gives rise to the observed sequence. In the oldest and most studied method, which dates back to Yule [1], a linear autoregression (AR) is fit to the data: y(k) = Σ_{n=1}^{T} a(n) y(k−n) + e(k) = ŷ(k) + e(k) (1). This AR model forms y(k) as a weighted sum of past values of the sequence. The single step..."

55 | Detecting Strange Attractors in Turbulence (in Dynamical Systems and Turbulence) - Takens - 1981

Citation Context: "...y(k−T)] + e(k) (2). (Note this model is equally applicable for both scalar and vector sequences.) The use of this nonlinear autoregression can be motivated as follows. First, Takens' Theorem [2, 3] implies that for a wide class of deterministic systems, there exists a diffeomorphism (a one-to-one differentiable mapping) between a finite window of the time series [y(k−1), y(k−2), ..., y(..."

55 | Adaptive IIR filtering - Shynk - 1989

Citation Context: "...tation for even a linear autoregression suffers from convergence to a biased closed-loop solution (i.e., θ̂ = θ + bias, where θ corresponds to the optimal set of closed-loop autoregression parameters) [26, 29]. An alternative configuration which adapts the closed-loop system directly might seem more prudent. Such a setup is referred to as output-error adaptation. For the linear case, the method results in..."

49 | Capabilities of Three-Layered Perceptrons - Irie, Miyake - 1988

Citation Context: "... y(k−2), ..., y(k−T)], which models the series exactly (assuming no noise). The neural network thus forms an approximation to the ideal function g(·). Furthermore, it has been shown [4, 5, 6] that a feedforward neural network N with an arbitrary number of neurons is capable of approximating any uniformly continuous function. These arguments provide the basic motivation for the use of neur..."

49 | Adaptive Control: The Model Reference Approach - Landau - 1979

Citation Context: "...setup is referred to as output-error adaptation. For the linear case, the method results in an estimator that is not biased. Paradoxically, however, the linear predictor may converge to a local minimum [30, 31, 32]. Furthermore, the adaptation algorithms themselves become more complicated and less reliable due to the feedback. As a consequence, we will not consider the output-error approach with neural network..."

38 | The LMS algorithm with delayed coefficient adaptation - Long, Ling, et al. - 1989

Citation Context: "...effect of this is to delay the actual gradient update by a few time steps. This may result in a slightly different convergence rate and misadjustment, as in the analogous linear Delayed LMS algorithm [41, 42]. For simplicity we have assumed that the order of each synaptic filter, T, was the same in each layer. This is clearly not necessary. For the general case, let T^l_ij be the order of the synaptic fi..."
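The delayed-coefficient idea referenced here ([41, 42]) can be illustrated on a plain linear filter: the weight update applies the error/input pair from D steps earlier. This is a sketch of Delayed LMS on a toy identification problem, not the paper's network training code; step size, delay, and data are invented:

```python
import numpy as np

def delayed_lms(x, d, T, mu, D):
    """Delayed LMS: the gradient applied at step k comes from D steps ago,
    i.e. w += 2*mu*e(k-D)*u(k-D), with u(k) = [x(k), ..., x(k-T)] and
    e(k) = d(k) - w(k).u(k)."""
    w = np.zeros(T + 1)
    pending = []                                   # buffer of (error, input) pairs
    for k in range(T, len(x)):
        u = np.array([x[k - n] for n in range(T + 1)])
        e = d[k] - np.dot(w, u)
        pending.append((e, u))
        if len(pending) > D:                       # apply the D-steps-old gradient
            e_old, u_old = pending.pop(0)
            w = w + 2 * mu * e_old * u_old
    return w

rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
d = 0.5 * x - 0.25 * np.concatenate(([0.0], x[:-1]))   # true taps [0.5, -0.25]
w = delayed_lms(x, d, T=1, mu=0.02, D=2)               # converges near [0.5, -0.25]
```

For a small step size the delay mainly slows convergence slightly, which matches the excerpt's remark that the misadjustment and convergence rate differ only modestly from standard LMS.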

26 | Temporal backpropagation for FIR neural networks - Wan - 1990

Citation Context: "...ition attest to the success and significance of networks in the field. In this paper, we focus on a method for achieving the nonlinear autoregression by use of a Finite Impulse Response (FIR) network [11, 12]. We start by reviewing the FIR network structure and presenting its adaptation algorithm, called temporal backpropagation. We then discuss the use of the network in a prediction configuration. The res..."

19 | Probabilistic displays - Skilling, Robinson, et al. - 1991

Citation Context: "...much smaller values due to a misinterpretation of the performance measure, final L = 3.5.) Alternative Bayesian methods for estimating uncertainties have been suggested by MacKay [35, 36] and Skilling [37]. Additional predictions: The complete 10000-point series continuation was provided after the competition. Fig. 13 shows various iterated predictions starting at different locations within the series...."

14 | Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser - Huebner, Abraham, et al. - 1989

Citation Context: "...final prediction results. "Measurements were made on an 81.5-micron 14NH3 cw (FIR) laser, pumped optically by the P(13) line of an N2O laser via the vibrational aQ(8,7) NH3 transition" - U. Huebner [33]. [Figure 8: 1100 time points of chaotic laser data; Table 2: Normalized sum squa...]"

13 | STELLA: A Model of its - Lewis - 1986

Citation Context: "...perspective, the synaptic filter represents a Markov model of signal transmission corresponding to the processes of axonal transport, synaptic modulation, and charge dissipation in the cell membrane [18, 19, 20]. [Figure 2: FIR Filter Model: A tapped delay line shows the functional model of the Finite Impulse Response "synapse"...]"

13 | Adaptive IIR filtering: Current results and open issues - Johnson - 1984

Citation Context: "...setup is referred to as output-error adaptation. For the linear case, the method results in an estimator that is not biased. Paradoxically, however, the linear predictor may converge to a local minimum [30, 31, 32]. Furthermore, the adaptation algorithms themselves become more complicated and less reliable due to the feedback. As a consequence, we will not consider the output-error approach with neural network..."

10 | Error surfaces of recursive adaptive filters - Stearns - 1981

Citation Context: "...setup is referred to as output-error adaptation. For the linear case, the method results in an estimator that is not biased. Paradoxically, however, the linear predictor may converge to a local minimum [30, 31, 32]. Furthermore, the adaptation algorithms themselves become more complicated and less reliable due to the feedback. As a consequence, we will not consider the output-error approach with neural network..."

8 | Temporal backpropagation: An efficient algorithm for finite impulse response neural networks - Wan - 1990

Citation Context: "...ition attest to the success and significance of networks in the field. In this paper, we focus on a method for achieving the nonlinear autoregression by use of a Finite Impulse Response (FIR) network [11, 12]. We start by reviewing the FIR network structure and presenting its adaptation algorithm, called temporal backpropagation. We then discuss the use of the network in a prediction configuration. The res..."

7 | The stability of adaptive minimum mean square error equalizers using delayed adjustment - Kabal - 1983

Citation Context: "...effect of this is to delay the actual gradient update by a few time steps. This may result in a slightly different convergence rate and misadjustment, as in the analogous linear Delayed LMS algorithm [41, 42]. For simplicity we have assumed that the order of each synaptic filter, T, was the same in each layer. This is clearly not necessary. For the general case, let T^l_ij be the order of the synaptic fi..."

5 | Discrete Techniques of Parameter Estimation: The Equation Error Formulation - Mendel - 1973

Citation Context: "...and desired response are provided from the known training series. The actual output of the network is not fed back as input during training. Such a scheme is referred to as equation-error adaptation [25, 26]. The neural network community has more recently adopted the term teacher-forcing [27]. A simple argument for adapting in this fashion is as follows: in a stationary stochastic environment, minimizing..."

4 | Koch, C. and Segev, I. (Editors) - 1998

Citation Context: "...perspective, the synaptic filter represents a Markov model of signal transmission corresponding to the processes of axonal transport, synaptic modulation, and charge dissipation in the cell membrane [18, 19, 20]. [Figure 2: FIR Filter Model: A tapped delay line shows the functional model of the Finite Impulse Response "synapse"...]"

4 | Backpropagation Applied to Handwritten Zip Code Recognition - LeCun, Boser, et al. - 1989

Citation Context: "...fully connected network which may attempt to analyze the scene all at once. Similar locally symmetric constraints have been motivated for use in pattern classification using "shared weight" networks [23, 24]. [Figure 5: An FIR network with 2nd-order taps for all connections is unfolded in...]"

3 | Adaptive pole-zero filtering: the equation error approach - Gooch - 1983

Citation Context: "...and desired response are provided from the known training series. The actual output of the network is not fed back as input during training. Such a scheme is referred to as equation-error adaptation [25, 26]. The neural network community has more recently adopted the term teacher-forcing [27]. A simple argument for adapting in this fashion is as follows: in a stationary stochastic environment, minimizing..."

3 | Applied Optimal Control - Bryson, Ho - 1975 (Hemisphere Publishing Corp.)

Citation Context: "...are extrapolated from rather old methods of linear difference equations. A more modern approach draws from state-space theory [39]. In the linear case the predictor corresponds to a Kalman Estimator [40]. Extending to neural networks yields the set of equations: x(k) = N_1[x(k−1), e(k−1)] (20), y(k) = N_2[x(k)] + e(k) (21), where x(k) corresponds to a vector of internal states which g..."

2 | Nerve and Muscle Excitation, Third Edition - Junge - 1991 (Sinauer Associates)

Citation Context: "...perspective, the synaptic filter represents a Markov model of signal transmission corresponding to the processes of axonal transport, synaptic modulation, and charge dissipation in the cell membrane [18, 19, 20]. [Figure 2: FIR Filter Model: A tapped delay line shows the functional model of the Finite Impulse Response "synapse"...]"

2 | Linear Systems - Kailath - 1980 (Prentice-Hall, Englewood Cliffs)

Citation Context: "...ion error residuals. Both the neural network AR and ARMA models, however, are extrapolated from rather old methods of linear difference equations. A more modern approach draws from state-space theory [39]. In the linear case the predictor corresponds to a Kalman Estimator [40]. Extending to neural networks yields the set of equations: x(k) = N_1[x(k−1), e(k−1)] (20), y(k) = N_2[x(k)] + e(k) (21), where x(k)..."