## On Selecting Models for Nonlinear Time Series (1995)

Venue: Physica D

Citations: 40 (11 self)

### BibTeX

    @ARTICLE{Judd95onselecting,
        author = {Kevin Judd and Alistair Mees},
        title = {On Selecting Models for Nonlinear Time Series},
        journal = {Physica D},
        year = {1995},
        volume = {82},
        pages = {426--444}
    }

### Abstract

Constructing models from time series with nontrivial dynamics involves the problem of how to choose the best model from within a class of models, or to choose between competing classes. This paper discusses a method of building nonlinear models of possibly chaotic systems from data, while maintaining good robustness against noise. The models that are built are close to the simplest possible according to a description length criterion. The method will deliver a linear model if that has shorter description length than a nonlinear model. We show how our models can be used for prediction, smoothing and interpolation in the usual way. We also show how to apply the results to identification of chaos by detecting the presence of homoclinic orbits directly from time series.

1 The Model Selection Problem

As our understanding of chaotic and other nonlinear phenomena has grown, it has become apparent that linear models are inadequate to model most dynamical processes. Nevertheless, linear models...

### Citations

10959 |
Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...cases. The difficulty illustrated by (15) is not an isolated example. On the contrary, it appears that finding the optimal model of size k is NP-hard (related to the feasible basis extension problem [18, 5]), although we have not established this rigorously. If this is the case, then we cannot expect to obtain the optimal solution easily. On the other hand, numerical tests with real time series suggest ...

2321 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ...on is related to other well-known model selection criteria, and Rissanen shows that asymptotically, our approximate expression for MDL is equivalent to the Schwarz (or Bayesian) information criterion [25, 11, 14, 22]. We have found, however, that working with the above form gives better results for smaller data sets, and for large ones in critical cases, and the extra computation required is not significant. Actu...

510 |
Detecting strange attractors in turbulence
- Takens
- 1981
Citation Context ...a realization of this process $(y_t, x_t)$, $t = 1, \ldots, n$, find an approximation $\hat{F}$ of $F$. If one is given only a scalar time series $Y_t$, then, by the Takens embedding theorem and its extensions [24, 21], one can define $X_t = (Y_{t-\tau}, Y_{t-2\tau}, \ldots, Y_{t-d\tau})$ for some lag $\tau > 0$ and embedding dimension $d$. Other embedding strategies are also possible. If $Y_t$ is a multivariate time series, the...

395 |
Nonlinear time series: A dynamical system approach
- Tong
- 1990
Citation Context ...on is related to other well-known model selection criteria, and Rissanen shows that asymptotically, our approximate expression for MDL is equivalent to the Schwarz (or Bayesian) information criterion [25, 11, 14, 22]. We have found, however, that working with the above form gives better results for smaller data sets, and for large ones in critical cases, and the extra computation required is not significant. Actu...

373 |
Universal approximation bounds for superpositions of a sigmoidal function
- Barron
- 1993
Citation Context ...t some of our methods can be applied to purely linear models to improve upon them. 1.1 Weak and strong approximations. We discuss the functional approximation problem only briefly. Barron and others [3] have shown that certain methods of function approximation are considerably more powerful than others in high dimensions, in the sense that the optimal approximation error grows far more slowly with d...

261 |
Orthogonal least squares learning algorithm for radial basis function networks
- Chen, Cowan, et al.
- 1991
Citation Context ... approach. In a later paper we hope to go into fuller detail on this. 3.3 Comparison of model selection algorithms. Another contender as a model selection method is the orthogonal least squares method [4, 14], which is the following iterative procedure. 1. Let $k = 0$, $V_0 = V$ and $e_0 = y$. 2. At the $k$th step find the column $v_{i_k}$ of $V_k$ that maximizes $e_k^\top v_i / |v_i|$. That is, find the column of $V_k$ that ...

128 |
Mathematical thought from ancient to modern times
- Kline
- 1990
Citation Context ...imum description length [20] to characterize model quality. Rissanen's argument is that we should regard good models as those that compress the data best: this is, of course, a form of Ockham's Razor [10]. It turns out to be a very powerful idea, with applications in many areas of data analysis including our present problem. To measure data compression, we envisage encoding the data for optimal transm...

45 |
Stochastic Complexity in Statistical Inquiry, volume 15
- Rissanen
- 1989
Citation Context ...at can be used for selection and also, in principle, for model comparison, derives from trying to find a correct statement of our ill-posed problem. It is Rissanen's use of minimum description length [20] to characterize model quality. Rissanen's argument is that we should regard good models as those that compress the data best: this is, of course, a form of Ockham's Razor [10]. It turns out to be a v...

24 |
Singular-value decomposition and the Grassberger-Procaccia algorithm, Phys
- Albano, Muench, et al.
- 1988
Citation Context ...y $T = 0.02$ time units, to which we add white noise at a signal-to-noise ratio of 2%. We choose the lag to be $\tau = 5$, which is a little over one quarter of the average inter-peak period of the time series [2]. The method of false nearest neighbors [1] suggests that for this lag a three-dimensional embedding is sufficient. We obtained an optimal model $\hat{F}$ for $F$ for a prediction of $\tau$ steps ahead using a radia...

23 |
Local and global behavior near homoclinic orbits
- Glendinning, Sparrow
- 1984
Citation Context ...future report by the authors where similar conclusions are drawn from experimental data. First we give a brief description of the Shil'nikov bifurcation; for details the reader should refer elsewhere [7, 17]. Suppose a system of differential equations $\dot{x} = f(x, \mu)$, $x \in \mathbb{R}^3$, $\mu \in \mathbb{R}$, with $f$ analytic in $x$ and $\mu$, has a fixed point $\bar{x}$ with one real eigenvalue $\lambda > 0$ and a complex conjugate pair $\sigma \pm i\omega$, and that...

22 |
Radial basis function networks for classifying process faults
- Leonard, Kramer
- 1991
Citation Context ...ere are too many data points then a randomly selected subset of these is acceptable; the latter, in effect, is choosing centers by the "natural" measure of a supposed attractor. The method of k-means [12, 15] is more sophisticated; it tries to divide the data points into k clusters and uses the centroids of these clusters as the centers. A common property of all but the first method is that the centers ch...

21 |
Structurally stable systems are not dense
- Smale

18 |
Optimization Under Constraints
- Whittle
- 1971
Citation Context ...tivity analysis to see the effect of changing the size of $B$. The constraint $N(\lambda) = k$ becomes $\lambda_j = u_j$, $j \notin B$ (13), where $u = 0$ but is kept as a parameter. It is an easy exercise in Lagrangian theory [26] to discover the sensitivity of the optimal solution of (12), as a function of $y$ and $u$, to changes in $u$. First we consider how to enlarge the set of basic variables so as to give the greatest benefit to the mean square ...

12 |
Dynamical systems and tessellations: Detecting determinism in data
- Mees
- 1991
Citation Context ...tion methods, and others weak approximation methods. The strong methods include neural nets, radial basis functions, wavelets and piecewise linear approximations such as tessellation and triangulation [13]. Weak methods include linear approximations, global polynomials, and Fourier transforms. Because we usually require higher dimensionality in our models, we are mainly interested in strong models, tho...

11 |
Local false nearest neighbors and dynamical dimensions from observed chaotic data, Phys
- Abarbanel, Kennel
- 1993
Citation Context ...e noise at a signal-to-noise ratio of 2%. We choose the lag to be $\tau = 5$, which is a little over one quarter of the average inter-peak period of the time series [2]. The method of false nearest neighbors [1] suggests that for this lag a three-dimensional embedding is sufficient. We obtained an optimal model $\hat{F}$ for $F$ for a prediction of $\tau$ steps ahead using a radial basis model formed from Gaussian radial b...

10 |
Data Transformation and Self-Exciting Threshold Autoregression
- Ghaddar, Tong
- 1981
Citation Context ... Ghaddar and Tong have considered the problem of building a model using annual sunspot numbers over the period 1700–1979, then using this model to make free-run predictions for the period 1980–1987 [6, 25]. Tong compares the predictions of an optimal auto-regressive model AR(9) and an optimal threshold auto-regressive model SETAR(2, 3, 11) [25], which is also a pseudo-linear model. To facilitate compariso...

8 |
The Takens Embedding Theorem
- Noakes
- 1991
Citation Context ...rem of Takens which asserts that if a time series generated by a dynamical system is embedded suitably, then one obtains a system that retains much of the original system's local and global properties [24, 19]. Since an optimal model $\hat{F}$ should only extract the dynamical, deterministic component of a time series, it might also have implicit in it properties of the original system that are invariant under T...

6 |
Persistence of the Dow Jones index on rising volume
- LeBaron
- 1992
Citation Context ...on is related to other well-known model selection criteria, and Rissanen shows that asymptotically, our approximate expression for MDL is equivalent to the Schwarz (or Bayesian) information criterion [25, 11, 14, 22]. We have found, however, that working with the above form gives better results for smaller data sets, and for large ones in critical cases, and the extra computation required is not significant. Actu...

5 |
Some tools for analyzing chaos
- Mees, Sparrow
- 1987
Citation Context ...future report by the authors where similar conclusions are drawn from experimental data. First we give a brief description of the Shil'nikov bifurcation; for details the reader should refer elsewhere [7, 17]. Suppose a system of differential equations $\dot{x} = f(x, \mu)$, $x \in \mathbb{R}^3$, $\mu \in \mathbb{R}$, with $f$ analytic in $x$ and $\mu$, has a fixed point $\bar{x}$ with one real eigenvalue $\lambda > 0$ and a complex conjugate pair $\sigma \pm i\omega$, and that...

4 |
A fundamental problem in linear inequalities with applications to the traveling salesman problem
- Murty
- 1972
Citation Context ...cases. The difficulty illustrated by (15) is not an isolated example. On the contrary, it appears that finding the optimal model of size k is NP-hard (related to the feasible basis extension problem [18, 5]), although we have not established this rigorously. If this is the case, then we cannot expect to obtain the optimal solution easily. On the other hand, numerical tests with real time series suggest ...

3 |
Reconstructing the dynamics of Chua’s circuit
- Glover, Mees
- 1993
Citation Context ... transformation. We might then analyze the dynamical system defined by $\hat{F}$ as we would a map derived from theoretical analysis by, for example, finding fixed points, eigenvalues and homoclinic orbits [8]. There are important technical issues that need to be addressed regarding just what remains invariant under a Takens transformation, but these are outside the scope of this paper. The aim here is to ...

3 |
Parsimonious dynamical reconstruction
- Mees
- 1993

2 |
Reconstructing chaotic systems in the presence of noise
- Mees
- 1994
Citation Context ...ere are too many data points then a randomly selected subset of these is acceptable; the latter, in effect, is choosing centers by the "natural" measure of a supposed attractor. The method of k-means [12, 15] is more sophisticated; it tries to divide the data points into k clusters and uses the centroids of these clusters as the centers. A common property of all but the first method is that the centers ch...

1 |
Estimation and reconstruction in noisy chaotic systems
- Mees, Smith
- 1993
Citation Context ...gnores this subtlety, but we have found that a test of normality of the residues often fails due to excessive numbers of outliers. One can choose to ignore this too, or try more sophisticated methods [16], or take a robust statistics approach and use an $L_1$-norm. The latter alternative requires some additional development in applying the MDL criterion. As we pointed out earlier, we have not included ...