## Methods and techniques of complex systems science: An overview (2003)

Citations: 11 (0 self)

### BibTeX

@MISC{Shalizi03methodsand,
  author = {Cosma Rohilla Shalizi},
  title = {Methods and techniques of complex systems science: An overview},
  year = {2003}
}


### Abstract

In this chapter, I review the main methods and techniques of complex systems science. As a …

### Citations

8632 | Elements of Information Theory - Cover, Thomas - 1991 |

6140 | A mathematical theory of communication - Shannon - 2001 |

5339 |
Design Patterns: Elements of Reusable Object-Oriented Software
- Gamma, Helm, et al.
- 1995
Citation Context: …n by explaining what I feel does not fall within my charge, as indicated by Figure 1. At the top of Figure 1 I have put “patterns”. By this I mean more or less what people in software engineering do (Gamma et al., 1995): a pattern is a recurring theme in the analysis of many different systems, a cross-systemic regularity. For instance: bacterial chemotaxis can be thought of as a way of resolving the tension between…

3953 |
Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context: …hanics (Engel and Van den Broeck, 2001). It is notoriously hard to understand why they make the predictions they do. • Classification and regression trees (CART), introduced in the book of that name (Breiman et al., 1984), recursively sub-divide the input space, rather like the game of “twenty questions” (“Is the temperature above 20 centigrade? If so, is the glucose concentration above one millimole?”, etc.); each q…

2351 |
Estimating the Dimension of a Model
- Schwarz
- 1978
Citation Context: …to the number of knobs, or, more formally, the number of parameters. These include the Akaike information criterion or AIC (Akaike, 1973) and the Bayesian information criterion or BIC (Akaike, 1977; Schwarz, 1978). Other methods penalized the “roughness” of a model, i.e., some measure of how much the prediction shifts with a small change in either the input or the parameters (van de Geer, 2000, ch. 10). A smo…
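The AIC and BIC penalties named in this excerpt are straightforward to compute from a model's maximized log-likelihood: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, with k parameters and n samples. A sketch (my own illustration, not code from the chapter; the two Gaussian models being compared are invented for the example):

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian (Schwarz) information criterion: k ln n - 2 ln L."""
    return k * np.log(n) - 2 * log_likelihood

# Invented example: data from N(1, 2^2), two candidate Gaussian models.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)
n = len(x)

# Model A (k = 1): unknown mean, variance fixed at 1.
mu = x.mean()
ll_a = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

# Model B (k = 2): unknown mean and unknown variance (MLE fit).
ll_b = -0.5 * n * np.log(2 * np.pi * x.var()) - 0.5 * n

# Model B should win despite its extra parameter, since the true variance is 4.
print(bic(ll_a, 1, n), bic(ll_b, 2, n))
```

Both criteria trade goodness of fit against the parameter count; BIC's ln n factor penalizes extra knobs more heavily as the sample grows.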

2275 |
Equations of state calculations by fast computing machines
- Metropolis, Rosenbluth, et al.
- 1953
Citation Context: …ints according to the actual probability distribution. This can sometimes be done directly, especially if p(x) is of a particularly nice form. A very general and clever indirect scheme is as follows (Metropolis et al., 1953). We want a whole sequence of points, x1, x2, …, xn. We pick the first one however we like, and after that we pick successive points according to some Markov chain: that is, the distribution of xi+…
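The indirect scheme this excerpt describes is the Metropolis algorithm. A minimal sketch (my own illustration, not code from the chapter; the symmetric Gaussian random-walk proposal and the standard normal target are both chosen just for the example):

```python
import numpy as np

def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Metropolis sampler: symmetric random-walk proposals, accepted
    with probability min(1, p(x')/p(x)); only ratios of p are needed."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + rng.normal(scale=step)
        # Work with log-densities for numerical stability.
        if np.log(rng.random()) < log_p(proposal) - log_p(x):
            x = proposal  # accept; otherwise keep the old point
        samples[i] = x
    return samples

# Target: standard normal, known only up to a constant (log p = -x^2/2 + const).
samples = metropolis(lambda x: -0.5 * x ** 2, x0=0.0, n_samples=50000)
print(samples.mean(), samples.std())  # should be close to 0 and 1
```

Note that the normalizing constant of p(x) never appears, which is exactly why the method is so useful for distributions known only up to proportionality.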

2159 |
A new approach to linear filtering and prediction problems
- Kalman
- 1960
Citation Context: …by two of the “grandfathers” of complex systems science, Norbert Wiener and A. N. Kolmogorov, during the Second World War (Kolmogorov, 1941; Wiener, 1949). In the 1960s, Kalman and Bucy (Bucy, 1994; Kalman, 1960; Kalman and Bucy, 1961) solved the problem of optimal recursive filtering, assuming linear dynamics, linear observations and additive noise. In the resulting Kalman filter, the new estimate of the st…

1887 |
Numerical recipes in C : the art of scientific computing
- Press, Teukolsky, et al.
- 1992
Citation Context: …noises (West and Deering, 1995, ch. 3). The easiest way to estimate the power spectrum is simply to take the Fourier transform of the time series, using, e.g., the fast Fourier transform algorithm (Press et al., 1992). Equivalently, one might calculate the autocovariance and Fourier transform that. Either way, one has an estimate of the spectrum which is called the periodogram. It is unbiased, in that the expecte…
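The periodogram estimate described above takes a few lines (my own illustration, not code from the chapter; NumPy's FFT stands in for the fast Fourier transform algorithm, and the test signal is invented):

```python
import numpy as np

def periodogram(x):
    """Periodogram: squared magnitude of the FFT of the (de-meaned)
    series, normalized by the series length."""
    n = len(x)
    power = np.abs(np.fft.rfft(x - np.mean(x))) ** 2 / n
    return np.fft.rfftfreq(n), power

# Invented test signal: sinusoid at 0.1 cycles/sample plus white noise.
rng = np.random.default_rng(1)
t = np.arange(1024)
x = np.sin(2 * np.pi * 0.1 * t) + rng.normal(size=t.size)
freqs, power = periodogram(x)
peak = freqs[np.argmax(power)]
print(peak)  # close to 0.1
```

As the excerpt goes on to note, the raw periodogram is unbiased but not consistent; in practice one smooths it or averages over windowed segments.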

1839 | The evolution of cooperation - Axelrod - 1984 |

1706 | A theory of the learnable
- Valiant
- 1984
Citation Context: …be able to find some function δ such that Pr(|L̂(θ) − E[L(θ)]| > ε) ≤ δ(N, ε, θ) (4), with lim_{N→∞} δ(N, ε, θ) = 0. Then, for any particular θ, we could give probably approximately correct (Valiant, 1984) guarantees, and say that, e.g., to have a 95% confidence that the true error is within 0.001 of the generalization error, requires at least 144,000 samples (or whatever the precise numbers may be).…
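A hedged illustration of how such (N, ε, δ) guarantees are computed in the simplest case: for a loss bounded in [0, 1], Hoeffding's inequality supplies δ(N, ε) = 2e^(−2Nε²), which inverts to a required sample size. (The chapter's 144,000-sample figure comes from its own setup, which this sketch does not attempt to reproduce.)

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Smallest N such that 2 exp(-2 N eps^2) <= delta, i.e. a probably
    approximately correct guarantee for a loss bounded in [0, 1]:
    N >= ln(2/delta) / (2 eps^2)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

# 95% confidence that the empirical loss is within 0.01 of its expectation:
print(hoeffding_sample_size(0.01, 0.05))  # 18445
```

The quadratic dependence on 1/ε is the expensive part: tightening ε by a factor of ten multiplies the required sample size by a hundred.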

1694 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitanyi
- 1993
Citation Context: …tributed (IID), then K(x_1^n)/|x| → 1 almost surely; IID sequences are incompressible. If x is a realization of a stationary (but not necessarily IID) random process X̄, then (Badii and Politi, 1997; Li and Vitanyi, 1993) lim_{n→∞} E[K(X_1^n)/n] = h(X̄) (53), the entropy rate (§VII) of X̄. Thus, random data has high complexity, and the complexity of a random process grows at a rate which just measures its unpred…

1566 |
An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods
- Cristianini, Shawe-Taylor
- 2000
Citation Context: …trick works because the VC dimension of linear methods is low, even in high-dimensional spaces. Kernel methods come in many flavors, of which the most popular, currently, are support vector machines (Cristianini and Shawe-Taylor, 2000). 1. Predictive versus Causal Models Predictive and descriptive models both are not necessarily causal. PAC-type results give us reliable prediction, assuming future data will come from the same dist…

1488 |
An Introduction to Genetic Algorithms
- Mitchell
- 1996
Citation Context: …networks (Barabasi, this volume), turbulence (Frisch, 1995), physio-chemical pattern formation and biological morphogenesis (Ball, 1999; Cross and Hohenberg, 1993), genetic algorithms (Holland, 1992; Mitchell, 1996), evolutionary dynamics (Gintis, 2000; Hofbauer and Sigmund, 1988), spin glasses (Fischer and Hertz, 1988; Stein, 2003), neuronal networks (see Part III, 4, this book), the immune system (see Part II…

1284 |
Spline Models for Observational Data
- Wahba
- 1990
Citation Context: …Ripley [1996] are good for this.) Instead, I will merely name a few. • Splines are piecewise polynomials, good for regression on bounded domains; there is a very elegant theory for their estimation (Wahba, 1990). • Neural networks or multilayer perceptrons have a devoted following, both for regression and classification (Ripley, 1996). The application of VC theory to them is quite well-advanced (Anthony and…

1256 |
Information theory as an extension of the maximum likelihood principle
- Akaike
- 1973
Citation Context: …There are thus many regularization methods which add a penalty proportional to the number of knobs, or, more formally, the number of parameters. These include the Akaike information criterion or AIC (Akaike, 1973) and the Bayesian information criterion or BIC (Akaike, 1977; Schwarz, 1978). Other methods penalized the “roughness” of a model, i.e., some measure of how much the prediction shifts with a small cha…

1174 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context: …o simple models, we have the desired kind of trade-off, where we can reduce the part of the data which looks like noise only by using a more elaborate model. The minimum description length principle (Rissanen, 1978, 1989) enjoins us to pick the model which minimizes the description length, and the stochastic complexity of the data is that minimized description-length: θ_MDL = argmin_θ C(x, θ, Θ) (55), C_SC(x, Θ) = …

1170 | Information Theory and Statistics
- Kullback
- 1968
Citation Context: …since D(P‖Q) ≥ 0, and D(P‖Q) = 0 implies the two distributions are equal almost everywhere. The divergence can be interpreted either in terms of codes (see below), or in terms of statistical tests (Kullback, 1968). Roughly speaking, given n samples drawn from the distribution P, the probability of our accepting the false hypothesis that the distribution is Q is proportional to e^(−nD(P‖Q)). The mutual informat…
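The divergence and its hypothesis-testing interpretation are easy to illustrate for discrete distributions (my own sketch, not code from the chapter; the two distributions are invented):

```python
import numpy as np

def kl_divergence(p, q):
    """D(P||Q) = sum_i p_i ln(p_i / q_i), in nats.
    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute nothing."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p, q = [0.5, 0.5], [0.9, 0.1]
d = kl_divergence(p, q)
# Probability of accepting the false hypothesis Q after n samples from P
# decays roughly like exp(-n D), per the excerpt above.
print(d, np.exp(-100 * d))
```

Note the asymmetry: D(P‖Q) and D(Q‖P) generally differ, which is why the divergence is not a metric.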

1132 |
Causality: Models, Reasoning, and Inference
- Pearl
- 2000
Citation Context: …se difficulties, the subject of causal inference from data is currently a very active area of research, and many methods have been proposed, generally under assumptions about the absence of feedback (Pearl, 2000; Shafer, 1996; Spirtes et al., 2001). When we have a causal or generative model, we can use very well-established techniques to infer the values of the hidden or latent variables in the model from th…

1129 | The visual display of quantitative information - Tufte |

1127 |
Pattern Recognition and Neural Networks
- Ripley
- 1996
Citation Context: …on on bounded domains; there is a very elegant theory for their estimation (Wahba, 1990). • Neural networks or multilayer perceptrons have a devoted following, both for regression and classification (Ripley, 1996). The application of VC theory to them is quite well-advanced (Anthony and Bartlett, 1999; Zapranis and Refenes, 1999), but there are many other approaches, including ones based on statistical mechan…

899 | Exploratory data analysis - Tukey - 1977 |

871 |
Introduction to the modern theory of dynamical systems
- Katok, Hasselblatt
- 1995
Citation Context: …ystems which are not designed to function as communications devices; the concepts involved require only well-defined probability distributions. For instance, in nonlinear dynamics (Billingsley, 1965; Katok and Hasselblatt, 1995) information-theoretic notions are very important in characterizing different kinds of dynamical system (see also §III.F). Even more closely tied to complex systems science is the literature on “phys…

732 |
The principles of quantum mechanics
- Dirac
- 1958
Citation Context: …more precisely Markovian: all the information needed to determine the future is contained in the present state xt, and earlier states are irrelevant. (This is basically how physicists define “state” (Dirac, 1935).) Indeed, it is often reasonable to assume that F is independent of time, so that the dynamics are autonomous (in the terminology of dynamics) or homogeneous (in that of statistics). If we could loo…

666 |
Statistics for Long-Memory Processes
- Beran
- 1994
Citation Context: …related over very long times. These can still be accommodated within the ARIMA framework, formally, by introducing the idea of fractional differencing, or, in continuous time, fractional derivatives (Beran, 1994; West and Deering, 1995). Often long-memory processes are self-similar, which can simplify their statistical estimation (Embrechts and Maejima, 2002). Volatility. All ARMA and even ARIMA models assum…

659 |
The Chemical basis of morphogenesis
- Turing
- 1952
Citation Context: …hrough hierarchically structured interactions” (Simon, 1962), “positive feedback leading to highly skewed outcomes” (Simon, 1955), “local inhibition and long-range activation create spatial patterns” (Turing, 1952), and so forth. At the bottom of the quadrangle is “foundations”, meaning attempts to build a basic, mathematical science concerned with such topics as the measurement of complexity (Badii and Politi…

638 |
Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence
- Holland
- 1992
Citation Context: …Here we find networks (Barabasi, this volume), turbulence (Frisch, 1995), physio-chemical pattern formation and biological morphogenesis (Ball, 1999; Cross and Hohenberg, 1993), genetic algorithms (Holland, 1992; Mitchell, 1996), evolutionary dynamics (Gintis, 2000; Hofbauer and Sigmund, 1988), spin glasses (Fischer and Hertz, 1988; Stein, 2003), neuronal networks (see Part III, 4, this book), the immune sys…

625 | Handbook of Stochastic Methods - Gardiner - 1985 |

596 | An Introduction to Computational Learning Theory - Kearns, Vazirani - 1994 |

563 |
Statistical Language Learning
- Charniak
- 1993
Citation Context: …ften easier to determine these properties from a system’s grammar than from direct examination of sequence statistics, especially since specialized techniques are available for grammatical inference (Charniak, 1993; Manning and Schütze, 1999). 1. Hidden Markov Models The most important special case of this general picture is that of regular languages. These, we said, are generated by machines with only a finite…

556 | An introduction to symbolic dynamics and coding - Lind, Marcus - 1995 |

447 | Nonlinear Time Series Analysis - Kantz, Schreiber - 2004 |

433 | Growing Artificial Societies: Social Science from the Bottom Up - Epstein, Axtell - 1996 |

418 |
Monte Carlo Methods
- Hammersley, Handscomb
- 1964
Citation Context: …ity (Vidyasagar, 1997), which tells us that, for all n, Pr(|x/n − p| ≥ ε) < 2e^(−2nε²). …picking values of x uniformly and averaging the resulting values of f(x) always has a smaller standard deviation (Hammersley and Handscomb, 1964, chapter 5). This example, while time-honored and visually clear, does not show Monte Carlo to its best advantage; there are few one-dimensional integrals which cannot be done better by ordinary, non…
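A minimal sketch of the plain Monte Carlo estimator this excerpt discusses (my own illustration, not code from the chapter; the integrand x² over [0, 1] is an invented example):

```python
import numpy as np

def mc_integrate(f, n, seed=0):
    """Plain Monte Carlo: the integral of f over [0, 1] estimated as the
    mean of f at n uniform random points; the standard error of the
    estimate shrinks like 1/sqrt(n)."""
    rng = np.random.default_rng(seed)
    return f(rng.random(n)).mean()

# Invented example: integral of x^2 over [0, 1], true value 1/3.
est = mc_integrate(lambda x: x ** 2, 100_000)
print(est)
```

As the excerpt says, the method only pays off relative to quadrature rules in many dimensions, where the 1/sqrt(n) error rate is dimension-independent.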

406 | A formal theory of inductive inference - Solomonoff - 1964 |

388 |
Extrapolation, interpolation, and smoothing of stationary time series
- Wiener
- 1949
Citation Context: …filters for stationary processes was solved independently by two of the “grandfathers” of complex systems science, Norbert Wiener and A. N. Kolmogorov, during the Second World War (Kolmogorov, 1941; Wiener, 1949). In the 1960s, Kalman and Bucy (Bucy, 1994; Kalman, 1960; Kalman and Bucy, 1961) solved the problem of optimal recursive filtering, assuming linear dynamics, linear observations and additive noise.…

349 |
Reasoning about Rational Agents
- Wooldridge
- 2000
Citation Context: …: see Lerman 2000). But a set of software agents running the Michigan power grid isn’t a model of anything, it’s doing something. Finally, multi-agent systems (Ossowski, 2000) and rational agents (Wooldridge, 2000) in artificial intelligence are not ABMs. The interest of this work is in understanding, and especially designing, systems capable of sophisticated, autonomous cognitive behavior; many people in this…

333 | Three models for the description of languages - Chomsky - 1956 |

321 |
Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues
- Brémaud
- 1999
Citation Context: …e is a fixed interaction graph, the agents form a Markov random field on that graph. There are now very powerful and computationally efficient methods for evaluating many properties of Markov chains (Brémaud, 1999; Honerkamp, 1994), Markov random fields (Beckerman, 1997), and (closely related) graphical models (Jordan, 1998) without simulation. The recent books of Peyton Young (1998) and Sutton (1998) provide…

315 |
Neural Network Learning: Theoretical Foundations
- Anthony, Bartlett
- 1999
Citation Context: …Wahba, 1990). • Neural networks or multilayer perceptrons have a devoted following, both for regression and classification (Ripley, 1996). The application of VC theory to them is quite well-advanced (Anthony and Bartlett, 1999; Zapranis and Refenes, 1999), but there are many other approaches, including ones based on statistical mechanics (Engel and Van den Broeck, 2001). It is notoriously hard to understand why they make t…

314 |
The Computational Brain
- Churchland, Sejnowski
- 1992
Citation Context: …al. (2002) is perhaps the most notorious; see Khemelev and Teahan (2002) and especially Goodman (2001). 22 It is certainly legitimate to regard any dynamical process as also a computational process (Churchland and Sejnowski, 1992; Giunti, 1997; Margolus, 1999; Shalizi and Crutchfield, 2001), so one could argue that the data is produced by some kind of program. But even so, this computational process generally doesn’t resemble…

278 | Individual Strategy and Social Structure: An Evolutionary Theory of Institutions - Young |

249 |
On a class of skew distribution functions
- SIMON
- 1955
Citation Context: …ations. There are many other such patterns in complex systems science: “stability through hierarchically structured interactions” (Simon, 1962), “positive feedback leading to highly skewed outcomes” (Simon, 1955), “local inhibition and long-range activation create spatial patterns” (Turing, 1952), and so forth. At the bottom of the quadrangle is “foundations”, meaning attempts to build a basic, mathematical s…

248 |
Ergodic Theory and Information
- Billingsley
- 1978
Citation Context: …rmation theory to systems which are not designed to function as communications devices; the concepts involved require only well-defined probability distributions. For instance, in nonlinear dynamics (Billingsley, 1965; Katok and Hasselblatt, 1995) information-theoretic notions are very important in characterizing different kinds of dynamical system (see also §III.F). Even more closely tied to complex systems scien…

238 |
Self-Organized Criticality: An Explanation of 1/f Noise
- Bak, Tang, et al.
- 1987
Citation Context: …ower laws alone thus says nothing about complexity (except in thermodynamic equilibrium!), and certainly is not a reliable sign of some specific favored mechanism, such as self-organized criticality (Bak et al., 1987; Jensen, 1998) or highly-optimized tolerance (Carlson and Doyle, 1999, 2000; Newman et al., 2002). E. Other Measures of Complexity Considerations of space preclude an adequate discussion of further c…

226 |
Independent coordinates for strange attractors from mutual information
- Fraser, Swinney
- 1986
Citation Context: …e lag to the autocorrelation time (see above), or the first minimum of the mutual information function (see §VII below), the notion being that this most nearly achieves a genuinely “new” measurement (Fraser and Swinney, 1986). There is some evidence that the mutual information method works better (Cellucci et al., 2003). Again, while in principle almost any smooth observation function will do, given enough data, in pract…
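The first-minimum-of-mutual-information heuristic for choosing an embedding lag can be sketched with a histogram estimator (my own illustration, not code from the paper; the bin count and the noisy sine test signal are arbitrary choices for the example):

```python
import numpy as np

def lagged_mutual_information(x, lag, bins=16):
    """Histogram (plug-in) estimate of I(x_t ; x_{t+lag}), in nats."""
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x_t
    py = joint.sum(axis=0, keepdims=True)  # marginal of x_{t+lag}
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Invented test signal: slow noisy sine (period 100 samples).
rng = np.random.default_rng(2)
t = np.arange(5000)
x = np.sin(2 * np.pi * t / 100) + rng.normal(scale=0.1, size=t.size)
mis = [lagged_mutual_information(x, lag) for lag in range(1, 40)]
# Heuristic from the excerpt: take the embedding delay at the first
# local minimum of this curve.
print(mis[0], mis[24])
```

The mutual information at lag 1 is high (successive samples are nearly identical) and drops as the lag grows, so successive coordinates of the delay vector carry genuinely new information.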

218 |
Turbulence: The Legacy of A. N. Kolmogorov
- Frisch
- 1995
Citation Context: …particular systems, natural, artificial and fictional, which complex systems science has traditionally and habitually sought to understand. Here we find networks (Barabasi, this volume), turbulence (Frisch, 1995), physio-chemical pattern formation and biological morphogenesis (Ball, 1999; Cross and Hohenberg, 1993), genetic algorithms (Holland, 1992; Mitchell, 1996), evolutionary dynamics (Gintis, 2000; Hofb…

202 |
Scientific Explanation and the Causal Structure of the World
- Salmon
- 1984
Citation Context: …ers and the central limit theorem, pass all statistical tests for randomness, etc. In fact, this possibility, of defining “random” as “incompressible”, is what originally motivated Kolmogorov’s work (Salmon, 1984, chapter 3). 19 The issue of what language to write the program in is secondary; writing a program to convert from one language to another just adds on a constant to the length of the over-all progra…

196 | Optimal Control and Estimation - Stengel - 1986 |

193 | Simple Heuristics That Make Us Smart - Gigerenzer, Todd, and the ABC Research Group - 1999 |

191 |
Turtles, termites, and traffic jams: Explorations in massively parallel microworlds
- Resnick
- 1994
Citation Context: …successor, NetLogo (ccl.sesp.northwestern.edu/netlogo), are extensions of the popular Logo language to handle multiple interacting “turtles”, i.e., agents. Like Logo, children can learn to use them (Resnick, 1994), but they are fairly easy for adults, too, and certainly give a feel for working with ABMs. B. Three Things Which Are Not Agent-Based Models Not everything which involves the word “agent” is connect…