## Computational mechanics: Pattern and prediction, structure and simplicity (1999)

### Download Links

- [www.cscs.umich.edu]
- [bactra.org]
- [cscs.umich.edu]
- [vserver1.cscs.lsa.umich.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Statistical Physics

Citations: 46 (8 self)

### BibTeX

```bibtex
@article{Shalizi99computationalmechanics,
  author  = {Cosma Rohilla Shalizi and James P. Crutchfield},
  title   = {Computational mechanics: Pattern and prediction, structure and simplicity},
  journal = {Journal of Statistical Physics},
  year    = {1999},
  volume  = {104},
  pages   = {817--879}
}
```


### Abstract

Computational mechanics, an approach to structural complexity, defines a process’s causal states and gives a procedure for finding them. We show that the causal-state representation—an ε-machine—is the minimal one consistent with ...

### Citations

9138 |
Elements of information theory
- Cover, Thomas
- 1991
Citation Context: ...ipulate. (The special case in which H[S^→] is finite is dealt with in Appendix G.) Normally, we evade this by considering H[S^→_L], the uncertainty of the next L symbols, treated as a function of L. (57, 63) On occasion, we will refer to the entropy per symbol or entropy rate, h[S^→] ≡ lim_{L→∞} (1/L) H[S^→_L], (9) and the conditional entropy rate, h[S^→ | X] ≡ lim_{L→∞} (1/L) H[S^→_L | X], (10) where X is some ra...
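
The entropy-rate definitions quoted above (Eqs. (9)–(10)) are easy to approximate numerically at finite L by dividing the block entropy by the block length. A minimal sketch for a binary source; the function names are ours, not from the paper:

```python
from collections import Counter
from math import log2
import random

def block_entropy(seq, L):
    """Shannon entropy H[S_L] of length-L blocks, in bits."""
    blocks = [tuple(seq[i:i + L]) for i in range(len(seq) - L + 1)]
    counts = Counter(blocks)
    n = len(blocks)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def entropy_rate_estimate(seq, L):
    """Finite-L approximation h ~ H[S_L] / L of Eq. (9)."""
    return block_entropy(seq, L) / L

# A fair-coin source has entropy rate 1 bit/symbol; a constant source has 0.
random.seed(0)
coin = [random.randint(0, 1) for _ in range(100_000)]
print(entropy_rate_estimate(coin, 8))        # close to 1.0
print(entropy_rate_estimate([0] * 1000, 8))  # zero entropy rate
```

For stationary sources the estimate converges from above as L grows, though the sample size needed grows exponentially in L.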

6983 |
The Mathematical Theory of Communication
- SHANNON, WEAVER
- 1949
Citation Context: ...me foundational issues, randomness is actually rather well understood and well handled by classical tools introduced by Boltzmann; (55) Fisher, Neyman, and Pearson; (56) Kolmogorov; (37) and Shannon, (57) among others. One tradition in the study of complexity in fact identifies complexity with randomness and, as we have just seen, this is useful for some purposes. As these purposes are not those of an...

4099 |
Introduction to Automata Theory, Languages and Computation, 2nd ed
- Hopcroft, Motwani, et al.
- 2000
Citation Context: ...ain as promised, it follows that every entry in the transition matrix T^(s)_ij = 0 when 𝒮_i = ε(s^←). Thus, the labeled transition probabilities have the promised form. QED. Remark 1. In automata theory, (67, 68) a set of states and transitions is said to be deterministic if the current state and the next input—here, the next symbol from the original stochastic process—together fix the next state. This use of...
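
Remark 1's determinism condition (current state plus next symbol fixes the next state) can be checked mechanically over any finite set of labeled transitions. A small sketch; the triple representation and function name are our own conventions:

```python
def is_deterministic(transitions):
    """Determinism in the automata-theory sense of Remark 1: no
    (state, symbol) pair may lead to two different successor states."""
    seen = {}
    for state, symbol, nxt in transitions:
        # setdefault records the first successor seen for this pair;
        # any later, different successor witnesses nondeterminism.
        if seen.setdefault((state, symbol), nxt) != nxt:
            return False
    return True

# Deterministic: each (state, symbol) pair is unambiguous.
det = [("A", 0, "A"), ("A", 1, "B"), ("B", 0, "A")]
# Nondeterministic: ("A", 1) can go to either "B" or "A".
nondet = det + [("A", 1, "A")]
print(is_deterministic(det))     # True
print(is_deterministic(nondet))  # False
```

Note this is determinism of transitions, not absence of randomness: which symbol occurs next may still be stochastic.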

3440 | Convergence of Probability Measures - Billingsley - 1968

1777 | An introduction to Kolmogorov complexity and its applications
- Li, Vitányi
- 1997
Citation Context: ...esentation—the UTM. There are many well-known difficulties with applying Kolmogorov complexity to natural processes. First, as a quantity, it is uncomputable in general, owing to the halting problem. (40) Second, it is maximal for random sequences; this can be construed either as desirable, as just noted, or as a failure to capture structure, depending on one’s aims. Third, it only applies to a single ...

1745 | A theory of the learnable
- Valiant
- 1984
Citation Context: ...ept’’. The latter is typically defined to be a binary dichotomy of a certain feature or input space. Particular attention is paid to results about ‘‘probably approximately correct’’ (PAC) procedures: (118) those having a high probability of finding members of a fixed ‘‘representation class’’ (e.g., neural nets, Boolean functions in disjunctive normal form, or deterministic finite automata). The key wor...

1286 | Information Theory and Statistics
- Kullback
- 1968
Citation Context: ...real distribution of observables as n→∞. The discrepancy between prediction and reality is, moreover, defined information-theoretically, in terms of the relative entropy or Kullback–Leibler distance. (63, 73) (We have not used this quantity.) The approach implements Weiss’s discovery that for finite-state sources there is a structural distinction between block-Markovian sources (subshifts of finite type) ...
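
The relative entropy (Kullback–Leibler distance) invoked here is straightforward to compute for finite distributions. A minimal sketch in bits; the function name is ours:

```python
from math import log2

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2(p(x) / q(x)), in bits.
    Requires q(x) > 0 wherever p(x) > 0 (absolute continuity)."""
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Zero iff the distributions agree; asymmetric in general.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q))  # ~0.737
print(kl_divergence(q, p))  # ~0.531
```

The asymmetry (D(p‖q) ≠ D(q‖p)) is why it is a "distance" only loosely: it is not a metric.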

1247 |
Causality: models, reasoning, and inference
- Pearl
Citation Context: ...position reminds one of the important role that conditional independence plays in contemporary methods for artificial intelligence, both for developing systems that reason in fluctuating environments (121) and the more recently developed algorithmic methods of graphical models. (122, 123) 6. Description-Length Principles and Universal Coding Theory Rissanen’s minimum description length (MDL) principle,...

1210 | A universal algorithm for sequential data compression
- Ziv, Lempel
- 1977
Citation Context: ...es is somewhat similar to the states estimated in Rissanen’s context algorithm (48, 125, 126) (and to the ‘‘vocabularies’’ built by universal coding schemes, such as the popular Lempel–Ziv algorithm (127, 128)). Despite the similarities, there are significant differences. For a random source—for which there is a single causal state—the context algorithm estimates a number of states that diverges (at least...

615 |
An Introduction to Computational Learning Theory
- Kearns, Vazirani
- 1994
Citation Context: ...achines should form a useful area of future research, a point to which we alluded in the concluding remarks. 5. Computational and Statistical Learning Theory The goal of computational learning theory (116, 117) is to identify algorithms that quickly, reliably, and simply lead to good representations of a target ‘‘concept’’. The latter is typically defined to be a binary dichotomy of a certain feature or inp...

557 |
Three approaches to the quantitative definition of information
- KOLMOGOROV
- 1965
Citation Context: ...words. Ignoring some foundational issues, randomness is actually rather well understood and well handled by classical tools introduced by Boltzmann; (55) Fisher, Neyman, and Pearson; (56) Kolmogorov; (37) and Shannon, (57) among others. One tradition in the study of complexity in fact identifies complexity with randomness and, as we have just seen, this is useful for some purposes. As these purposes a...

533 |
Stochastic Complexity
- Rissanen
- 1989
Citation Context: ...R̂, E ≤ H[R̂]. (47) Proof. This follows directly from Theorem 29, since H[R̂] ≥ C_μ. QED. Lemma 8 (Conditioning Does Not Affect Entropy Rate). For all prescient rivals R̂, h[S^→] = h[S^→ | R̂], (48) where the entropy rate h[S^→] and the conditional entropy rate h[S^→ | R̂] were defined in Eq. (9) and Eq. (10), respectively. Proof. From Theorem 5 (Eq. (44)) and its Corollary 3 (Eq. (47)), we have ...

424 |
A formal theory of inductive inference
- Solomonoff
- 1964
Citation Context: ...searchers from attempting to use Kolmogorov–Chaitin complexity for practical tasks—such as measuring the complexity of natural objects (e.g., ref. 44), as a basis for theories of inductive inference, (45, 46) and generally as a means of capturing patterns. (47) As Rissanen (ref. 48, p. 49) says, this is akin to ‘‘learn[ing] the properties [of a data set] by writing programs in ...

390 | On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines - Blum, Shub, et al. - 1989

367 |
Three models for the description of language
- Chomsky
- 1956
Citation Context: ...correspond to such automata—generative in the simple case or classificatory, if we add a reject state and move to it when none of the allowed symbols are encountered. Since Chomsky, (112, 113) it has been known that formal languages can be classified into a hierarchy, the higher levels of which have strictly greater expressive power. The hierarchy is defined by restricting the form of the ...

342 |
The definition of random sequences
- Martin-Löf
- 1966
Citation Context: ...Therefore, Eq. (A14) implies H[S^→_1 | 𝒮′, 𝒮] ≥ H[S^→_1 | R̂′, R̂]. (41) And so, we have that H[S^→_1 | 𝒮′, 𝒮] − H[S^→_1 | R̂′, R̂] ≥ 0, hence H[R̂′ | R̂] − H[𝒮′ | 𝒮] ≥ 0, and thus H[R̂′ | R̂] ≥ H[𝒮′ | 𝒮]. (42) QED. Remark. What this theorem says is that there is no more uncertainty in transitions between causal states, than there is in the transitions between any other kind of prescient effective states. I...

241 | On the length of programs for computing finite binary sequences: statistical considerations
- Chaitin
- 1969
Citation Context: ...H[S^→_1 | R̂] = H[S^→_1 | 𝒮]. Now we apply the chain rule again: H[R̂′, S^→_1 | R̂] = H[S^→_1 | R̂] + H[R̂′ | S^→_1, R̂] (34) ≥ H[S^→_1 | R̂] (35) = H[S^→_1 | 𝒮] (36) = H[𝒮′, S^→_1 | 𝒮] (37) = H[𝒮′ | 𝒮] + H[S^→_1 | 𝒮′, 𝒮]. (38) In going from Eq. (36) to Eq. (37) we have used Eq. (33), and in the last step we have used the chain rule once more. Using the chain rule one last time, we have H[R̂′, ...

240 |
On the Complexity of Finite Sequences
- Lempel, Ziv
- 1976
Citation Context: ...es is somewhat similar to the states estimated in Rissanen’s context algorithm (48, 125, 126) (and to the ‘‘vocabularies’’ built by universal coding schemes, such as the popular Lempel–Ziv algorithm (127, 128)). Despite the similarities, there are significant differences. For a random source—for which there is a single causal state—the context algorithm estimates a number of states that diverges (at least...
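
The Lempel–Ziv notion of sequence complexity can be illustrated with an incremental (LZ78-style) parse, a close relative of the 1976 complexity measure cited here: regular sequences parse into a few long phrases, irregular ones into many short phrases. A sketch; the function and variable names are ours:

```python
import random

def lz78_phrase_count(s):
    """Number of phrases in an LZ78 incremental parse."""
    phrases = {""}          # phrases seen so far
    count, cur = 0, ""
    for ch in s:
        cur += ch
        if cur not in phrases:   # cur = longest known phrase + one new symbol
            phrases.add(cur)
            count += 1
            cur = ""
    if cur:                      # leftover partial phrase at the end
        count += 1
    return count

random.seed(1)
regular = "0" * 256
noisy = "".join(random.choice("01") for _ in range(256))
print(lz78_phrase_count(regular))  # 23
print(lz78_phrase_count(noisy))    # substantially more phrases
```

This illustrates the contrast drawn in the excerpt: for a purely random source the number of parsed phrases (and hence estimated "vocabulary") keeps growing, whereas the causal-state description stays at a single state.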

235 |
The Volterra and Wiener Theories of Nonlinear Systems
- Schetzen
- 1980
Citation Context: ...er, the Wiener–Kolmogorov framework forces us to sharply separate the linear and nonlinear aspects of prediction and filtering, because it has a great deal of trouble calculating nonlinear operators. (109, 110) Computational mechanics is completely indifferent to this issue, since it packs all of the process’s structure into the ε-machine, which is equally calculable in linear or strongly nonlinear situatio...

229 |
Scientific Explanation and the Causal Structure of the World
- Salmon
- 1984
Citation Context: ...By stationarity, H[S^←_1] = H[S^→_1] = H[S]. Shifting the entropy rate h[S^→] to the LHS of Eq. (58) and appealing to time-translation once again, we have H[S] − h[S^→] = lim_{L→∞} I[S^←_{L−1}; S^→_1] (59) = I[S^←; S^→_1] (60) = H[S^→_1] − H[S^→_1 | S^←] (61) = H[S^→_1] − H[S^→_1 | 𝒮] (62) = I[S^→_1; 𝒮] (63) ≤ H[𝒮] = C_μ, (64) where the last inequality comes from Eq. (A9). QED. Remark 1. The Control Theorem is inspired by, and is a versi...

213 | Ergodic and Information Theory
- Gray, Davisson, et al.
- 1977
Citation Context: ...h[S^→] to the LHS of Eq. (58) and appealing to time-translation once again, we have H[S] − h[S^→] = lim_{L→∞} I[S^←_{L−1}; S^→_1] (59) = I[S^←; S^→_1] (60) = H[S^→_1] − H[S^→_1 | S^←] (61) = H[S^→_1] − H[S^→_1 | 𝒮] (62) = I[S^→_1; 𝒮] (63) ≤ H[𝒮] = C_μ, (64) where the last inequality comes from Eq. (A9). QED. Remark 1. The Control Theorem is inspired by, and is a version of, Ashby’s law of requisite variety (ref. 83, ch. ...

170 |
A universal data compression system
- RISSANEN
- 1983
Citation Context: ...ion as a Bayesian prior or regard description length as a measure of evidential support.) The construction of causal states is somewhat similar to the states estimated in Rissanen’s context algorithm (48, 125, 126) (and to the ‘‘vocabularies’’ built by universal coding schemes, such as the popular Lempel–Ziv algorithm (127, 128)). Despite the similarities, there are significant differences. For a random sourc...

164 |
Visual pattern analyzers
- GRAHAM
- 1989
Citation Context: ...though we suspect it will ultimately be useful in that domain. Nor is it concerned with pattern recognition as a practical matter as found in, say, neuropsychology, (20) psychophysics and perception, (21) cognitive ethology, (22) computer programming, (23) or signal and image processing. (24, 25) Instead, it is concerned with the questions of what patterns are and how patterns should be represented. O...

159 | Mind and Nature: A Necessary Unity - Bateson - 1979

139 |
Hands: A Pattern Theoretic Study of Biological Shapes
- Grenander, Chow, et al.
- 1991
Citation Context: ...ies can be attached to these bonds, leading in a natural way to a (Gibbsian) probability distribution over entire configurations. Grenander and his colleagues have used these methods to characterize, (35, 36) inter alia, several biological phenomena. B. Turing Mechanics: Patterns and Effective Procedures The other path to patterns follows the traditional exploration of the logical foundations of mathemati...

139 |
Linear Stochastic Systems
- Caines
- 1988
Citation Context: ...sses in an intimate and inextricable way. Probabilists have, of course, long been interested in using information-theoretic tools to analyze stochastic processes, particularly their ergodic behavior. (61, 62, 102, 103) There has also been considerable work in the hidden-Markov-model and optimal-prediction literatures on inferring models of processes from data or from given distributions. (10, 34, 104–106) To the be...

138 |
Theory of Games and Statistical Decisions
- Blackwell, Girshick
- 1954
Citation Context: ...into the future, then because the causal states are minimal sufficient statistics for the distribution of futures (Theorem 2, Eq. (29), Remark 4), the optimal rule of behavior will use ε. (100) 3. Stochastic Processes Clearly, the computational mechanics approach to patterns and pattern discovery involves stochastic processes in an intimate and inextricable way. Probabilists have, of course...

119 |
Dissipative Structures and Weak Turbulence
- Manneville
- 1990
Citation Context: ...al mechanics has good measures of disorder in thermodynamic entropy and in related quantities, such as the free energies. When augmented with theories of critical phenomena (1) and pattern formation, (2) it also has an extremely successful approach to analyzing patterns formed through symmetry breaking, both in equilibrium (3) and, more recently, outside it. (4) Unfortunately, these successes involve...

118 |
Two-dimensional signal and image processing
- Lim
- 1990
Citation Context: ...attern recognition as a practical matter as found in, say, neuropsychology, (20) psychophysics and perception, (21) cognitive ethology, (22) computer programming, (23) or signal and image processing. (24, 25) Instead, it is concerned with the questions of what patterns are and how patterns should be represented. One way to highlight the difference is to call this pattern discovery, rather than pattern rec...

116 |
Inferring statistical complexity
- Crutchfield, Young
- 1989
Citation Context: ...nd numerical calculations, is what makes computational mechanics ‘‘computational’’, in the sense of ‘‘computation theoretic’’. The basic ideas of computational mechanics were introduced a decade ago. (6) Since then they have been used to analyze dynamical systems, cellular automata, (9) hidden Markov models, (10) evolved spatial computation, (11) stochastic resonance, (12) globally coupled maps, (13)...

107 |
The Creative Mind
- Boden
- 1990
Citation Context: ...st one representation which exactly captures the target concept. Although this is in line with implicit assumptions in most of mathematical statistics, it seems dubious when analyzing learning in the real world. (5, 119, 120) In any case, the preceding development made no such assumption. One of the goals of computational mechanics is, exactly, discovering the best representation. T...

101 | The evolution of emergent computation
- Crutchfield, Mitchell
- 1995
Citation Context: ...f computational mechanics were introduced a decade ago. (6) Since then they have been used to analyze dynamical systems, cellular automata, (9) hidden Markov models, (10) evolved spatial computation, (11) stochastic resonance, (12) globally coupled maps, (13) the dripping faucet experiment, (14) and atmospheric turbulence. (15) Despite this record of successful application, there has been some uncerta...

101 |
Real patterns
- Dennett
- 1991
Citation Context: ...ons. Some interesting philosophical work on patterns-with-error has been done by Dennett, with reference not just to questions about the nature of patterns and their emergence but also to psychology. (52) The intuition is that truly random processes can be modeled very simply—‘‘to model coin-tossing, toss a coin.’’ Any prediction scheme that is more accurate than assuming complete independence ipso fac...

100 |
Laws of information conservation (nongrowth) and aspects of the foundation of probability theory
- Levin
- 1974
Citation Context: ...h are perhaps more familiar. Definition 13 (Excess Entropy). The excess entropy E of a process is the mutual information between its semi-infinite past and its semi-infinite future: E ≡ I[S^→; S^←]. (43) The excess entropy is a frequently used measure of the complexity of stochastic processes and appears under a variety of names; e.g., ‘‘predictive information’’, ‘‘stored i...
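
Definition 13 can be approximated empirically by the mutual information between adjacent length-L blocks, a finite-L proxy for the semi-infinite past and future. A rough sketch under that assumption; all names are ours:

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def excess_entropy_estimate(seq, L):
    """Finite-L proxy for E = I[past; future]:
    I = H[past_L] + H[future_L] - H[past_L, future_L]."""
    pasts, futures, joints = Counter(), Counter(), Counter()
    for i in range(L, len(seq) - L + 1):
        past = tuple(seq[i - L:i])
        future = tuple(seq[i:i + L])
        pasts[past] += 1
        futures[future] += 1
        joints[(past, future)] += 1
    return entropy(pasts) + entropy(futures) - entropy(joints)

# A period-2 sequence stores exactly one bit of phase information: E = 1 bit.
print(excess_entropy_estimate([0, 1] * 500, 4))  # ~1.0
```

For genuinely long-memory processes this finite-L estimate only lower-bounds E, and plug-in mutual-information estimates are biased upward on small samples, so it is a sketch of the idea rather than a reliable estimator.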

100 |
The Nature of Statistical Learning Theory, 2nd Ed
- Vapnik
- 2000
Citation Context: ...achines should form a useful area of future research, a point to which we alluded in the concluding remarks. 5. Computational and Statistical Learning Theory The goal of computational learning theory (116, 117) is to identify algorithms that quickly, reliably, and simply lead to good representations of a target ‘‘concept’’. The latter is typically defined to be a binary dichotomy of a certain feature or inp...

95 |
Unpredictability and Undecidability in Dynamical Systems
- Moore
- 1990
Citation Context: ...and others exploring the physical basis of computation. (131–134) These proposals have ranged from highly abstract ideas about how to embed Turing machines in discrete-time nonlinear continuous maps (7, 135) to, more recently, schemes for specialized numerical computation that could in principle be implemented in current hardware. (136) All of them, however, have been synthetic, in the sense that they co...

94 | Variable Length Markov chains
- Bühlmann, Wyner
- 1999
Citation Context: ...ion as a Bayesian prior or regard description length as a measure of evidential support.) The construction of causal states is somewhat similar to the states estimated in Rissanen’s context algorithm (48, 125, 126) (and to the ‘‘vocabularies’’ built by universal coding schemes, such as the popular Lempel–Ziv algorithm (127, 128)). Despite the similarities, there are significant differences. For a random sourc...

93 | The Mathematical Theory of Communication (University of Illinois Press) - Shannon, Weaver - 1949

88 | Computation at the Onset of Chaos
- Crutchfield, Young
- 1990
Citation Context: ...ble and their possible representations. Using ideas from information theory, we state a quantitative version of Occam’s Razor for such representations. At that point we define causal states, (6-8) equivalence classes of behaviors, and the structure of transitions between causal states—the ε-machine. We then show that the causal states are ideal from the point of view of ...

84 | Toward a quantitative theory of self-generated complexity - Grassberger - 1986

83 |
Cognition, evolution and behavior
- SHETTLEWORTH
- 1998
Citation Context: ...ultimately be useful in that domain. Nor is it concerned with pattern recognition as a practical matter as found in, say, neuropsychology, (20) psychophysics and perception, (21) cognitive ethology, (22) computer programming, (23) or signal and image processing. (24, 25) Instead, it is concerned with the questions of what patterns are and how patterns should be represented. One way to highlight the d...

82 |
Statistical Mechanics of Phase Transitions
- Yeomans
- 1992
Citation Context: ...nature exhibits. Statistical mechanics has good measures of disorder in thermodynamic entropy and in related quantities, such as the free energies. When augmented with theories of critical phenomena (1) and pattern formation, (2) it also has an extremely successful approach to analyzing patterns formed through symmetry breaking, both in equilibrium (3) and, more recently, outside it. (4) Unfortunate...

77 |
Lectures on Gas Theory
- Boltzmann
- 1995
Citation Context: ...omplexity, and structure, at least as we use those words. Ignoring some foundational issues, randomness is actually rather well understood and well handled by classical tools introduced by Boltzmann; (55) Fisher, Neyman, and Pearson; (56) Kolmogorov; (37) and Shannon, (57) among others. One tradition in the study of complexity in fact identifies complexity with randomness and, as we have just seen, th...

77 | Recursion theory on the reals and continuous-time computation - Moore - 1996

68 | Observable operator models for discrete stochastic time series
- Jaeger
Citation Context: ...for all L, H[S^→_L | R̂] = H[S^→_L | 𝒮] by the definition, Definition 11, of prescient rivals; in particular, H[S^→_1 | R̂] = H[S^→_1 | 𝒮]. Now we apply the chain rule again: H[R̂′, S^→_1 | R̂] = H[S^→_1 | R̂] + H[R̂′ | S^→_1, R̂] (34) ≥ H[S^→_1 | R̂] (35) = H[S^→_1 | 𝒮] (36) = H[𝒮′, S^→_1 | 𝒮] (37) = H[𝒮′ | 𝒮] + H[S^→_1 | 𝒮′, 𝒮]. (38) In going from Eq. (36) to Eq. (37) we have used Eq. (33), and in the last step we have used the chain rule ...

68 | Universal schemes for prediction, gambling and portfolio selection - Algoet - 1992

65 | Detecting strange attractors in turbulence - Takens

59 |
The Working Brain: An Introduction to Neuropsychology
- Luria
- 1973
Citation Context: ...with pattern formation per se; (4) though we suspect it will ultimately be useful in that domain. Nor is it concerned with pattern recognition as a practical matter as found in, say, neuropsychology, (20) psychophysics and perception, (21) cognitive ethology, (22) computer programming, (23) or signal and image processing. (24, 25) Instead, it is concerned with the questions of what patterns are and ho...

58 | Dynamical recognizers: real-time language recognition by analog computers - Moore - 1998

57 |
The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems, and Adaptation
- Flake
- 1998
Citation Context: ...exity for practical tasks—such as measuring the complexity of natural objects (e.g., ref. 44), as a basis for theories of inductive inference, (45, 46) and generally as a means of capturing patterns. (47) As Rissanen (ref. 48, p. 49) says, this is akin to ‘‘learn[ing] the properties [of a data set] by writing programs in the hope of finding short ones!’’ Various of the diff...