Predictability, Complexity, and Learning
, 2001
Cited by 30 (2 self)
We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
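As a rough illustration of the "remains finite" case, the sketch below computes the mutual information between a block of T past symbols and a block of T future symbols for a stationary two-state Markov chain. The chain, its transition probabilities p01 and p10, the symmetric window lengths, and all function names are assumptions made for this example and are not taken from the paper. For a first-order Markov chain the result should not depend on T, since only the most recent symbol carries information about the future.

```python
import itertools
import math


def stationary_distribution(p01, p10):
    """Stationary distribution of a two-state Markov chain."""
    total = p01 + p10
    return [p10 / total, p01 / total]


def block_probability(seq, pi, P):
    """Probability of a symbol block under the stationary chain."""
    prob = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        prob *= P[a][b]
    return prob


def predictive_information(T, p01=0.1, p10=0.3):
    """Mutual information (bits) between T past and T future symbols.

    p01 and p10 are hypothetical transition probabilities chosen
    only for illustration.
    """
    P = [[1.0 - p01, p01], [p10, 1.0 - p10]]
    pi = stationary_distribution(p01, p10)
    joint, past_marginal, future_marginal = {}, {}, {}
    # Enumerate every block of length 2T and split it into past and future.
    for seq in itertools.product((0, 1), repeat=2 * T):
        p = block_probability(seq, pi, P)
        past, future = seq[:T], seq[T:]
        joint[(past, future)] = p
        past_marginal[past] = past_marginal.get(past, 0.0) + p
        future_marginal[future] = future_marginal.get(future, 0.0) + p
    return sum(
        p * math.log2(p / (past_marginal[a] * future_marginal[b]))
        for (a, b), p in joint.items()
        if p > 0.0
    )


if __name__ == "__main__":
    for T in (1, 2, 4, 6):
        print(T, predictive_information(T))
```

The enumeration is exponential in T, so this is only practical for short windows. The logarithmic and power-law regimes described in the abstract arise when a model must be learned from the data, which this fixed-parameter chain does not capture.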
Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences
 In Proceedings of the 20th conference on Uncertainty in artificial intelligence (UAI'04
, 2004
Cited by 26 (2 self)
We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its data requirements, and its performance in simulations. Finally, we compare our approach to existing methods using variable-length Markov models and cross-validated hidden Markov models, and show theoretically and experimentally that our method delivers results superior to the former and at least comparable to the latter.
An information-theoretic primer on complexity, self-organisation and emergence
 Advances in Complex Systems, in press. URL HTTP://WWW.WORLDSCINET.COM/ACS/EDITORIAL/PAPER/5183631.PDF
, 2007
Cited by 16 (2 self)
Complex Systems Science aims to understand concepts like complexity, self-organization, emergence and adaptation, among others. The inherent fuzziness in complex systems definitions is complicated by the unclear relation among these central processes: does self-organisation emerge or does it set the preconditions for emergence? Does complexity arise by adaptation or is complexity necessary for adaptation to arise? The inevitable consequence of the current impasse is miscommunication among scientists within and across disciplines. We propose a set of concepts, together with their information-theoretic interpretations, which can be used as a dictionary of Complex Systems Science discourse. Our hope is that the suggested information-theoretic baseline may facilitate consistent communications among practitioners, and provide new insights into the field.
Exponential family predictive representations of state
 In Neural Information Processing Systems (NIPS
Cited by 11 (2 self)
2008. To my wife, Martha. Acknowledgments: This work would not have been possible without generous help, both intellectually and financially. I am grateful to my advisor, Satinder Singh, for the long discussions we have had as he has patiently taught me to think clearly through my own ideas, sharpen my writing, and to raise my sights. A special thanks also to my lab mates, Matt Rudary, Britton Wolfe, Vishal Soni, Erik Talviti, Jonathan Sorg and Ishan Chaudhuri for always letting me bounce ideas around, for listening, and for patient tutoring. Thanks to Andrew Nuxoll for being a kindred spirit, to Nick Gorski for the occasional foosball game and to my collaborators at the University of Alberta. Finally, I would like to gratefully acknowledge the National Science Foundation for financially supporting me through most of my studies with a Graduate Research Fellowship. Finally, a special thank you to my wife Martha for her love, her constancy, her feistiness and for always keeping me on the straight and narrow. Thank you, Grace, Peterson and Andrew for reminding
Dynamics of Bayesian Updating with Dependent Data and Misspecified Models
, 2009
Cited by 10 (2 self)
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in nonparametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or “Shannon-McMillan-Breiman”) property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong.  An appendix sketches connections between the present results and the “replicator dynamics” of evolutionary theory.
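To make the idea of posterior convergence "when all of the hypotheses are wrong" concrete, here is a deliberately simplified sketch. It assumes i.i.d. coin flips and a finite set of three Bernoulli hypotheses, none of which matches the true bias; this is far narrower than the dependent-data, nonparametric setting the abstract describes, and every name and number in it (TRUE_P, THETAS, the sample size, the seed) is a hypothetical choice for illustration. In this toy case the posterior mass is expected to pile up on the hypothesis with the smallest Kullback-Leibler divergence from the truth.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL divergence D(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def log_posterior(data, thetas, log_prior):
    """Normalised log posterior over a finite set of Bernoulli hypotheses."""
    ones = sum(data)
    zeros = len(data) - ones
    scores = [
        lp + ones * math.log(t) + zeros * math.log(1.0 - t)
        for t, lp in zip(thetas, log_prior)
    ]
    # Log-sum-exp keeps the normalisation stable for long sequences.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


# Hypothetical setup: the true bias is absent from the hypothesis set.
TRUE_P = 0.7
THETAS = [0.2, 0.5, 0.9]
LOG_PRIOR = [math.log(1.0 / len(THETAS))] * len(THETAS)

rng = random.Random(0)
DATA = [1 if rng.random() < TRUE_P else 0 for _ in range(5000)]

POSTERIOR = [math.exp(v) for v in log_posterior(DATA, THETAS, LOG_PRIOR)]
CLOSEST = min(THETAS, key=lambda t: kl_bernoulli(TRUE_P, t))

if __name__ == "__main__":
    for theta, mass in zip(THETAS, POSTERIOR):
        print(f"theta={theta}: posterior={mass:.6f}")
    print("KL-closest hypothesis:", CLOSEST)
```

Working in log space is a practical choice here: the raw likelihood of thousands of observations underflows double precision, while the log scores stay well behaved.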
Complementarity in classical dynamical systems
 Foundations of Physics
, 2006
Cited by 10 (4 self)
symbolic dynamics; epistemic accessibility; partitions
External and internal complexity of complex adaptive systems
 Theory in Biosciences
, 2004
Cited by 8 (2 self)
Pattern discovery in time series, part I: Theory, algorithm, analysis, and convergence
, 2002
Cited by 6 (1 self)
We present a new algorithm for discovering patterns in time series and other sequential data. We exhibit a reliable procedure for building the minimal set of hidden, Markovian states that is statistically capable of producing the behavior exhibited in the data — the underlying process’s causal states. Unlike conventional methods for fitting hidden Markov models (HMMs) to data, our algorithm makes no assumptions about the process’s causal architecture (the number of hidden states and their transition structure), but rather infers it from the data. It starts with assumptions of minimal structure and introduces complexity only when the data demand it. Moreover, the causal states it infers have important predictive optimality properties that conventional HMM states lack. Here, in Part I, we introduce the algorithm, review the theory behind it, prove its asymptotic reliability, and use large deviation theory to estimate its rate of convergence. In the sequel, Part II, we outline the algorithm’s implementation, illustrate its ability to discover even “difficult” patterns, and compare it to various alternative schemes.
Discovering Functional Communities in Dynamical Networks
, 2006
Cited by 5 (0 self)
Many networks are important because they are substrates for dynamical systems, and their pattern of functional connectivity can itself be dynamic: they can functionally reorganize, even if their underlying anatomical structure remains fixed. However, the recent rapid progress in discovering the community structure of networks has overwhelmingly focused on that constant anatomical connectivity. In this paper, we lay out the problem of discovering functional communities, and describe an approach to doing so. This method combines recent work on measuring information sharing across stochastic networks with an existing and successful community-discovery algorithm for weighted networks. We illustrate it with an application to a large biophysical model of the transition from beta to gamma rhythms in the hippocampus.
Information theory and learning: a physical approach
, 2000
Cited by 5 (3 self)
We try to establish a unified information theoretic approach to learning and to explore some of its applications. First, we define predictive information as the mutual information between the past and the future of a time series, discuss its behavior as a function of the length of the series, and explain how other quantities of interest studied previously in learning theory (as well as in dynamical systems and statistical mechanics) emerge from this universally definable concept. We then prove that predictive information provides the unique measure for the complexity of dynamics underlying the time series and show that there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before. Further, we investigate numerically the learning of a nonparametric probability density, which is an example of a problem with power-law complexity, and show that the proper Bayesian formulation of this problem provides for the ‘Occam’ factors that punish overly complex models and thus allow one to learn not only a solution within a specific model class, but also the class itself using the data.