Results 1 – 6 of 6
Causal inference using the algorithmic Markov condition, 2008
Abstract

Cited by 11 (11 self)
Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.
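The closing idea — replacing Kolmogorov complexity with a decidable criterion — can be illustrated with a standard compression-based proxy (a common sketch, not the paper's own construction): approximate C(x) by the zlib-compressed length, and estimate algorithmic mutual information as I(x : y) ≈ C(x) + C(y) − C(xy).

```python
import random
import zlib

def c(s: bytes) -> int:
    """Decidable stand-in for Kolmogorov complexity: compressed length."""
    return len(zlib.compress(s, 9))

def algorithmic_mi(x: bytes, y: bytes) -> int:
    """Compression-based estimate of algorithmic mutual information:
    I(x : y) ~ C(x) + C(y) - C(xy); shared structure lowers the joint term."""
    return c(x) + c(y) - c(x + y)

rng = random.Random(0)
patterned = b"0123456789" * 50
copy_of_patterned = b"0123456789" * 50
unrelated = bytes(rng.randrange(256) for _ in range(500))

# A string shares far more algorithmic information with a copy of itself
# than with incompressible noise of the same length.
print(algorithmic_mi(patterned, copy_of_patterned) >
      algorithmic_mi(patterned, unrelated))
```

The same substitution underlies practical compression-based similarity measures; the fixed per-call zlib overhead cancels out of the comparison.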
Learning Nonlinear Dynamic Models from Non-sequenced Data
Abstract

Cited by 3 (3 self)
Virtually all methods of learning dynamic systems from data start from the same basic assumption: the learning algorithm will be given a sequence of data generated from the dynamic system. We consider the case where the training data come from the system’s operation but with no temporal ordering. The data are simply drawn as individual disconnected points. While making this assumption may seem absurd at first glance, many scientific modeling tasks have exactly this property. Previous work proposed methods for learning linear, discrete-time models under these assumptions by optimizing approximate likelihood functions. We extend those methods to nonlinear models using kernel methods. We go on to propose a new approach that focuses on achieving temporal smoothness in the learned dynamics. The result is a convex criterion that can be easily optimized and often outperforms the earlier methods. We test these methods on several synthetic data sets, including one generated from the Lorenz attractor.
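The data regime the abstract describes can be made concrete with a small sketch (an illustration of the setting, not the authors' learning algorithm): simulate the Lorenz attractor, then discard the temporal ordering, so a learner sees only disconnected state samples and never consecutive pairs.

```python
import numpy as np

def lorenz_step(p, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One explicit-Euler step of the Lorenz system."""
    x, y, z = p
    return p + dt * np.array([sigma * (y - x),
                              x * (rho - z) - y,
                              x * y - beta * z])

# Simulate a trajectory, then throw away the ordering: the training set
# is a bag of states with no temporal information attached.
traj = np.empty((2000, 3))
traj[0] = [1.0, 1.0, 1.0]
for t in range(1, 2000):
    traj[t] = lorenz_step(traj[t - 1])

rng = np.random.default_rng(0)
nonsequenced = rng.permutation(traj)   # rows shuffled: sequence lost
```

Any method for this setting must recover the dynamics from `nonsequenced` alone, which is what makes the identifiability questions below nontrivial.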
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence: Gaussianity Measures for Detecting the Direction of Causal Time Series
Abstract
We conjecture that the distribution of the time-reversed residuals of a causal linear process is closer to a Gaussian than the distribution of the noise used to generate the process in the forward direction. This property is demonstrated for causal AR(1) processes, assuming that all the cumulants of the distribution of the noise are defined. Based on this observation, it is possible to design a decision rule for detecting the direction of time series that can be described as linear processes: the correct direction (forward in time) is the one in which the residuals from a linear fit to the time series are less Gaussian. A series of experiments with simulated and real-world data illustrates the superior results of the proposed rule when compared with other state-of-the-art methods based on independence tests.
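The decision rule can be sketched in a few lines (a minimal illustration under simplifying assumptions, not the authors' full procedure): fit an AR(1) model by least squares in both time directions and compare a crude Gaussianity score, here absolute excess kurtosis, of the two residual series.

```python
import numpy as np

def ar1_residuals(x):
    """Least-squares AR(1) fit x[t] = a*x[t-1] + e[t]; returns the residuals."""
    a = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    return x[1:] - a * x[:-1]

def abs_excess_kurtosis(e):
    """|excess kurtosis|: 0 for a Gaussian, larger for less Gaussian samples."""
    e = e - e.mean()
    return abs(np.mean(e**4) / np.mean(e**2) ** 2 - 3.0)

rng = np.random.default_rng(0)
n = 20000
noise = rng.uniform(-1.0, 1.0, n)        # non-Gaussian innovations
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + noise[t]

# Decision rule: the direction whose residuals are LESS Gaussian is forward.
forward = abs_excess_kurtosis(ar1_residuals(x))
backward = abs_excess_kurtosis(ar1_residuals(x[::-1]))
print("forward detected:", forward > backward)
```

The time-reversed residuals are moving averages of many independent innovations, which pushes them toward Gaussianity and shrinks the kurtosis score, exactly the asymmetry the conjecture exploits.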
Exploiting Non-sequence Data in Dynamic Model Learning
Abstract
Virtually all methods of learning dynamic models from data start from the same basic assumption: that the learning algorithm will be provided with a single or multiple sequences of data generated from the dynamic model. However, in quite a few modern time series modeling tasks, the collection of reliable time series turns out to be a major challenge, either due to the slow progression of the dynamic process of interest, or the inaccessibility of repetitive measurements of the same dynamic process over time. In those situations, we observe that it is often easier to collect a large amount of non-sequence samples, or snapshots of the dynamic process of interest. This thesis aims to exploit such non-sequence samples in dynamic model learning. We first consider the case where the only data available are snapshots taken from multiple instantiations of a dynamic process at unknown times, and the dynamic process falls in the class of fully observable, discrete-time, first-order linear or nonlinear dynamic models. We point out several issues in model identifiability when learning from non-sequence data, and develop several learning algorithms based on optimizing approximate posterior
Causal Inference on Discrete Data using Additive Noise Models
Abstract
Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. The case of two random variables is particularly challenging since no (conditional) independences can be exploited. Recent methods that are based on additive noise models suggest the following principle: whenever the joint distribution P(X,Y) admits such a model in one direction, e.g. Y = f(X)+N, N ⊥ X, but does not admit the reversed model X = g(Y)+ Ñ, Ñ ⊥ Y, one infers the former direction to be causal (i.e. X → Y). Up to now these approaches only deal with continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work we extend the notion of additive noise models to these cases. We prove that it almost never occurs that additive noise models can be fit in both directions. We further propose an efficient algorithm that is able to perform this way of causal inference on finite samples of discrete variables. We show that the algorithm works both on synthetic and real data sets.
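A toy version of the principle (with hypothetical distributions chosen purely for illustration; the paper's algorithm and independence test are more refined): regress by the conditional mode in each direction and compare how dependent the residuals remain on the regressor, using a plug-in estimate of mutual information.

```python
import math
from collections import Counter
import numpy as np

def mode_regression(a, b):
    """f_hat(a) = most frequent value of b observed for each value of a."""
    f = {u: Counter(b[a == u]).most_common(1)[0][0] for u in np.unique(a)}
    return np.array([f[u] for u in a])

def plug_in_mi(a, b):
    """Plug-in mutual information (nats) between two discrete samples."""
    n = len(a)
    pj, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pj.items())

rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 4, n)                     # cause: uniform on {0,1,2,3}
noise = (rng.random(n) < 0.3).astype(int)     # additive noise in {0,1}, P(1)=0.3
y = x + noise                                 # Y = f(X) + N with N independent of X

# Residuals are (nearly) independent of the regressor only in the true
# causal direction; no additive noise model fits the reverse direction.
mi_xy = plug_in_mi(x, y - mode_regression(x, y))
mi_yx = plug_in_mi(y, x - mode_regression(y, x))
print("inferred X -> Y:", mi_xy < mi_yx)
```

In the forward direction the residual is exactly the noise term, so its dependence on X is only estimation bias; in the backward direction the residual distribution varies with Y, which the mutual information score picks up.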
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence: Statistical Tests for the Detection of the Arrow of Time in Vector Autoregressive Models
Abstract
The problem of detecting the direction of time in vector autoregressive (VAR) processes using statistical techniques is considered. By analogy to causal AR(1) processes with non-Gaussian noise, we conjecture that the distribution of the time-reversed residuals of a linear VAR model is closer to a Gaussian than the distribution of the actual residuals in the forward direction. Experiments with simulated data illustrate the validity of the conjecture. Based on these results, we design a decision rule for detecting the direction of VAR processes: the correct direction in time (forward) is the one in which the residuals of the time series are less Gaussian. A series of experiments illustrates the superior results of the proposed rule when compared with other methods based on independence tests.
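By the same logic as the scalar AR(1) case, the rule extends to VAR(1) (an illustrative sketch; the paper develops proper statistical tests rather than a raw score comparison): fit the coefficient matrix by least squares in each direction and compare the average non-Gaussianity of the residuals across dimensions.

```python
import numpy as np

def var1_residuals(X):
    """Least-squares VAR(1) fit X[t] = A @ X[t-1] + e[t]; returns residuals."""
    past, present = X[:-1], X[1:]
    B, *_ = np.linalg.lstsq(past, present, rcond=None)  # B = A transposed
    return present - past @ B

def mean_abs_excess_kurtosis(E):
    """Average |excess kurtosis| over dimensions; 0 for Gaussian residuals."""
    E = E - E.mean(axis=0)
    k = np.mean(E**4, axis=0) / np.mean(E**2, axis=0) ** 2 - 3.0
    return float(np.mean(np.abs(k)))

rng = np.random.default_rng(0)
n = 20000
A = np.array([[0.7, 0.2], [-0.1, 0.6]])     # stable VAR(1) coefficients
noise = rng.uniform(-1.0, 1.0, (n, 2))      # non-Gaussian innovations
X = np.zeros((n, 2))
for t in range(1, n):
    X[t] = A @ X[t - 1] + noise[t]

# As in the scalar case: the forward direction leaves the less Gaussian residuals.
forward = mean_abs_excess_kurtosis(var1_residuals(X))
backward = mean_abs_excess_kurtosis(var1_residuals(X[::-1]))
print("forward detected:", forward > backward)
```

Reversing time turns each residual dimension into a linear mixture of many independent innovations, both across time and across coordinates, so the backward residuals drift toward Gaussianity even faster than in the univariate case.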