## On Causal Discovery with Cyclic Additive Noise Models

### BibTeX

@MISC{Mooij_oncausal,

author = {Joris M. Mooij and Tom Heskes and Dominik Janzing and Bernhard Schölkopf},

title = {On Causal Discovery with Cyclic Additive Noise Models},

year = {}

}

### Abstract

We study a particular class of cyclic causal models, where each variable is a (possibly nonlinear) function of its parents and additive noise. We prove that the causal graph of such models is generically identifiable in the bivariate, Gaussian-noise case. We also propose a method to learn such models from observational data. In the acyclic case, the method reduces to ordinary regression, but in the more challenging cyclic case, an additional term arises in the loss function, which makes it a special case of nonlinear independent component analysis. We illustrate the proposed method on synthetic data.
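The role of the additional loss term can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bivariate model X = f_x(Y) + E_X, Y = f_y(X) + E_Y with independent Gaussian noise, and it applies the standard change-of-variables formula, under which the density of (X, Y) picks up the Jacobian factor |1 − f_x'(y) f_y'(x)|. The function name `cyclic_anm_nll` and all of its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def cyclic_anm_nll(x, y, f_x, f_y, df_x, df_y, sigma_x=1.0, sigma_y=1.0):
    """Average negative log-likelihood of a bivariate cyclic additive noise model.

    Assumed model (illustrative): X = f_x(Y) + E_X and Y = f_y(X) + E_Y,
    with independent zero-mean Gaussian noise of scales sigma_x and sigma_y.
    df_x and df_y are the derivatives of f_x and f_y.
    """
    # Residuals play the role of the noise variables.
    e_x = x - f_x(y)
    e_y = y - f_y(x)

    # Gaussian log-densities of the residuals (the ordinary regression part).
    log_p_ex = -0.5 * (e_x / sigma_x) ** 2 - np.log(sigma_x) - 0.5 * np.log(2 * np.pi)
    log_p_ey = -0.5 * (e_y / sigma_y) ** 2 - np.log(sigma_y) - 0.5 * np.log(2 * np.pi)

    # The map (x, y) -> (e_x, e_y) has Jacobian [[1, -f_x'(y)], [-f_y'(x), 1]],
    # so its determinant is 1 - f_x'(y) * f_y'(x). This is the extra term:
    # it is zero in log-space whenever one of the two mechanisms is absent.
    log_det = np.log(np.abs(1.0 - df_x(y) * df_y(x)))

    return -np.mean(log_p_ex + log_p_ey + log_det)
```

When `f_x` and `df_x` are identically zero (the acyclic case X → Y), the log-determinant vanishes and the expression reduces to a Gaussian regression loss for Y given X plus a marginal term for X, which matches the abstract's remark that the acyclic case reduces to ordinary regression.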

### Citations

1117 | Causality: Models, Reasoning, and Inference
- Pearl
- 2000
Citation Context ...n synthetic data. 1 Introduction. Causal discovery refers to a special class of statistical and machine learning methods that infer causal relationships between variables from data and prior knowledge [1, 2, 3]. Whereas in machine learning, one traditionally concentrates on the task of predicting the values of variables given observations of other variables (for example in regression or classification tasks... |

749 | Structural Equations with Latent Variables
- Bollen
- 1989
Citation Context ...n synthetic data. 1 Introduction. Causal discovery refers to a special class of statistical and machine learning methods that infer causal relationships between variables from data and prior knowledge [1, 2, 3]. Whereas in machine learning, one traditionally concentrates on the task of predicting the values of variables given observations of other variables (for example in regression or classification tasks... |

622 | Investigating causal relations by econometric models and cross-spectral methods. Econometrica
- Granger
- 1969
Citation Context ...the underlying system. The fact that causes always precede their effects provides additional prior knowledge that simplifies causal discovery, which is exploited in methods based on Granger causality [4]. Additionally, under certain assumptions, “unrolling” the model in time effectively removes the cycles, which is used in methods such as vector auto-regressive models, which are popular in economet... |

496 | Causation, Prediction, and Search
- Spirtes, Glymour, et al.
- 1993
Citation Context ...n synthetic data. 1 Introduction. Causal discovery refers to a special class of statistical and machine learning methods that infer causal relationships between variables from data and prior knowledge [1, 2, 3]. Whereas in machine learning, one traditionally concentrates on the task of predicting the values of variables given observations of other variables (for example in regression or classification tasks... |

219 | Learning the Structure of Dynamic Probabilistic Networks
- Friedman
- 1998
Citation Context ...ng” the model in time effectively removes the cycles, which is used in methods such as vector auto-regressive models, which are popular in econometrics, or more generally, Dynamic Bayesian Networks [5] and ordinary differential equation models. However, all these methods need time series data where the temporal resolution of the measurements is high relative to the characteristic time scale of the ... |

39 | Directed cyclic graphical representations of feedback models
- Spirtes
- 1995
Citation Context ...) case. An important novel aspect of our work is that we consider continuous-valued variables and nonlinear causal mechanisms. Although the linear case has been studied in considerable detail already [6, 7, 8], as far as we know, nobody has yet investigated the (more realistic) case of nonlinear causal mechanisms. The basic assumption made in [7] is the so-called Global Directed Markov Condition, which rel... |

35 | Nonlinear causal discovery with additive noise models
- Hoyer, Janzing, et al.
- 2008
Citation Context ...n this work, we focus our attention on the bivariate case. Our main result, Theorem 1, can be seen as an extension of the identifiability result for acyclic nonlinear additive noise models derived in [11], although we make the additional simplifying assumption that the noise variables are Gaussian. We believe that similar identifiability results can be derived in the multivariate case ($|V| > 2$) and f... |

30 | Kernel methods for measuring independence
- Gretton, Herbrich, et al.
- 2005
Citation Context ...clic additive noise models, omitting the acyclic ones. In each case, we calculated the p-value for independence of the two noise variables using the HSIC (Hilbert-Schmidt Independence Criterion) test [15]; for p-values substantially above 0 (say larger than 1%), we do not reject the null hypothesis of independence and hence accept the model as possible causal explanation of the data. This happens in f... |

28 | A discovery algorithm for directed cyclic graphs
- Richardson
- 1996
Citation Context ...) case. An important novel aspect of our work is that we consider continuous-valued variables and nonlinear causal mechanisms. Although the linear case has been studied in considerable detail already [6, 7, 8], as far as we know, nobody has yet investigated the (more realistic) case of nonlinear causal mechanisms. The basic assumption made in [7] is the so-called Global Directed Markov Condition, which rel... |

15 | Discovering cyclic causal models by independent components analysis
- Lacerda, Spirtes, et al.
- 2008
Citation Context ...) case. An important novel aspect of our work is that we consider continuous-valued variables and nonlinear causal mechanisms. Although the linear case has been studied in considerable detail already [6, 7, 8], as far as we know, nobody has yet investigated the (more realistic) case of nonlinear causal mechanisms. The basic assumption made in [7] is the so-called Global Directed Markov Condition, which rel... |

14 | Handbook of Nonlinear Partial Differential Equations - Polyanin, Zaitsev - 2004 |

9 | On the identifiability of the post-nonlinear causal model
- Zhang, Hyvärinen
- 2009
Citation Context ... $f'_Y(x)\big)\big(1 - f'_X(y)\,f'_Y(x)\big)^2 - f''_X(y)\,f''_Y(x)$ (11). This is a nonlinear partial differential equation in $\phi(x) := f'_Y(x)$ and $\psi(y) := f'_X(y)$. Inspired by the identifiability proof in [13], we adopt the solution method from [14, Supplement S.4.3] that gives a general method for solving functional-differential equations of the form $\Phi_1(x)\Psi_1(y) + \Phi_2(x)\Psi_2(y) + \cdots + \Phi_k(x)\Psi_k(y) = 0$ (12) w... |

8 | Modeling discrete interventional data using directed cyclic graphical models
- Schmidt, Murphy
- 2009
Citation Context ...stance, in the bivariate case, one cannot distinguish between X → Y, Y → X and X ⇆ Y using conditional independences alone. Researchers have also studied cyclic causal models with discrete variables [9, 10]. However, if the measured variables are intrinsically continuous-valued, it is desirable to avoid discretization as a preprocessing step, as this throws away information that is useful for causal dis... |

3 | Structure learning in causal cyclic networks
- Itani, Ohannessian, et al.
- 2008
Citation Context ...stance, in the bivariate case, one cannot distinguish between X → Y, Y → X and X ⇆ Y using conditional independences alone. Researchers have also studied cyclic causal models with discrete variables [9, 10]. However, if the measured variables are intrinsically continuous-valued, it is desirable to avoid discretization as a preprocessing step, as this throws away information that is useful for causal dis... |

3 | Identifiability of Causal Graphs using Functional Models
- Peters
- 2011
Citation Context ...onsiders two variables, it may be possible to use this two-variable identifiability result as a key building block for deriving more general identifiability results for many variables, similar as how [12] generalized the (acyclic) identifiability result of [11] from two to many variables. 3.2 Proof sketch. Writing $\pi_{\cdots}(\cdots) := \log p_{\cdots}(\cdots)$ for logarithms of densities, we reexpress (4) for the bi... |