## Nonlinear causal discovery with additive noise models

Citations: 35 (16 self)

### BibTeX

@MISC{Hoyer_nonlinearcausal,

author = {Patrik O. Hoyer and Dominik Janzing and Joris Mooij and Jonas Peters and Bernhard Schölkopf},

title = {Nonlinear causal discovery with additive noise models},

year = {}

}

### Abstract

The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data, linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results, we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.
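As a rough illustration of the idea in the abstract, the following sketch fits an additive noise model in each direction and tests whether the residuals are independent of the putative cause. It is a minimal sketch under stated assumptions, not the authors' implementation: the context snippets in the citation list mention Gaussian process regression and an HSIC test with a gamma approximation, whereas this example swaps in polynomial regression and a permutation test to stay dependency-light. The function names (`gaussian_gram`, `hsic_pvalue`, `anm_pvalue`), the median-heuristic bandwidth, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian-kernel Gram matrix; the median-heuristic bandwidth is an assumption."""
    d = v[:, None] - v[None, :]
    sigma = np.median(np.abs(d[d != 0]))
    return np.exp(-d**2 / (2 * sigma**2))

def hsic_pvalue(a, b, n_perm, rng):
    """Permutation p-value for a biased HSIC statistic between samples a and b."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ gaussian_gram(a) @ H      # centred Gram matrix of a
    L = gaussian_gram(b)
    # sum(Kc * L) equals trace(K H L H); the normalisation does not affect the p-value.
    stat = np.sum(Kc * L) / n**2
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(n)       # shuffling b breaks any dependence on a
        null[i] = np.sum(Kc * L[np.ix_(idx, idx)]) / n**2
    return (1 + np.sum(null >= stat)) / (1 + n_perm)

def anm_pvalue(cause, effect, rng, degree=3, n_perm=200):
    """Fit effect = f(cause) + noise with a polynomial f (a stand-in regressor),
    then test whether the residuals are independent of the cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic_pvalue(cause, residuals, n_perm, rng)

# Synthetic data whose true direction is x -> y, with a nonlinear mechanism.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=300)
y = x**3 + rng.normal(size=300)

p_forward = anm_pvalue(x, y, rng)      # model x -> y
p_backward = anm_pvalue(y, x, rng)     # model y -> x
print("p-value, x -> y:", p_forward)
print("p-value, y -> x:", p_backward)
```

A small p-value means the independence of residuals and cause is rejected, so that direction is inconsistent with an additive noise model. On data like this one would expect the backward direction to be rejected far more strongly than the forward one, although the exact values depend on the regressor, the kernel bandwidth, and the sample.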

### Citations

1273 | Spline models for observational data - Wahba - 1990
Citation Context: ...ssion method can be used; we have verified that our results do not depend significantly on the choice of the regression method by comparing with ν-SVR [15] and with thin-plate spline kernel regression [16]. For the independence test, we implemented the HSIC [13] with a Gaussian kernel, where we used the gamma distribution as an approximation for the distribution of the HSIC under the null hypothesis of...

1117 | Causality: Models, Reasoning, and Inference - Pearl - 2000
Citation Context: ...riments illustrating the identification power provided by nonlinearities. 1 Introduction Causal relationships are fundamental to science because they enable predictions of the consequences of actions [1]. While controlled randomized experiments constitute the primary tool for identifying causal relationships, such experiments are in many cases either unethical, too expensive, or technically impossibl...

749 | Structural Equations with Latent Variables - Bollen - 1989
Citation Context: ...es an important current research topic [1, 2, 3, 4, 5, 6, 7, 8]. If the observed data is continuous-valued, methods based on linear causal models (aka structural equation models) are commonly applied [1, 2, 9]. This is not necessarily because the true causal relationships are really believed to be linear, but rather it reflects the fact that linear models are well understood and easy to work with. A standa...

640 | UCI machine learning repository - Asuncion, Newman - 2007
Citation Context: ...8 × 10⁻⁸. Hence, our simple nonlinear model with independent additive noise is not consistent with the data in either direction. The second dataset, the “Abalone” dataset from the UCI ML repository [18], contains measurements of the number of rings in the shell of abalone (a group of shellfish), which indicate their age, and the length of the shell. Figure 4 shows the results for a subsample of 500 ...

496 | Causation, Prediction, and Search - Spirtes, Glymour, et al. - 1993
Citation Context: ...r unethical, too expensive, or technically impossible. The development of causal discovery methods to infer causal relationships from uncontrolled data constitutes an important current research topic [1, 2, 3, 4, 5, 6, 7, 8]. If the observed data is continuous-valued, methods based on linear causal models (aka structural equation models) are commonly applied [1, 2, 9]. This is not necessarily because the true causal rela...

112 | Learning Gaussian networks - Geiger, Heckerman - 1994

80 | A Bayesian Approach to Causal Discovery - Heckerman, Meek, et al. - 1999

53 | A linear non-Gaussian acyclic model for causal discovery - Shimizu, Hoyer, et al. - 2006

49 | Shrinking the tube: a new support vector regression algorithm - Schölkopf, Bartlett, et al. - 1999
Citation Context: ...egression individually. 1 In principle, any regression method can be used; we have verified that our results do not depend significantly on the choice of the regression method by comparing with ν-SVR [15] and with thin-plate spline kernel regression [16]. For the independence test, we implemented the HSIC [13] with a Gaussian kernel, where we used the gamma distribution as an approximation for the dist...

41 | Learning the structure of linear latent variable models - Silva, Glymour, et al. - 2006

30 | Kernel methods for measuring independence - Gretton, Herbrich, et al. - 2005
Citation Context: ...ips or the distributions of the noise should optimally be utilized here. In our implementation, we perform the regression using Gaussian Processes [12] and the independence tests using kernel methods [13]. Note that one must take care to avoid overfitting, as overfitting may lead one to falsely accept models which should be rejected. 5 Experiments To show the ability of our method to find the correct ...

23 | A look at some data on the Old Faithful geyser - Azzalini, Bowman - 1990
Citation Context: ... datasets for which the assumptions of our method might only hold approximately. Due to space constraints we only discuss three real world datasets here. The first dataset, the “Old Faithful” dataset [17] contains data about the duration of an eruption and the time interval between subsequent eruptions of the Old Faithful geyser in Yellowstone National Park, USA. Our method obtains a p-value of 0.5 fo...

20 | Gaussian process networks - Friedman, Nachman - 2000
Citation Context: ...en causal relationships are nonlinear it typically helps break the symmetry between the observed variables and allows the identification of causal directions. As Friedman and Nachman have pointed out [10], non-invertible functional relationships between the observed variables can provide clues to the generating causal model. However, we show that the phenomenon is much more general; for nonlinear mode...

15 | Automated discovery of linear feedback models - Richardson, Spirtes - 1999

11 | Causal inference by choosing graphs with most plausible Markov kernels - Sun, Janzing, et al. - 2006
Citation Context: ...eneral; for nonlinear models with additive noise almost any nonlinearities (invertible or not) will typically yield identifiable models. Note that other methods to select among Markov equivalent DAGs [11, 8] have (so far) mainly focussed on mixtures of discrete and continuous variables. In the next section, we start by defining the family of models under study, and then, in Section 3 we give theoretical ...

5 | Über eine Eigenschaft der normalen Verteilungsfunction - Cramér - 1936

4 | Distinguishing between cause and effect via kernel-based complexity measures for conditional distributions - Sun, Janzing, et al. - 2007