## A linear non-gaussian acyclic model for causal discovery (2006)

Venue: J. Machine Learning Research

Citations: 56 (24 self)

### BibTeX

@ARTICLE{Shimizu06alinear,

author = {Shohei Shimizu and Patrik O. Hoyer and Aapo Hyvärinen and Antti Kerminen},

title = {A linear non-gaussian acyclic model for causal discovery},

journal = {J. Machine Learning Research},

year = {2006},

pages = {2003--2030}

}

### Abstract

In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data. Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-Gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis, and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data and real-world data.
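The generating model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' Matlab package: the function names `simulate_lingam` and `excess_kurtosis`, the example coefficient matrix `B`, and the choice of uniform disturbances are all assumptions made for illustration. It draws data from x = Bx + e with a strictly lower-triangular `B` (an acyclic structure in causal order) and checks that W = I - B maps the observations back to the independent non-Gaussian disturbances, which is the linear mixture that ICA is applied to.

```python
import numpy as np


def simulate_lingam(B, n_samples, seed=0):
    """Sample from x = B x + e with independent uniform disturbances.

    B is assumed strictly lower triangular, i.e. the variables are
    already listed in a causal order (hypothetical example setup).
    """
    rng = np.random.default_rng(seed)
    m = B.shape[0]
    # Uniform noise: non-Gaussian, zero mean, non-zero variance.
    e = rng.uniform(-1.0, 1.0, size=(m, n_samples))
    # x = B x + e  is equivalent to  (I - B) x = e.
    x = np.linalg.solve(np.eye(m) - B, e)
    return x, e


def excess_kurtosis(z):
    """Sample excess kurtosis; zero for a Gaussian, about -1.2 for a uniform."""
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2 - 3.0


# Illustrative connection strengths: x1 -> x2 -> x3.
B = np.array([
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, -0.5, 0.0],
])

x, e = simulate_lingam(B, n_samples=5000)

# With the true B known, W = I - B recovers the disturbances exactly.
W = np.eye(3) - B
recovered = W @ x
```

In practice `B` is unknown. ICA estimates the demixing matrix only up to permutation and scaling, so recovering the causal order takes further steps that this sketch does not attempt.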

### Citations

3448 | Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing
- Benjamini, Hochberg
- 1995
Citation Context ... the Bonferroni method is too conservative when the number of tests is large. Some authors have improved the Bonferroni procedure or devised new techniques so that they have more power of test (e.g., Benjamini and Hochberg, 1995; Hochberg, 1988; Holm, 1979; Simes, 1986). We would like to study such multiple comparison techniques in future work and implement them in our software package. 7. Simulations To verify the validit... |

3389 | An Introduction to the Bootstrap
- Efron, Tibshirani
- 1993
Citation Context ...udy the empirical performance of this pruning method (Section 7.2). A potential alternative to Wald statistics would be to use resampling techniques (e.g., Efron and Tibshirani, 1993). We provide a basic method using resamplings as an option in our Matlab code. In our implementation we take the causal ordering obtained from the LiNGAM algorithm, and then simply estimate the conne... |

2153 | Time Series Analysis: Forecasting and Control
- Box, Jenkins
- 1970
Citation Context ...ed from a LiNGAM model. A time series can be approximated by a LiNGAM model if it is a stationary AR(p) process. An AR(p) process, or an autoregressive process of order p, is defined by the equation (Box and Jenkins, 1976) $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + a_t$. That is, the value of $X_t$ is a weighted sum of p previous variables and white noise ($a_t$). The weights $\phi_1, \phi_2, \ldots, \phi_p$ are called the parameters of the process. A... |

1445 | Independent component analysis, a new concept
- Comon
- 1994
Citation Context ...i). Taken together, Equation (2) and the independence and non-Gaussianity of the components of e define the standard linear independent component analysis model. Independent component analysis (ICA) (Comon, 1994; Hyvärinen et al., 2001) is a fairly recent statistical technique for identifying a linear model such as that given in Equation (2). If the observed data is a linear, invertible mixture of non-Gaussi... |

1296 | Structural Equations with Latent Variables
- Bollen
- 1989
Citation Context ...ables $x_i$, $i \in \{1, \ldots, m\}$, can be arranged in a causal order, such that no later variable causes any earlier variable. We denote such a causal order by $k(i)$. That is, the generating process is recursive (Bollen, 1989), meaning it can be represented graphically by a directed acyclic graph (DAG) (Pearl, 2000; Spirtes et al., 2000). 1. Preliminary results of the paper were presented at UAI2005 and ICA2006 (Shimizu e... |

1271 | Causality: Models, reasoning, and inference
- Pearl
- 2000
Citation Context ... and real-world data. Keywords: independent component analysis, non-Gaussianity, causal discovery, directed acyclic graph, non-experimental data 1. Introduction Several authors (Spirtes et al., 2000; Pearl, 2000) have recently formalized concepts related to causality using probability distributions defined on directed acyclic graphs. This line of research emphasizes the importance of understanding the proces... |

818 | A simple sequentially rejective multiple test procedure
- Holm
- 1979
Citation Context ...n the number of tests is large. Some authors have improved the Bonferroni procedure or devised new techniques so that they have more power of test (e.g., Benjamini and Hochberg, 1995; Hochberg, 1988; Holm, 1979; Simes, 1986). We would like to study such multiple comparison techniques in future work and implement them in our software package. 7. Simulations To verify the validity of our method (and of our ... |

812 | Time Series: Theory and Methods
- Brockwell, Davis
- 1991
Citation Context ... the parameters of the process. A process is stationary if the variance is finite, the mean remains the same over time, and the autocovariance function depends only on the time lag of two variables (Brockwell and Davis, 1987). The last condition also implies that the variance remains the same over time. If we want to approximate a stationary AR(p) process by a LiNGAM model, the white noise process must be non-Gaussian. A... |

584 | Fast and robust fixed-point algorithms for independent component analysis
- Hyvärinen
- 1999
Citation Context ...identifiable (up to scaling and permutation of the columns, as discussed below) given enough observed data vectors x. Furthermore, efficient algorithms for estimating the mixing matrix are available (Hyvärinen, 1999). We again want to emphasize that ICA uses non-Gaussianity (that is, more than covariance information) to estimate the mixing matrix $A$ (or equivalently its inverse $W = A^{-1}$). For Gaussian disturbanc... |

539 | Blind beamforming for non Gaussian signals
- Cardoso, Souloumiac
- 1993
Citation Context ... $E[f(x, \theta_0)\, f^{T}(x, \theta_0)]\, J^{-T}$. (5) Pham and Garrat (1997) derived an estimating function for (quasi-) maximum likelihood estimation. Kawanabe and Müller (2005) provided estimating functions for JADE (Cardoso and Souloumiac, 1993) and for ICA based on non-Gaussianity maximization with orthogonality (uncorrelatedness) constraints including FastICA (Hyvärinen, 1999). In this paper, we restrict ourselves to testing mixing and de... |

405 | Equivariant adaptive source separation
- Cardoso, Laheld
- 1996
Citation Context ...as we intuitively (based on the argumentation in Section 4) require. Appendix D. Asymptotic Variance of ICA Several authors studied asymptotic variance of ICA (Pham and Garrat, 1997; Hyvärinen, 1997; Cardoso and Laheld, 1996; Tichavský et al., 2006), where the theory of estimating functions (Godambe, 1991) was often used. Let us consider a semiparametric model $p(x \mid \theta)$, where $\theta$ is an $r$-dimensional parameter vector of inter... |

294 | A sharper Bonferroni procedure for multiple tests of significance. Biometrika
- Hochberg
- 1988
Citation Context ...conservative when the number of tests is large. Some authors have improved the Bonferroni procedure or devised new techniques so that they have more power of test (e.g., Benjamini and Hochberg, 1995; Hochberg, 1988; Holm, 1979; Simes, 1986). We would like to study such multiple comparison techniques in future work and implement them in our software package. 7. Simulations To verify the validity of our method ... |

192 | Multiple Comparison Procedures
- Hochberg, Tamhane
- 1987
Citation Context ...set of all the tests. We could have a lot of spurious significance if we just repeat tests without any corrections. In such a case, it would be effective to employ multiple comparison procedures (see Hochberg and Tamhane, 1987, for details). A simple and basic method is the Bonferroni correction, where we simply divide a significance level by the number of tests to obtain the significance level for each individual test. However... |

140 | An improved Bonferroni procedure for multiple tests of significance
- Simes
- 1986
Citation Context ... of tests is large. Some authors have improved the Bonferroni procedure or devised new techniques so that they have more power of test (e.g., Benjamini and Hochberg, 1995; Hochberg, 1988; Holm, 1979; Simes, 1986). We would like to study such multiple comparison techniques in future work and implement them in our software package. 7. Simulations To verify the validity of our method (and of our Matlab code),... |

116 | Learning Gaussian networks
- Geiger, Heckerman
- 1994
Citation Context ...er work on the linear, causally sufficient, case is the assumption of non-Gaussianity of the disturbances. In most work, an explicit or implicit assumption of Gaussianity has been made (Bollen, 1989; Geiger and Heckerman, 1994; Spirtes et al., 2000). An assumption of Gaussianity of disturbance variables makes the full joint distribution over the $x_i$ Gaussian, and the covariance matrix of the data embodies all one could poss... |

101 | Blind separation of mixture of independent sources through a quasi-maximum likelihood approach
- Pham, Garrat
- 1997
Citation Context ...penalizes small values on the diagonal, as we intuitively (based on the argumentation in Section 4) require. Appendix D. Asymptotic Variance of ICA Several authors studied asymptotic variance of ICA (Pham and Garrat, 1997; Hyvärinen, 1997; Cardoso and Laheld, 1996; Tichavský et al., 2006), where the theory of estimating functions (Godambe, 1991) was often used. Let us consider a semiparametric model $p(x \mid \theta)$, where $\theta$ i... |

78 | Can test statistics in covariance structure analysis be trusted
- Hu, Bentler, et al.
- 1992
Citation Context ...assess the overall fit of the estimated model to data. However, it is often pointed out that this type of test statistic requires large sample sizes for $T_1$ to behave like a chi-square variate (e.g., Hu et al., 1992). Therefore, we would apply a proposal by Yuan and Bentler (1997) to $T_1$ to improve its chi-square approximation and employ the following test statistic $T_2$: $T_2 = \frac{T_1}{1 + F(\hat{\tau})}$. 6.2.3 A DIFFERENCE CHI-... |

45 | Assignment Problems and Extensions - Burkard, Çela - 1999 |

44 | Validating the independent components of neuroimaging time series via clustering and visualization - Himberg, Hyvärinen, et al. - 2004 |

41 | Estimating functions
- Godambe
- 1991
Citation Context ...riance of ICA Several authors studied asymptotic variance of ICA (Pham and Garrat, 1997; Hyvärinen, 1997; Cardoso and Laheld, 1996; Tichavský et al., 2006), where the theory of estimating functions (Godambe, 1991) was often used. Let us consider a semiparametric model $p(x \mid \theta)$, where $\theta$ is an $r$-dimensional parameter vector of interest. Note that the density function $p(x \mid \theta)$ is unknown. Let us denote by $\theta_0$ the true... |

23 | Corrections to “Performance Analysis of the FastICA Algorithm and Cramér-Rao Bounds for Linear Independent Component Analysis” - Tichavsky, Koldovsky, et al. - 2008 |

11 | Discovery of non-gaussian linear causal models using ICA
- Shimizu, Hyvaerinen, et al.
- 2005
Citation Context ...len, 1989), meaning it can be represented graphically by a directed acyclic graph (DAG) (Pearl, 2000; Spirtes et al., 2000). 1. Preliminary results of the paper were presented at UAI2005 and ICA2006 (Shimizu et al., 2005, 2006b; Hoyer et al., 2006a). 2. The value assigned to each variable $x_i$ is a linear function of the values already assigned to the earlier... |

9 | Estimating functions for blind separation when sources have variance dependencies - Kawanabe, Müller |

8 | Estimation of linear, non-gaussian causal models in the presence of confounding latent variables - Hoyer, Shimizu, et al. - 2006 |

6 | On asymmetric properties of the correlation coefficient in the regression setting - Dodge, Rousson |

6 | Finding a causal ordering via independent component analysis - Shimizu, Hyvarinen, et al. |

4 | Use of non-normality in structural equation modeling: Application to direction of causation - Shimizu, Kano |

3 | New permutation algorithms for causal discovery using ICA - Hoyer, Shimizu, et al. |

2 | Time series data library, 2005. URL http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/. [June 2005]. A. Hyvärinen. One-unit contrast functions for independent component analysis: A statistical analysis - Hyndman

2 | Testing significance of mixing and demixing coefficients in ICA - Shimizu, Hyvärinen, et al. |