Results 1–10 of 10
Nonlinear causal discovery with additive noise models
"... The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuousvalued data linear acyclic causal models with additive noise are often used because these models are well understood and there are wellknown methods to fit them to data. In ..."
Abstract

Cited by 36 (16 self)
 Add to MetaCart
(Show Context)
The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data, linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can in fact be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results, we present simulations and some simple real-data experiments illustrating the identification power provided by nonlinearities.
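The fit-and-check-residuals idea behind such additive noise models can be illustrated with a minimal toy sketch (this is illustrative code under our own simplifying assumptions, not the authors' implementation; the function name, polynomial regression, and the crude residual-magnitude statistic are ours): under the model Y = f(X) + N with N independent of X, regressing in the true causal direction leaves residuals independent of the input, while the reverse regression does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_dependence(x, y, deg=3):
    """Regress y on x with a polynomial fit and return a crude score
    for dependence between x and the residuals. Least-squares residuals
    are orthogonal to the polynomial features themselves, so we probe
    the residual *magnitudes* instead."""
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    return abs(np.corrcoef(resid**2, x**2)[0, 1])

# Toy data with known ground truth: Y = X^3 + noise, so X -> Y.
x = rng.uniform(-1.0, 1.0, 2000)
y = x**3 + 0.1 * rng.normal(size=2000)

score_xy = residual_dependence(x, y)  # small: forward residuals are pure noise
score_yx = residual_dependence(y, x)  # larger: backward residuals depend on y
inferred = "X->Y" if score_xy < score_yx else "Y->X"
print(inferred)
```

In practice the independence check is done with a proper test such as HSIC rather than this correlation-of-squares heuristic, but the asymmetry between the two directions is the same.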
Causal inference using the algorithmic Markov condition
, 2008
"... Inferring the causal structure that links n observables is usually basedupon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory how to g ..."
Abstract

Cited by 12 (11 self)
 Add to MetaCart
(Show Context)
Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.
Causal feature selection
, 2001
"... This chapter reviews techniques for learning causal relationships from data, in application to the problem of feature selection. Most feature selection methods do not attempt to uncover causal relationships between feature and target and focus instead on making best predictions. We examine situation ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
This chapter reviews techniques for learning causal relationships from data, in application to the problem of feature selection. Most feature selection methods do not attempt to uncover causal relationships between feature and target and focus instead on making best predictions. We examine situations in which the knowledge of causal relationships benefits feature selection. Such benefits may include: explaining relevance in terms of causal mechanisms, distinguishing between actual features and experimental artifacts, predicting the consequences of actions performed by external agents, and making predictions in nonstationary environments. Conversely, we highlight the benefits that causal discovery may draw from recent developments in feature selection theory and algorithms.
Distinguishing between cause and effect
, 2008
"... We describe eight data sets that together formed the CauseEffectPairs task in the Causality Challenge #2: PotLuck competition. Each set consists of a sample of a pair of statistically dependent random variables. One variable is known to cause the other one, but this information was hidden from the ..."
Abstract

Cited by 8 (7 self)
 Add to MetaCart
(Show Context)
We describe eight data sets that together formed the Cause-Effect Pairs task in the Causality Challenge #2: Pot-Luck competition. Each set consists of a sample of a pair of statistically dependent random variables. One variable is known to cause the other one, but this information was hidden from the participants; the task was to identify which of the two variables was the cause and which one the effect, based upon the observed sample. The data sets were chosen such that we expect common agreement on the ground truth. Even though part of the statistical dependences may also be due to hidden common causes, common sense tells us that there is a significant cause-effect relation between the two variables in each pair. We also present baseline results using three different causal inference methods.
On causally asymmetric versions of Occam’s Razor and their relation to thermodynamics
, 2007
"... and their relation to thermodynamics ..."
(Show Context)
Probabilistic latent variable models for distinguishing between cause and effect
Distinguishing between cause and effect via kernel-based complexity measures for conditional distributions
, 2007
"... ..."
Causal reasoning by evaluating the complexity of conditional densities with kernel methods
 Neurocomputing
"... We propose a method to quantify the complexity of conditional probability measures by a Hilbert space seminorm of the logarithm of its density. The concept of Reproducing Kernel Hilbert Spaces (RKHS) is a flexible tool to define such a seminorm by choosing an appropriate kernel. We present several e ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
(Show Context)
We propose a method to quantify the complexity of conditional probability measures by a Hilbert space seminorm of the logarithm of the density. The concept of Reproducing Kernel Hilbert Spaces (RKHS) is a flexible tool to define such a seminorm by choosing an appropriate kernel. We present several examples with artificial data sets where our kernel-based complexity measure is consistent with our intuitive understanding of the complexity of densities. The intention behind the complexity measure is to provide a new approach to inferring causal directions. The idea is that the factorization of the joint probability measure P(effect, cause) into P(effect | cause) P(cause) typically leads to “simpler” and “smoother” terms than the factorization into P(cause | effect) P(effect). Since the conventional constraint-based approach to causal discovery is not able to determine the causal direction between only two variables, our inference principle can be particularly helpful when combined with other existing methods. We provide several simple examples with real-world data where the true causal directions indeed lead to simpler (conditional) densities.
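The intuition that an RKHS seminorm rewards smooth log-densities can be sketched with a toy surrogate (illustrative code under our own assumptions; the function name, Gaussian kernel choice, and regularization are ours, not the paper's method): interpolate log-density values on a grid with a kernel expansion f = Ka, and score the curve by the quadratic form a^T K a. A wiggly log-density requires larger expansion coefficients and therefore gets a larger score.

```python
import numpy as np

def rkhs_seminorm(grid, log_density, bandwidth=0.5, reg=1e-6):
    """Toy surrogate for an RKHS norm: fit log-density values on a grid
    with a Gaussian-kernel expansion f = K a and return a^T K a.
    Rougher curves need larger coefficients, hence a larger score."""
    diff = grid[:, None] - grid[None, :]
    K = np.exp(-diff**2 / (2.0 * bandwidth**2))
    # Small ridge term keeps the ill-conditioned kernel system solvable.
    a = np.linalg.solve(K + reg * np.eye(len(grid)), log_density)
    return float(a @ K @ a)

grid = np.linspace(-3.0, 3.0, 50)
smooth = -grid**2 / 2                    # log of a Gaussian density (up to a constant)
rough = smooth + 0.5 * np.sin(8 * grid)  # same curve with high-frequency wiggles

s_smooth = rkhs_seminorm(grid, smooth)
s_rough = rkhs_seminorm(grid, rough)
print(s_smooth < s_rough)
```

For causal inference one would compare such scores for the conditional densities in the two candidate factorizations, preferring the direction whose terms are simpler.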
Exploring the causal order of binary variables via exponential hierarchies of Markov kernels
"... Abstract. We propose a new algorithm for estimating the causal structure that underlies the observed dependence among n (n ≥ 4) binary variables X1,..., Xn. Our inference principle states that the factorization of the joint probability into conditional probabilities for Xj given X1,..., Xj−1 often l ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
We propose a new algorithm for estimating the causal structure that underlies the observed dependence among n (n ≥ 4) binary variables X1, ..., Xn. Our inference principle states that the factorization of the joint probability into conditional probabilities for Xj given X1, ..., Xj−1 often leads to simpler terms if the order of variables is compatible with the directed acyclic graph representing the causal structure. We study joint measures of OR/AND gates and show that the complexity of the conditional probabilities (the so-called Markov kernels), defined by a hierarchy of exponential models, depends on the order of the variables. Some toy and real-data experiments support our inference rule.