## Kernel measures of conditional dependence (2008)


### Download Links

- [www.ism.ac.jp]
- [www.kyb.tuebingen.mpg.de]
- [www.kyb.mpg.de]
- [eprints.pascal-network.org]
- [books.nips.cc]
- DBLP

### Other Repositories/Bibliography

Venue: Adv. NIPS

Citations: 53 (35 self)

### BibTeX

@INPROCEEDINGS{Fukumizu08kernelmeasures,
  author    = {Kenji Fukumizu and Arthur Gretton and Xiaohai Sun and Bernhard Schölkopf},
  title     = {Kernel measures of conditional dependence},
  booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  year      = {2008}
}


### Abstract

We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.
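The paper's empirical estimate has a simple form: with centered Gram matrices G_X, G_Y and a regularizer ε_n, the statistic is Î_n^NOCCO = Tr[R_Y R_X], where R = G(G + nε_n I)^{-1}. A minimal NumPy sketch (the Gaussian kernel, bandwidth, and regularizer value here are illustrative choices, not the paper's exact experimental settings):

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix for a 1-D sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def nocco(x, y, eps=1e-3, sigma=1.0):
    """Empirical NOCCO statistic Tr[R_Y R_X],
    with R = G (G + n*eps*I)^{-1} and G the centered Gram matrix."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Gx = H @ gaussian_gram(x, sigma) @ H
    Gy = H @ gaussian_gram(y, sigma) @ H
    Rx = Gx @ np.linalg.inv(Gx + n * eps * np.eye(n))
    Ry = Gy @ np.linalg.inv(Gy + n * eps * np.eye(n))
    return np.trace(Ry @ Rx)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(nocco(x, x + 0.1 * rng.normal(size=200)))  # dependent pair: larger value
print(nocco(x, rng.normal(size=200)))            # independent pair: near zero
```

Both R matrices are positive semi-definite with eigenvalues in [0, 1), so the statistic is non-negative and, unlike the raw cross-covariance norm, is less sensitive to the marginal scales.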

### Citations

337 | Kernel independent component analysis
- Bach, Jordan
- 2002

Citation Context: ...election in supervised learning looks for a set of features on which the response variable most depends. Kernel methods have been successfully used for capturing (conditional) dependence of variables [1, 5, 8, 9, 16]. With the ability to represent high order moments, mapping of variables into reproducing kernel Hilbert spaces (RKHSs) allows us to infer properties of the distributions, such as independence and hom...

240 | Probability theory
- Renyi
- 1970

Citation Context: ...converge to a value that depends only on the distributions of the variables. Eq. (9) shows that, under the assumptions, I^NOCCO is equal to the mean square contingency, a well-known dependence measure [14] commonly used for discrete variables. As we show in Section 2.4, Î_n^NOCCO works as a consistent kernel estimator of the mean square contingency. The expression of Eq. (9) can be compared with the mut...
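For discrete variables the mean square contingency mentioned in this context has a closed form, C² = Σ_{x,y} (p(x,y) − p(x)p(y))² / (p(x)p(y)). A small sketch (the example tables are made up for illustration):

```python
import numpy as np

def mean_square_contingency(joint):
    """Mean square contingency C^2 of a discrete joint distribution:
    C^2 = sum_{x,y} (p(x,y) - p(x)p(y))^2 / (p(x)p(y))."""
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)      # marginal of X (rows)
    py = p.sum(axis=0, keepdims=True)      # marginal of Y (columns)
    indep = px * py                        # product of marginals
    return np.sum((p - indep)**2 / indep)

# a product table is independent, so C^2 vanishes
print(mean_square_contingency(np.outer([0.3, 0.7], [0.5, 0.5])))  # ~ 0

# perfectly correlated binary variables
print(mean_square_contingency(np.array([[0.5, 0.0], [0.0, 0.5]])))  # 1.0
```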

196 | Clustering analysis
- Diday, Simon
- 1976

Citation Context: ...dependence measures. Recall that an operator A : H₁ → H₂ is called Hilbert-Schmidt if for complete orthonormal systems (CONSs) {φ_i} of H₁ and {ψ_j} of H₂, the sum ∑_{i,j} ⟨ψ_j, Aφ_i⟩²_{H₂} is finite (see [13]). For a Hilbert-Schmidt operator A, the Hilbert-Schmidt (HS) norm ‖A‖_HS is defined by ‖A‖²_HS = ∑_{i,j} ⟨ψ_j, Aφ_i⟩²_{H₂}. It is easy to see that this sum is independent of the choice of CONSs. Provided ...
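The basis-independence claimed in this context is easy to check numerically in the finite-dimensional case, where the HS norm of a matrix coincides with its Frobenius norm. A quick sketch:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])

# HS norm with the standard orthonormal basis: sum of squared entries
hs_sq = sum((e_j @ A @ e_i)**2
            for e_i in np.eye(2) for e_j in np.eye(2))

# repeat with a rotated orthonormal basis: same value
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
hs_sq_rot = sum((Q[:, j] @ A @ Q[:, i])**2
                for i in range(2) for j in range(2))

print(hs_sq, hs_sq_rot, np.linalg.norm(A, 'fro')**2)  # all equal 30
```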

170 | Introduction to Graphical Modelling
- Edwards
- 1995

167 | On the influence of the kernel on the consistency of support vector machines
- Steinwart

Citation Context: ...ely, the RKHS should contain a sufficiently rich class of functions to represent all higher order moments. Similar notions have already appeared in the literature: universal kernels on compact domains [15] and Gaussian kernels on the entire ℝ^m characterize independence via the cross-covariance operator [8, 1]. We now discuss a unified class of kernels for inference on probabilities. Let (X, B) be a m...

120 | Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
- Fukumizu
- 2004

Citation Context: ...election in supervised learning looks for a set of features on which the response variable most depends. Kernel methods have been successfully used for capturing (conditional) dependence of variables [1, 5, 8, 9, 16]. With the ability to represent high order moments, mapping of variables into reproducing kernel Hilbert spaces (RKHSs) allows us to infer properties of the distributions, such as independence and hom...

114 | Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer Series in Statistics
- Read, Cressie
- 1988

Citation Context: ...hod from [11], with no need for explicit estimation of the densities. Since Î_n^NOCCO is an estimate of the mean square contingency, we also apply a relevant contingency-table-based independence test ([12]), partitioning the variables into bins. Figure 1 shows the values of Î_n^NOCCO for a sample. In Table 1, we see that the results of Î_n^NOCCO are stable w.r.t. the choice of ε_n, provided it is sufficie...

109 | Estimating mutual information
- Kraskov, Stögbauer, et al.

Citation Context: ...nerate 100 sets of 200 data. We perform permutation tests with Î_n^NOCCO, HSIC = ‖Σ̂^(n)_YX‖²_HS, and the mutual information (MI). For the empirical estimates of MI, we use the advanced method from [11], with no need for explicit estimation of the densities. Since Î_n^NOCCO is an estimate of the mean square contingency, we also apply a relevant contingency-table-based independence test ([12]), partit...
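The HSIC statistic in this context has a standard biased empirical form, and the permutation test amounts to recomputing the statistic on shuffled pairings. A sketch using one common normalization, (1/n²) Tr(K H L H); the Gaussian kernel and bandwidth are illustrative, not the paper's experimental settings:

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: (1/n^2) Tr(K H L H)."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / n**2

def permutation_pvalue(x, y, stat=hsic, n_perm=200, seed=0):
    """Permutation test: shuffle y to break the pairing and
    compare the resampled statistics against the observed one."""
    rng = np.random.default_rng(seed)
    t0 = stat(x, y)
    null = [stat(x, rng.permutation(y)) for _ in range(n_perm)]
    return np.mean([t >= t0 for t in null])

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = x + 0.2 * rng.normal(size=100)
print(permutation_pvalue(x, y))  # small p-value: reject independence
```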

97 | Measuring statistical dependence with Hilbert-Schmidt norms
- Gretton, Bousquet, et al.

Citation Context: ...election in supervised learning looks for a set of features on which the response variable most depends. Kernel methods have been successfully used for capturing (conditional) dependence of variables [1, 5, 8, 9, 16]. With the ability to represent high order moments, mapping of variables into reproducing kernel Hilbert spaces (RKHSs) allows us to infer properties of the distributions, such as independence and hom...

71 | A kernel method for the two-sample-problem
- Gretton, Borgwardt, et al.

Citation Context: ...e ability to represent high order moments, mapping of variables into reproducing kernel Hilbert spaces (RKHSs) allows us to infer properties of the distributions, such as independence and homogeneity [7]. A drawback of previous kernel dependence measures, however, is that their value depends not only on the distribution of the variables, but also on the kernel, in contrast to measures such as mutual ...

46 | Joint measures and cross-covariance operators - Baker - 1973

44 | A kernel statistical test of independence
- Gretton, Fukumizu, et al.
- 2007

Citation Context: ...d only briefly in this paper. The basic idea is that a kernel should be chosen so that the covariance operator detects independence of variables as effectively as possible. It has been recently shown [10], under the independence of ... [Figure 1 residue removed; recoverable caption: "Left and Middle: Examples of data (θ = 0 a..."; the right panel plots I^NOCCO against the angle.]

32 | Kernel methods for measuring independence
- Gretton, Herbrich, et al.
- 2005

Citation Context: ...n expression. It is interesting to ask if other classical dependence measures, such as the mutual information, can be estimated by kernels (in a broader sense than the expansion about independence of [9]). A relevant measure is the kernel generalized variance (KGV [1]), which is based on a sum of the logarithm of the eigenvalues of V_YX, while I^NOCCO is their squared sum. It is also interesting to i...

30 | Kernel dimension reduction in regression
- Fukumizu, Bach, et al.
- 2009

Citation Context: ...eld. The assumptions to relate the operators with independence are well described by using characteristic kernels and denseness. The next result generalizes Corollary 9 in [5] (we omit the proof: see [5, 6]). Theorem 3. (i) Assume (A-1) for the kernels. If the product k_X k_Y is characteristic, then we have V_YX = O ⇐⇒ X ⊥ Y. (ii) Denote Ẍ = (X, Z) and k_Ẍ = k_X k_Z. In addition to (A-1), assume that the...

17 | Statistical consistency of kernel canonical correlation analysis
- Fukumizu, Bach, et al.

Citation Context: ...Σ_YY^{1/2} V_YX Σ_XX^{1/2}, (2) R(V_YX) ⊂ R(Σ_YY), and N(V_YX)^⊥ ⊂ R(Σ_XX). The operator norm of V_YX is less than or equal to 1. We call V_YX the normalized cross-covariance operator (NOCCO, see also [4]). While the operator V_YX encodes the same information regarding the dependence of X and Y as Σ_YX, the former rather expresses the information more directly than Σ_YX, with less influence of the mar...
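The relation quoted in this context is the factorization of the cross-covariance operator through its normalized version; a reconstruction of Eq. (2) under the stated range conditions (a sketch, matching the notation of the surrounding text):

```latex
\Sigma_{YX} \;=\; \Sigma_{YY}^{1/2}\, V_{YX}\, \Sigma_{XX}^{1/2},
\qquad
\mathcal{R}(V_{YX}) \subset \mathcal{R}(\Sigma_{YY}),
\qquad
\mathcal{N}(V_{YX})^{\perp} \subset \mathcal{R}(\Sigma_{XX}),
\qquad
\|V_{YX}\| \le 1 .
```

Formally, V_YX plays the role of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1/2}: it whitens the cross-covariance by the marginal covariances, which is why its value reflects the dependence structure with less influence of the marginals.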

7 | A kernel-based causal learning algorithm
- Sun, Janzing, et al.
- 2002

1 | Efficient SVM Training using Low-Rank Kernel Representations
- Fine, Scheinberg

Citation Context: ...riables are used for Î_n^COND. These empirical estimators, and the use of ε_n, will be justified in Section 2.4 by showing the convergence to I^NOCCO and I^COND. With the incomplete Cholesky decomposition [17] of rank r, the complexity to compute Î_n^COND is O(r²n). 2.2 Inference on probabilities by characteristic kernels: To relate I^NOCCO and I^COND with independence and conditional independence, respect...
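The incomplete Cholesky decomposition referenced here produces a rank-r factor L with G ≈ LLᵀ in O(r²n) time, which is what makes the O(r²n) cost quoted in the context possible. A minimal pivoted version (an illustrative sketch, not the paper's implementation):

```python
import numpy as np

def incomplete_cholesky(K, rank, tol=1e-8):
    """Pivoted incomplete Cholesky: returns L (n x r) with K ~ L @ L.T.
    Stops early if the residual diagonal drops below tol."""
    n = K.shape[0]
    L = np.zeros((n, rank))
    d = np.diag(K).copy()                  # residual diagonal of K - L L^T
    for j in range(rank):
        i = int(np.argmax(d))              # greedy pivot: largest residual
        if d[i] < tol:
            return L[:, :j]
        L[:, j] = (K[:, i] - L @ L[i, :]) / np.sqrt(d[i])
        d -= L[:, j]**2
    return L

# rank-10 factor of a 30x30 Gaussian Gram matrix
rng = np.random.default_rng(0)
x = rng.normal(size=30)
K = np.exp(-(x[:, None] - x[None, :])**2 / 2)
L = incomplete_cholesky(K, rank=10)
print(L.shape, np.max(np.abs(K - L @ L.T)))  # small residual
```

Each column costs O(rn), so the full factorization is O(r²n); the fast spectral decay of Gaussian Gram matrices is what lets a small r give a good approximation.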