## Kernel methods for measuring independence (2005)

Venue: Journal of Machine Learning Research

Citations: 31 (14 self)

### BibTeX

@ARTICLE{Gretton05kernelmethods,
  author  = {Arthur Gretton and Ralf Herbrich and Alexander Smola and Olivier Bousquet and Bernhard Schölkopf},
  title   = {Kernel methods for measuring independence},
  journal = {Journal of Machine Learning Research},
  year    = {2005},
  volume  = {6},
  pages   = {2075--2129}
}

### Abstract

We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
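The constrained covariance (COCO) described in the abstract admits a simple empirical estimator. The sketch below assumes the commonly stated form COCO = (1/m)·√λ_max(K̃L̃), where K̃ and L̃ are the centred Gram matrices of the two samples; the Gaussian kernel width and sample sizes are arbitrary illustrative choices, not the authors' experimental settings.

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel on a 1-D sample.
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def coco(x, y, sigma=1.0):
    """Empirical constrained covariance between 1-D samples x and y.

    Sketch only: uses COCO = (1/m) * sqrt(lambda_max(K~ L~)) with centred
    Gram matrices; normalisation may differ from the paper's exact form.
    """
    m = len(x)
    H = np.eye(m) - np.ones((m, m)) / m          # centring matrix
    K = H @ gaussian_gram(x, sigma) @ H
    L = H @ gaussian_gram(y, sigma) @ H
    # K~ L~ is a product of PSD matrices, so its eigenvalues are real
    # and non-negative; numerical noise may leave tiny imaginary parts.
    lam_max = np.max(np.real(np.linalg.eigvals(K @ L)))
    return np.sqrt(max(lam_max, 0.0)) / m

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
coco_dep = coco(x, np.sin(3 * x))              # nonlinearly dependent pair
coco_ind = coco(x, rng.standard_normal(200))   # independent pair
```

A correlation coefficient would miss the sin(3x) dependence entirely; the kernelised statistic separates the two cases because universal kernels probe arbitrary nonlinear functionals.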

### Citations

8867 | Elements of Information Theory - Cover, Thomas - 1991 |
Citation Context ...r independence of random variables are well established in statistical analysis. For instance, one well known measure of statistical dependence between two random variables is the mutual information (Cover and Thomas, 1991), which for random vectors x, y is zero if and only if the random vectors are independent. This may also be interpreted as the KL divergence D_KL(p_{x,y} || p_x p_y) between the joint density and the product... |
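Written out in full, the mutual information criterion referenced in this snippet and its KL-divergence interpretation are:

```latex
I(x;y) \;=\; D_{\mathrm{KL}}\bigl(p_{x,y} \,\Vert\, p_x p_y\bigr)
       \;=\; \int p_{x,y}(x,y)\,\log\frac{p_{x,y}(x,y)}{p_x(x)\,p_y(y)}\,dx\,dy ,
```

which is zero if and only if p_{x,y} = p_x p_y, i.e. if and only if x and y are independent.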

4811 | Topics in Matrix Analysis - Horn, Johnson - 1990 |
Citation Context ...phasise that only the regularised KGV is used in practice. We now use the result that if A′ ≻ A ≻ 0 and B′ ≻ B ≻ 0, then A′B′ ≻ AB (this is a straightforward corollary to Theorem 7.7.3 of Horn and Johnson, 1985). The desired result then holds as long as λ′K̃ + (1 − λ′)κI ≺ λK̃ + (1 − λ)κI when λ′ > λ (as well as the analogous result for λL̃ + (1 − λ)κI), which means or A.7 Proof of Lemma 22 (λ − λ... |

3877 | Neural Networks – A Comprehensive Foundation - Haykin - 1994 |

2321 | Density estimation for statistics and data analysis - Silverman - 1986 |

2126 | Learning with Kernels - Schölkopf, Smola - 2002 |
Citation Context ...riance operator is then Hilbert-Schmidt (as shown by Gretton et al., 2005a). The empirical estimate COCO(z;F,G) is also simplified when F and G are unit balls in RKHSs, since the representer theorem (Schölkopf and Smola, 2002) holds: this states that a solution of an optimisation problem, dependent only on the function evaluations on a set of observations and on RKHS norms, lies in the span of the kernel functions evaluat... |
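The representer theorem invoked in this snippet can be stated in its standard form (with c an arbitrary cost on the function evaluations and Ω a strictly increasing function of the RKHS norm):

```latex
f^{\star} \in \operatorname*{arg\,min}_{f \in \mathcal{F}}
  \; c\bigl(f(x_1), \dots, f(x_m)\bigr) + \Omega\bigl(\lVert f \rVert_{\mathcal{F}}\bigr)
\quad \Longrightarrow \quad
f^{\star}(\cdot) \;=\; \sum_{i=1}^{m} \alpha_i \, k(x_i, \cdot) .
```

This is what reduces the infinite-dimensional optimisation over the unit balls of F and G to a finite problem in the Gram matrices of the observations.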

2008 | Pattern Classification - Duda, Hart, et al. - 2000 |

1590 | Independent Component Analysis - Hyvärinen, Karhunen, et al. - 2001 |
Citation Context ...of instantaneous independent component analysis involves the recovery of linearly mixed, i.i.d. sources, in the absence of information about the source distributions beyond their mutual independence (Hyvärinen et al., 2001). © A. Gretton, R. Herbrich, A. Smola, O. Bousquet, B. Schölkopf. Table 1: Table of kernel dependence functionals. Columns show whether the functional is covariance or correlation based, and rows i... |
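The instantaneous linear ICA model described here, x = As with mutually independent sources s, can be sketched numerically. The mixing matrix A below is a hypothetical example; the whitening step illustrates why mere decorrelation is not enough, since it determines the unmixing transform only up to an unknown rotation.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5000

# Two mutually independent, non-Gaussian (uniform) sources, unit variance.
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, m))

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])    # hypothetical mixing matrix
x = A @ s                      # observed mixtures: correlated, non-independent

# Whitening: decorrelate the mixtures via the inverse square root of
# their covariance. This leaves z with identity covariance, but the
# sources are still mixed by an unknown rotation; an independence
# measure (not just decorrelation) is needed to resolve it.
cov = x @ x.T / m
evals, evecs = np.linalg.eigh(cov)
W_white = evecs @ np.diag(evals ** -0.5) @ evecs.T
z = W_white @ x
```

After this step, ICA algorithms search over rotations of z, scoring each candidate with a dependence functional such as those compared in the paper.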

1404 | Independent Component Analysis, a new concept - Comon - 1994 |
Citation Context ...sing only the pairwise independence between elements of x, which is equivalent to recovering the mutually independent terms of s (up to permutation and scaling). This is due to the following theorem (Comon, 1994, Theorem 11). Theorem 24 (Mutual independence in linear ICA) Let s be a vector of dimension n with mutually independent components, of which at most one is Gaussian, and for which the underlying dens... |

1127 | An information-maximization approach to blind separation and blind deconvolution - Bell, Sejnowski - 1995 |
Citation Context ...As well as the kernel algorithms, we compare with three standard ICA methods: FastICA (Hyvärinen et al., 2001), Jade (Cardoso, 1998a), and Infomax (Bell and Sejnowski, 1995); and two more sophisticated methods, neither of them based on kernels: RADICAL (Learned-Miller and Fisher III, 2003), which uses order statistics to obtain entropy estimates; and characteristic func... |

1091 | Nonlinear component analysis as a kernel eigenvalue problem - Schölkopf, Smola, et al. - 1998 |

834 | Kernel Methods for Pattern Analysis - Shawe-Taylor, Cristianini - 2004 |

811 | Probability, random variables, and stochastic processes - Papoulis - 1965 |

525 | A new learning algorithm for blind signal separation - Amari, Cichocki, et al. - 1996 |

435 | The Fourier transform and its applications - Bracewell - 1965 |
Citation Context ...ere both kernels are Gaussian; that is, k(x − q_i) = (1/√(2πσ_x²)) exp(−(x − q_i)²/(2σ_x²)) and l(y − r_j) = (1/√(2πσ_y²)) exp(−(y − r_j)²/(2σ_y²)), bearing in mind that the impulse function is a limiting case (Bracewell, 1986); δ_{q_i}(x) = lim_{σ_x→0} (1/√(2πσ_x²)) exp(−(x − q_i)²/(2σ_x²)) := lim_{σ_x→0} k(x − q_i). (55) To compute the covariance structure of the vectors in (53), we require expressions for the expectations E_{x,y} k l^⊤... |
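The limiting behaviour quoted in this snippet, in which a normalised Gaussian kernel tends to an impulse as its width shrinks, can be checked numerically: the total mass stays at one for every width, while the mass concentrates ever more tightly around the centre. The grid and the sigma values below are arbitrary choices for illustration.

```python
import numpy as np

def gauss(x, q, sigma):
    # Normalised Gaussian kernel k_sigma(x - q); integrates to 1 for any sigma.
    return np.exp(-(x - q) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.linspace(-5.0, 5.0, 200001)
dx = x[1] - x[0]

mass = {}        # total integral (should stay ~1 regardless of sigma)
frac_near = {}   # mass falling within |x| < 0.05 of the centre
for sigma in (1.0, 0.1, 0.01):
    g = gauss(x, 0.0, sigma)
    mass[sigma] = np.sum(g) * dx
    frac_near[sigma] = np.sum(g[np.abs(x) < 0.05]) * dx
```

As sigma decreases, frac_near approaches one: this is the sense in which the Parzen window recovers the impulse function delta_{q_i} in the limit.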

414 | Blind signal separation: statistical principles - Cardoso |

398 | The geometry of algorithms with orthogonality constraints - Edelman, Arias, et al. - 1999 |

247 | Theory and Applications of Correspondence Analysis - Greenacre - 1984 |

211 | A blind source separation technique using second-order statistics - Belouchrani, Abed-Meraim, et al. - 1997 |
Citation Context ...nce, if they are generated by a random process with non-zero correlation between the outputs at different times), then an entirely different set of approaches can be brought to bear (see for instance Belouchrani et al., 1997; Pham and Garat, 1997). Although the present study concentrates entirely on the i.i.d. case, we will briefly address random processes with time dependencies in Section 6, when describing possible ... |

203 | Independent Component Analysis - Comon - 1994 |
Citation Context ...ernel. The polishing step usually caused a measurable improvement in our results. As well as the kernel algorithms, we compare with three standard ICA methods: FastICA (Hyvärinen et al., 2001), Jade (Cardoso, 1998a), and Infomax (Bell and Sejnowski, 1995); and two more sophisticated methods, neither of them based on kernels: RADICAL (Learned-Miller and Fisher III, 2003), which uses order statistics to obtain entropy e... |

198 | Efficient SVM training using low-rank kernel representations - Fine, Scheinberg - 2001 |

167 | On the influence of the kernel on the consistency of support vector machines - Steinwart |

150 | Estimation of entropy and mutual information - Paninski - 2003 |

130 | Source separation in postnonlinear mixtures - Taleb, Jutten - 1999 |

120 | Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces - Fukumizu, Bach, et al. - 2004 |

106 | Kernel partial least squares regression in reproducing kernel Hilbert space - Rosipal, Trejo - 2001 |

102 | Blind separation of mixture of independent sources through a quasi-maximum likelihood approach - Pham, Garat - 1997 |
Citation Context ...by a random process with non-zero correlation between the outputs at different times), then an entirely different set of approaches can be brought to bear (see for instance Belouchrani et al., 1997; Pham and Garat, 1997). Although the present study concentrates entirely on the i.i.d. case, we will briefly address random processes with time dependencies in Section 6, when describing possible extensions to our work... |

99 | Measuring statistical dependence with Hilbert-Schmidt norms - Gretton, Bousquet, et al. - 2005 |

87 | Nonlinear independent component analysis: Existence and uniqueness results - Hyvärinen, Pajunen - 1999 |
Citation Context ...Various efforts have also been made to solve the more general case t = f(s). This problem requires additional constraints on f, to avoid a trivial solution via the Darmois decomposition (Hyvärinen and Pajunen, 1999) (even then, it is generally the case that each source s_i can only be recovered up to a nonlinear distortion; this is the analogue of the scaling indeterminacy (Theorem 24) in the linear mixing case)... |

65 | Adaptive blind signal and image processing - Cichocki, Amari - 2002 |

61 | Kernel and Nonlinear Canonical Correlation Analysis - Lai, Fyfe - 2000 |

49 | ICA using spacings estimates of entropy - Learned-Miller |

46 | Canonical correlation analysis when the data are curves - Leurgans, Moyeed, et al. - 1993 |
Citation Context ...of the function spaces chosen: the use of RKHSs is a more recent innovation). Thus, rather than using the covariance, we may consider a kernelised canonical correlation (KCC) (Bach and Jordan, 2002a; Leurgans et al., 1993), which is a regularised estimate of the spectral norm of the correlation operator between reproducing kernel Hilbert spaces. It follows from the properties of COCO that the KCC is zero at independen... |

44 | Joint measures and cross-covariance operators - Baker - 1973 |

42 | A kernel method for canonical correlation analysis - Akaho - 2001 |

42 | On measures of dependence - Rényi - 1959 |

34 | Probability Essentials - Jacod, Protter - 2000 |

27 | Kernel-based nonlinear blind source separation - Harmeling, Ziehe, et al. |

21 | ICA using spacings estimates of entropy - Learned-Miller, Fisher III |

19 | One-unit contrast functions for independent component analysis: A statistical analysis - Hyvärinen - 1997 |
Citation Context ...is. The kernel methods used σ = 1, η = 2×10⁻⁵, and κ = 0.11 (KCC and KGV only). The tanh nonlinearity was used for the FastICA algorithm, since this is more resistant to outliers than the kurtosis (Hyvärinen, 1997). Right: Performance of the KCC and KGV as a function of κ for two sources of size m = 1000, where 25 outliers were added to each source following the mixing procedure. 5.5 Audio signal demixing Our ... |

18 | A unifying framework for independent component analysis - Lee, Girolami, et al. - 1999 |

18 | Least dependent component analysis based on mutual information - Stögbauer, Kraskov, et al. - 2004 |

15 | Consistent independent component analysis and prewhitening - Chen, Bickel - 2005 |
Citation Context ...sophisticated methods, neither of them based on kernels: RADICAL (Learned-Miller and Fisher III, 2003), which uses order statistics to obtain entropy estimates; and characteristic function based ICA (CFICA) (Chen and Bickel, 2004). It was recommended to run the CFICA algorithm with a good initialising guess; we used RADICAL for this purpose. All kernel algorithms were initialised using Jade (except for the 16 source case, ... |

14 | Quadratic dependence measure for nonlinear blind sources separation - Achard, Pham, et al. - 2003 |

13 | Nonlinear Canonical Analysis and Independence Tests - Dauxois, Nkiet - 1998 |

11 | Tree-dependent component analysis - Bach, Jordan - 2002 |

11 | Consistency of kernel canonical correlation analysis - Fukumizu, Bach, et al. - 2006 |

11 | The kernel mutual information - Gretton, Herbrich, et al. - 2003 |

10 | Criteria for blind source separation in post nonlinear mixture with mutual information - Achard, Pham, et al. |

10 | On the separability of nonlinear mixtures of temporally correlated sources - Hosseini, Jutten - 2003 |

9 | Éléments aléatoires dans un espace de - Mourier - 1953 |