## Statistical consistency of kernel canonical correlation analysis (2007)

Venue: Journal of Machine Learning Research

Citations: 16 (8 self)

### BibTeX

    @ARTICLE{Fukumizu07statisticalconsistency,
      author = {Kenji Fukumizu and Francis R. Bach and Arthur Gretton},
      title = {Statistical consistency of kernel canonical correlation analysis},
      journal = {Journal of Machine Learning Research},
      year = {2007},
      volume = {8},
      pages = {361--383}
    }

### Abstract

While kernel canonical correlation analysis (CCA) has been applied in many contexts, the convergence of finite sample estimates of the associated functions to their population counterparts has not yet been established. This paper gives a mathematical proof of the statistical convergence of kernel CCA, providing a theoretical justification for the method. The proof uses covariance operators defined on reproducing kernel Hilbert spaces, and analyzes the convergence of their empirical estimates of finite rank to their population counterparts, which can have infinite rank. The result also gives a sufficient condition for convergence on the regularization coefficient involved in kernel CCA: this should decrease as $n^{-1/3}$, where $n$ is the number of data.
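The regularization schedule described in the abstract can be illustrated with a small numerical sketch. The Python code below is not taken from the paper. It assumes one-dimensional samples, a Gaussian RBF kernel with a fixed bandwidth `sigma`, and the schedule `eps = n ** (-1/3)` with constant 1 (the abstract states only the rate). It estimates the first kernel canonical correlation from centered Gram matrices, using the fact that the squared singular values of the regularized empirical operator coincide with the eigenvalues of `Rx @ Ry`, where `Rx = Kx (Kx + n*eps*I)^{-1}`. The function names are hypothetical.

```python
import numpy as np


def centered_gram(x, sigma=1.0):
    """Centered Gaussian RBF Gram matrix for a 1-D sample (assumed kernel)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dist = (x - x.T) ** 2                      # pairwise squared distances
    gram = np.exp(-sq_dist / (2.0 * sigma ** 2))
    n = gram.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n           # centering matrix H
    return h @ gram @ h


def first_kernel_canonical_correlation(x, y, sigma=1.0, eps=None):
    """Regularized empirical estimate of the first kernel canonical correlation.

    If eps is None, the assumed schedule eps = n^(-1/3) is used.
    """
    n = len(x)
    if eps is None:
        eps = n ** (-1.0 / 3.0)                   # assumed constant of 1
    kx = centered_gram(x, sigma)
    ky = centered_gram(y, sigma)
    reg = n * eps * np.eye(n)
    # R = (K + n*eps*I)^{-1} K, which equals K (K + n*eps*I)^{-1}
    # because both factors are functions of the same symmetric matrix.
    rx = np.linalg.solve(kx + reg, kx)
    ry = np.linalg.solve(ky + reg, ky)
    # Eigenvalues of a product of two PSD matrices are real and nonnegative
    # up to rounding; the largest one is the squared correlation estimate.
    eigvals = np.linalg.eigvals(rx @ ry)
    rho_sq = float(np.clip(eigvals.real.max(), 0.0, 1.0))
    return float(np.sqrt(rho_sq))
```

Calling `first_kernel_canonical_correlation(x, y)` on a sample where `y` is a noisy nonlinear function of `x` should give a noticeably larger value than on two independent samples, although both estimates are shrunk toward zero by the regularization. The dense `n x n` solves make this sketch suitable only for small samples.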

### Citations

2045 | Learning with kernels - Schölkopf, Smola - 2002 |

Citation Context: ... −1/3, where n is the number of data. Keywords: canonical correlation analysis, kernel, consistency, regularization, Hilbert space. 1. Introduction. Kernel methods (Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002) have recently been developed as a methodology for nonlinear data analysis with positive definite kernels. In kernel methods, data are represented as functions or elements in reproducing kernel Hilbe...

1556 | An introduction to Support Vector Machines and other kernel-based learning methods - Cristianini, Shawe-Taylor - 2000 |

Citation Context: ...ernel CCA: this should decrease as $n^{-1/3}$, where $n$ is the number of data. Keywords: canonical correlation analysis, kernel, consistency, regularization, Hilbert space. 1. Introduction. Kernel methods (Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002) have recently been developed as a methodology for nonlinear data analysis with positive definite kernels. In kernel methods, data are represented as functions or elements ...

1145 | An Introduction to Multivariate Statistical Analysis - Anderson - 1984 |

Citation Context: ...rators. With cross-covariance operators for $(X, Y)$, the kernel CCA problem can be formulated as $\sup_{f \in \mathcal{H}_X,\, g \in \mathcal{H}_Y} \langle g, \Sigma_{YX} f \rangle_{\mathcal{H}_Y}$ subject to $\langle f, \Sigma_{XX} f \rangle_{\mathcal{H}_X} = 1$, $\langle g, \Sigma_{YY} g \rangle_{\mathcal{H}_Y} = 1$. As with classical CCA (Anderson, 2003, for example), the solution of the above kernel CCA problem is given by the eigenfunctions corresponding to the largest eigenvalue of the following generalized eigenproblem: $\Sigma_{YX} f = \rho_1 \Sigma_{YY} g$, $\Sigma_{XY} g$...

1060 | Nonlinear component analysis as a kernel eigenvalue problem - Schölkopf, Smola, et al. - 1998 |

Citation Context: ...akes computation of inner product in the Hilbert spaces tractable. Many methods have been proposed as nonlinear extensions of conventional linear methods, such as kernel principal component analysis (Schölkopf et al., 1998), kernel Fisher discriminant analysis (Mika et al., 1999), and so on. Kernel canonical correlation analysis (kernel CCA) was proposed (Akaho, 2001; Melzer et al., 2001; Bach and Jordan, 2002) as a no...

787 | Theory of reproducing kernels - Aronszajn - 1950 |

327 | Kernel independent component analysis - Bach, Jordan - 2003 |

237 | Estimating optimal transformations for multiple regression and correlation - Breiman, Friedman - 1985 |

224 | Theory and applications of Correspondence Analysis - Greenacre |

Citation Context: ...his is easily verified by $E_X[f(X)^2] = E_X[\langle f, k_X(\cdot, X) \rangle^2] \le E_X[\|f\|^2_{\mathcal{H}_X} \|k_X(\cdot, X)\|^2_{\mathcal{H}_X}] = \|f\|^2_{\mathcal{H}_X} E_X[k_X(X, X)]$ for $f \in \mathcal{H}_X$. 2.1 CCA in Reproducing Kernel Hilbert Spaces. Classical CCA (e.g., Greenacre, 1984) looks for linear mappings $a^T X$ and $b^T Y$ that achieve maximum correlation. Kernel CCA extends this approach by looking for functions $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$ such that the random variables $f(X)$ and $g(Y)$...

219 | Probability Theory - Renyi - 1970 |

Citation Context: ...chmidt, which necessarily implies compactness. The condition is described in terms of mean square contingency, which is one of the standard criteria to measure the dependency of two random variables (Rényi, 1970). It is known (Buja, 1990) that the covariance operator considered on $L^2$ is Hilbert-Schmidt if the mean square contingency is finite. We modify the result to the case of the covariance operator on R...

178 | Protocol Analysis - Ericcson, Simon - 1984 |

Citation Context: ...[f(X)])(g(Y) − E_Y[g(Y)])] (= Cov[f(X), g(Y)]) for all $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$. By regarding the right hand side as a linear functional on the direct product $\mathcal{H}_X \otimes \mathcal{H}_Y$, Riesz's representation theorem (Reed and Simon, 1980, for example) guarantees the existence and uniqueness of a bounded operator $\Sigma_{YX}$. The cross-covariance operator expresses the covariance between functions in the RKHS as a bilinear functional, and co...

155 | Asymptotic Statistics - van der Vaart - 1998 |

147 | The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind - Groetsch - 1984 |

Citation Context: ... eigenfunctions of a NOrmalized Cross-Covariance Operator, and we call it NOCCO for short. Both kernel CCA and NOCCO require a regularization coefficient, which is similar to Tikhonov regularization (Groetsch, 1984), to enforce smoothness of the functions in the finite sample case (thus avoiding a trivial solution) and to enable operator inversion; but the decay of this regularization with increased sample size...

119 | Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces - Fukumizu, Bach, et al. - 2004 |

98 | Measuring statistical dependence with Hilbert-Schmidt norms - Gretton, Bousquet, et al. - 2005 |

82 | Functional Analysis - Lax - 2002 |

44 | Joint measures and cross-covariance operators - Baker - 1973 |

Citation Context: ... Let $Q_X$ and $Q_Y$ be the orthogonal projection which maps $\mathcal{H}_X$ onto $\mathcal{R}(\Sigma_{XX})$ and $\mathcal{H}_Y$ onto $\mathcal{R}(\Sigma_{YY})$, respectively. It is known (Baker, 1973, Theorem 1) that $\Sigma_{YX}$ has a representation $\Sigma_{YX} = \Sigma_{YY}^{1/2} V_{YX} \Sigma_{XX}^{1/2}$ (5), where $V_{YX} : \mathcal{H}_X \to \mathcal{H}_Y$ is a unique bounded operator such that $\|V_{YX}\| \le 1$ and $V_{YX} = Q_Y V_{YX} Q_X$. Note that the inverse of an o...

44 | Canonical correlation analysis when the data are curves - Leurgans, Moyeed, et al. - 1993 |

Citation Context: ...eterized family of kernels such as the Gaussian RBF kernel is provided, then cross-validation might be a reasonable method to select the best kernel (see Leurgans et al., 1993), however this remains to be established. One of the methods related to kernel CCA is independent component analysis (ICA), since Bach and Jordan (2002) use kernel CCA in their kernel ICA algorithm. ...

31 | Remarks on functional canonical variates, alternating least squares methods and - Buja - 1990 |

Citation Context: ...mplies compactness. The condition is described in terms of mean square contingency, which is one of the standard criteria to measure the dependency of two random variables (Rényi, 1970). It is known (Buja, 1990) that the covariance operator considered on $L^2$ is Hilbert-Schmidt if the mean square contingency is finite. We modify the result to the case of the covariance operator on RKHS. Assume that the measu...

16 | An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels - Steinwart, Hush, et al. |

Citation Context: ... support of $P_X$ is $\mathcal{X}$, the null space $\mathcal{N}(\Sigma_{XX})$ is equal to $\mathcal{H}_X \cap \mathbb{R}$, where $\mathbb{R}$ denotes the constant functions. For the Gaussian RBF kernel $k(x, y) = \exp\left(-\frac{1}{2\sigma^2}\|x - y\|^2\right)$ defined on $\mathcal{X} \subset \mathbb{R}^m$, it is known (Steinwart et al., 2004) that if the interior of $\mathcal{X}$ is not empty, a nontrivial constant function is not included in the RKHS; thus $\mathcal{N}(\Sigma_{XX}) = \{0\}$ in such cases. The mean element $m_X \in \mathcal{H}_X$ with respect to a random variable $X$ is ...

8 | Fisher discriminant analysis with kernels - Mika, Rätsch, et al. - 1999 |

Citation Context: ...ble. Many methods have been proposed as nonlinear extensions of conventional linear methods, such as kernel principal component analysis (Schölkopf et al., 1998), kernel Fisher discriminant analysis (Mika et al., 1999), and so on. Kernel canonical correlation analysis (kernel CCA) was proposed (Akaho, 2001; Melzer et al., 2001; Bach and Jordan, 2002) as a nonlinear extension of canonical correlation analysis with ...

6 | Nonlinear feature extraction using generalized canonical correlation analysis - Melzer, Reiter, et al. - 2001 |

Citation Context: ...rincipal component analysis (Schölkopf et al., 1998), kernel Fisher discriminant analysis (Mika et al., 1999), and so on. Kernel canonical correlation analysis (kernel CCA) was proposed (Akaho, 2001; Melzer et al., 2001; Bach and Jordan, 2002) as a nonlinear extension of canonical correlation analysis with positive definite kernels. Given two random variables $X$ and $Y$, kernel CCA aims at extracting the information w...

5 | Kernel constrained covariance for dependence measurement - Gretton, Smola, et al. - 2005 |

5 | KCCA for fMRI analysis - Hardoon, Shawe-Taylor, et al. - 2004 |

Citation Context: ... their correlation is maximized. Kernel CCA has been successfully applied in practice for extracting nonlinear relations between variables in genomic data (Yamanishi et al., 2003), fMRI brain images (Hardoon et al., 2004), chaotic time series (Suetani et al., 2006) and independent component analysis (Bach and Jordan, 2002). As in many statistical methods, the target functions defined in the population case are in pra...

1 | A kernel method for canonical correlation analysis - Akaho - 2001 |

1 | Functional canonical analysis for square integrable stochastic processes - He, Müller, et al. |

1 | Detecting hidden synchronization of chaotic dynamical systems: A kernel-based approach - Suetani, Iba, et al. |

1 | Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis - Yamanishi, Vert, et al. - 2003 |