Results 1–10 of 19
A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 55 (27 self)
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information-theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or …
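The construction this abstract describes can be sketched in a few lines: embed each distribution through the empirical mean of kernel evaluations, mu_P(.) = (1/n) sum_i k(x_i, .), and compare distributions by the RKHS distance between embeddings, which expands entirely into kernel evaluations with no density estimate. The Gaussian kernel, bandwidth, and function names below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian RBF kernel matrix between row-sample arrays X, Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mean_embedding(sample, sigma=1.0):
    # Empirical mean embedding mu_P(.) = (1/n) sum_i k(x_i, .) as a callable.
    def mu(points):
        return gaussian_kernel(np.atleast_2d(points), sample, sigma).mean(axis=1)
    return mu

def embedding_distance2(X, Y, sigma=1.0):
    # Squared RKHS distance ||mu_X - mu_Y||^2, expanded via kernel means:
    # mean k(x,x') + mean k(y,y') - 2 mean k(x,y).
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean())
```

The squared distance is exactly the (biased) maximum mean discrepancy between the two samples, which is what the two-sample applications build on.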
A kernel statistical test of independence
2008
Cited by 43 (28 self)
Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m²), where m is the sample size. We demonstrate that this test outperforms established contingency-table and functional-correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
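As a rough illustration of the quantities involved (not the authors' implementation), the biased empirical HSIC, (1/m²) trace(KHLH), can be computed in O(m²) by double-centering one Gram matrix, and a permutation test gives an approximate null distribution. The Gaussian kernel, bandwidth, and permutation count below are assumptions for the sketch.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    # Gaussian kernel Gram matrix of a sample (rows are observations).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic_biased(X, Y, sigma=1.0):
    # Biased empirical HSIC = (1/m^2) trace(K H L H), computed in O(m^2)
    # by double-centering K instead of multiplying by H explicitly.
    m = len(X)
    K, L = rbf_gram(X, sigma), rbf_gram(Y, sigma)
    Kc = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    return (Kc * L).sum() / m ** 2

def hsic_permutation_test(X, Y, n_perm=200, alpha=0.05, seed=0):
    # Shuffle Y to simulate the null of independence; reject when the
    # observed statistic exceeds the empirical (1 - alpha) quantile.
    rng = np.random.default_rng(seed)
    stat = hsic_biased(X, Y)
    null = np.array([hsic_biased(X, Y[rng.permutation(len(Y))])
                     for _ in range(n_perm)])
    return stat, bool(stat > np.quantile(null, 1 - alpha))
```

The paper derives the asymptotic null distribution instead of permuting, which avoids the n_perm-fold cost of resampling.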
A kernel method for the two-sample problem
In Advances in Neural Information Processing Systems 19, 2007
Cited by 39 (13 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large-deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear-time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
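The two estimators mentioned, the quadratic-time unbiased statistic and its linear-time approximation over disjoint sample pairs, can be sketched as follows; the kernel, bandwidth, and names are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    # Pairwise Gaussian kernel; X, Y are (n, d) arrays of observations.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Quadratic-time unbiased MMD^2: off-diagonal within-sample means
    # minus twice the cross-sample mean.
    m, n = len(X), len(Y)
    Kxx = rbf(X, X, sigma); np.fill_diagonal(Kxx, 0.0)
    Kyy = rbf(Y, Y, sigma); np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (m * (m - 1)) + Kyy.sum() / (n * (n - 1))
            - 2.0 * rbf(X, Y, sigma).mean())

def mmd2_linear(X, Y, sigma=1.0):
    # Linear-time estimate: average the h-statistic over disjoint pairs.
    m = (min(len(X), len(Y)) // 2) * 2
    x1, x2, y1, y2 = X[0:m:2], X[1:m:2], Y[0:m:2], Y[1:m:2]
    def k(a, b):  # kernel on matched rows, one value per pair
        return np.exp(-((a - b) ** 2).sum(-1) / (2 * sigma ** 2))
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    return h.mean()
```

The linear-time variant touches each observation once, trading variance for the ability to stream very large samples.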
Consistent Independent Component Analysis and Prewhitening
2005
Cited by 13 (2 self)
We study the statistical merits of two techniques used in the literature of independent component analysis (ICA). First, we analyze the characteristic-function-based ICA method (CHFICA) and study its statistical properties, such as consistency, √n-consistency, and robustness against small additive noise. Second, we study the validity of prewhitening, a preprocessing technique used by many ICA algorithms, as applied to the CHFICA method. In particular, we establish the surprising effectiveness of this technique even when some components have heavy tails and others do not. A fast new algorithm implementing the prewhitened CHFICA method is also provided.
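Prewhitening itself is a standard preprocessing step; a minimal sketch (illustrative, not the paper's algorithm) centers the data and rotates/scales it so the sample covariance becomes the identity, after which the remaining ICA unmixing search is restricted to orthogonal matrices.

```python
import numpy as np

def prewhiten(X):
    # Whiten mixed signals X (n_samples x n_channels): subtract the mean,
    # then apply the symmetric inverse square root of the sample covariance
    # so the whitened data has identity covariance.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric whitening matrix
    return Xc @ W, W
```

The paper's point is that this step can remain statistically sound even when some source components are heavy-tailed, which is not obvious from the sketch above.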
Detecting the Direction of Causal Time Series
Cited by 7 (2 self)
We propose a method that detects the true direction of a time series by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (in the latter case, the direction is not identifiable). A new and important implication of our result is that it confirms, in the case of time series, a fundamental conjecture in causal reasoning: if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real-world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real-world data, our approach outperforms alternative solutions to the problem of time-direction recovery.
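A toy sketch of the scheme's logic, not the authors' method (which fits a full ARMA model and applies a proper independence test such as HSIC to the residuals): fit an AR(1) model in each direction by least squares and compare a crude residual-regressor dependence score. The AR(1) restriction, the |corr(e², x)| score, and all names here are illustrative assumptions.

```python
import numpy as np

def ar1_residual_dependence(x):
    # Fit x_t = a * x_{t-1} + e_t by least squares and return a crude
    # dependence score between residual and regressor: |corr(e_t^2, x_{t-1})|.
    # (A stand-in for a proper independence test such as HSIC.)
    past, now = x[:-1], x[1:]
    a = past @ now / (past @ past)
    resid = now - a * past
    return abs(np.corrcoef(resid ** 2, past)[0, 1])

def infer_direction(x):
    # Lower residual dependence suggests the more plausible causal direction.
    fwd = ar1_residual_dependence(x)
    bwd = ar1_residual_dependence(x[::-1])
    return ("forward" if fwd < bwd else "backward"), fwd, bwd
```

Consistent with the identifiability result quoted above, for Gaussian innovations both directions fit equally well and such scores carry no signal; the asymmetry only appears for non-Gaussian noise.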
Consistent Nonparametric Tests of Independence
2009
Cited by 7 (3 self)
Three simple and explicit procedures for testing the independence of two multidimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. Two kinds of tests are provided. Distribution-free strongly consistent tests are derived on the basis of large-deviation bounds on the test statistics: these tests make almost surely no Type I or Type II error after a random sample size. Asymptotically α-level tests are obtained from the limiting distribution of the test statistics. For the latter tests, the Type I error converges to a fixed nonzero value α, while the Type II error drops to zero, as the sample size increases. All tests reject the null hypothesis of independence if the test statistics become large. The performance of the tests is evaluated experimentally on benchmark data.
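The partition-based L1 statistic can be sketched as follows: restrict both variables to a finite partition (here a simple equal-width 2-D histogram, an illustrative choice; the paper's partitions and deviation bounds are more careful) and sum the absolute deviations between the joint empirical mass and the product of the marginal masses.

```python
import numpy as np

def l1_independence_stat(x, y, bins=5):
    # L1 test statistic on a finite partition: sum over cells of
    # |joint empirical mass - product of marginal empirical masses|.
    # Large values are evidence against independence.
    n = len(x)
    joint, xedges, yedges = np.histogram2d(x, y, bins=bins)
    joint /= n
    px = joint.sum(axis=1)   # marginal mass of the x-cells
    py = joint.sum(axis=0)   # marginal mass of the y-cells
    return np.abs(joint - np.outer(px, py)).sum()
```

By construction the statistic lies in [0, 2], and under independence it shrinks toward zero as the sample grows for a fixed partition.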
Novel Characteristic Function Based Criteria for ICA
In Proc. of the 3rd Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA2001), 2001
Cited by 7 (0 self)
We introduce two nonparametric independent component analysis (ICA) criteria based on factorization of characteristic functions. This approach has the potential to separate a wide class of distributions, because the characteristic function always exists. A simple criterion allowing for efficient search of the separating matrix and a more advanced criterion possessing a desirable consistency property are presented. These criteria may easily be used in orthogonal ICA algorithms. The separating matrix is estimated by establishing pairwise independence among the source signals. The theoretical characteristic functions used in the criteria are replaced by empirical ones. In the examples, the reliable performance of the methods is demonstrated using a variety of source distributions, including skewed and heavy-tailed distributions.
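The factorization idea behind such criteria can be sketched directly: two variables are independent exactly when the joint characteristic function factorizes, phi_{X,Y}(s,t) = phi_X(s) phi_Y(t), so the empirical deviation from factorization over a frequency grid serves as a contrast. The grid, the squared-modulus averaging, and all names below are illustrative assumptions, not the paper's criteria.

```python
import numpy as np

def ecf(x, t):
    # Empirical characteristic function of a 1-d sample x at frequencies t.
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

def cf_factorization_contrast(x, y, freqs=None):
    # Average |phi_{X,Y}(s,t) - phi_X(s) * phi_Y(t)|^2 over a frequency grid.
    # Near zero (on the grid) when the joint ECF factorizes, i.e. under
    # independence of the two components.
    if freqs is None:
        freqs = np.linspace(-2.0, 2.0, 9)
    sx = np.outer(freqs, x)            # (g, n): s_i * x_k
    ty = np.outer(freqs, y)            # (g, n): t_j * y_k
    joint = np.exp(1j * (sx[:, None, :] + ty[None, :, :])).mean(axis=2)
    prod = np.outer(ecf(x, freqs), ecf(y, freqs))
    return float(np.mean(np.abs(joint - prod) ** 2))
```

An ICA algorithm would minimize such a contrast over separating matrices, pairwise across the recovered components.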
Fast kernel-based independent component analysis
In IEEE Transactions on Signal Processing
Cited by 6 (1 self)
Recent approaches to Independent Component Analysis (ICA) have used kernel independence measures to obtain highly accurate solutions, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). FastKICA (Fast HSIC-based Kernel ICA) is a new optimisation method for one such kernel independence measure, the Hilbert-Schmidt Independence Criterion (HSIC). The high computational efficiency of this approach is achieved by combining geometric optimisation techniques, specifically an approximate Newton-like method on the orthogonal group, with accurate estimates of the gradient and Hessian based on an incomplete Cholesky decomposition. In contrast to other efficient kernel-based ICA algorithms, FastKICA is applicable to any twice-differentiable kernel function. Experimental results for problems with large numbers of sources and observations indicate that FastKICA provides more accurate solutions at a given cost than gradient descent on HSIC. Compared with other recently published ICA methods, FastKICA is competitive in terms of accuracy, relatively insensitive to local minima when initialised far from independence, and more robust to outliers. An analysis of the local convergence properties of FastKICA is provided.
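The incomplete Cholesky ingredient can be sketched independently of FastKICA (the pivot rule and tolerance handling here are illustrative assumptions): build a low-rank factor G with G Gᵀ ≈ K by greedily pivoting on the largest residual diagonal entry, evaluating only one column of K per pivot rather than forming the full Gram matrix.

```python
import numpy as np

def incomplete_cholesky(X, kernel, tol=1e-6, max_rank=None):
    # Pivoted incomplete Cholesky of the kernel Gram matrix K_ij = k(x_i, x_j):
    # returns G (n x r) with G @ G.T ~= K. Stops when the residual trace
    # (sum of the remaining diagonal) falls below tol or rank max_rank is hit.
    n = len(X)
    max_rank = max_rank or n
    diag = np.array([kernel(X[i], X[i]) for i in range(n)])
    G = np.zeros((n, max_rank))
    r = 0
    while r < max_rank and diag.sum() > tol:
        j = int(np.argmax(diag))                      # pivot: largest residual
        col = np.array([kernel(X[i], X[j]) for i in range(n)])
        G[:, r] = (col - G[:, :r] @ G[j, :r]) / np.sqrt(diag[j])
        diag -= G[:, r] ** 2
        diag[diag < 0] = 0.0                          # guard against round-off
        r += 1
    return G[:, :r]
```

Because kernel spectra typically decay fast, r is usually far smaller than n, which is what makes the gradient and Hessian estimates in such methods cheap.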
Nonparametric independence tests: Space partitioning and kernel approaches
In Algorithmic Learning Theory: 19th International Conference
Cited by 4 (3 self)
Three simple and explicit procedures for testing the independence of two multidimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. All tests reject the null hypothesis of independence if the test statistics become large. The large-deviation and limit-distribution properties of all three test statistics are given. Following from these results, distribution-free strongly consistent tests of independence are derived, as are asymptotically α-level tests. The performance of the tests is evaluated experimentally on benchmark data. Consider a sample of R^d × R^{d'}-valued random vectors (X1, Y1), ..., (Xn, Yn) with independent and identically distributed (i.i.d.) pairs defined on the same probability space. The distribution of (X, Y) is denoted by ν, while µ1 and µ2 stand for the distributions of X and Y, respectively. We are interested in testing …
Kernel methods for detecting the direction of time series
In Proceedings of the 32nd Annual Conference of the German Classification Society (GfKl 2008), 2009
Cited by 3 (2 self)
Summary. We propose two kernel-based methods for detecting the time direction in empirical time series. First, we apply a Support Vector Machine to the finite-dimensional distributions of the time series (classification method) by embedding these distributions into a reproducing kernel Hilbert space. For the ARMA method, we fit the observed data with an autoregressive moving average process and test whether the regression residuals are statistically independent of the past values. Whenever the dependence in one direction is significantly weaker than in the other, we infer the former to be the true one. Both approaches were able to detect the direction of the true generating model for simulated data sets. We also applied our tests to a large number of real-world time series. The ARMA method made a decision for a significant fraction of them and was mostly correct, while the classification method did not perform as well, but still exceeded chance level.