Results 1–10 of 25
A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Abstract

Cited by 52 (25 self)
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information-theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or …
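The abstract above compares two distributions through the distance between their mean embeddings in the RKHS. A minimal sketch of the resulting (biased) squared-MMD estimate, assuming a Gaussian kernel; the function names are illustrative, not from the paper:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased estimate of squared MMD: the squared RKHS distance between the
    # empirical mean embeddings of the two samples.
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()
```

Identical samples give exactly zero, and well-separated samples give a clearly positive value, which is the behaviour the two-sample applications listed above rely on.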
Integrating structured biological data by kernel maximum mean discrepancy
In ISMB, 2006
Abstract

Cited by 52 (14 self)
Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but also to strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: testing cross-platform comparability of microarray data, cancer diagnosis, and data-content-based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
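Turning the MMD statistic described above into an actual test requires a significance threshold. One standard route, sketched here as an assumption rather than the authors' exact procedure, is a permutation test on the pooled sample with a Gaussian kernel:

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    # Biased squared-MMD estimate with a Gaussian kernel on the pooled Gram matrix.
    Z = np.vstack([X, Y])
    sq = np.sum(Z**2, 1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * Z @ Z.T) / (2 * sigma**2))
    m = len(X)
    return K[:m, :m].mean() + K[m:, m:].mean() - 2 * K[:m, m:].mean()

def mmd_permutation_test(X, Y, sigma=1.0, n_perm=200, seed=0):
    # Null distribution by randomly reassigning the pooled sample to two groups;
    # under H0 (same distribution) the labels are exchangeable.
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    m = len(X)
    observed = rbf_mmd2(X, Y, sigma)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        null.append(rbf_mmd2(Z[idx[:m]], Z[idx[m:]], sigma))
    # p-value: fraction of permuted statistics at least as large as the observed one.
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)
```

Because the statistic only needs a kernel, the same test applies unchanged to strings or graphs once a suitable structured-data kernel replaces the Gaussian one.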
A kernel method for the two-sample problem
In Advances in Neural Information Processing Systems 19, 2007
Abstract

Cited by 38 (13 self)
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear-time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g. a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
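The linear-time approximation mentioned in the abstract can be sketched as averaging an h-statistic over disjoint pairs of observations, so only O(m) kernel evaluations are needed instead of the quadratic-time Gram matrix. This is a hedged illustration with a Gaussian kernel; the names are not from the paper:

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    # Gaussian kernel evaluated row-wise between two stacks of points.
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma**2))

def mmd2_linear(X, Y, sigma=1.0):
    # Linear-time MMD^2 estimate: average
    #   h = k(x1,x2) + k(y1,y2) - k(x1,y2) - k(x2,y1)
    # over disjoint consecutive pairs, O(m) kernel evaluations in total.
    m = (min(len(X), len(Y)) // 2) * 2
    x1, x2 = X[0:m:2], X[1:m:2]
    y1, y2 = Y[0:m:2], Y[1:m:2]
    h = rbf(x1, x2, sigma) + rbf(y1, y2, sigma) - rbf(x1, y2, sigma) - rbf(x2, y1, sigma)
    return h.mean()
```

The pairwise averaging keeps the estimate unbiased while trading some statistical efficiency for speed, which is the point of the linear-time variant.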
Testing for Homogeneity with Kernel Fisher Discriminant Analysis
Abstract

Cited by 15 (9 self)
We propose to investigate test statistics for testing homogeneity based on kernel Fisher discriminant analysis. The asymptotic distributions of the test statistics under the null hypothesis are derived, and consistency against fixed alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial and real datasets is provided.
Dimensionality Reduction for Density Ratio Estimation in High-Dimensional Spaces
Neural Networks, vol. 23, no. 1, pp. 44–59, 2010
Abstract

Cited by 11 (9 self)
The ratio of two probability density functions has become a quantity of interest in the machine learning and data mining communities, since it can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, and feature selection. Recently, several methods have been developed for directly estimating the density ratio without going through density estimation, and these have been shown to work well in various practical problems. However, these methods still perform rather poorly when the dimensionality of the data domain is high. In this paper, we propose to incorporate a dimensionality reduction scheme into a density-ratio estimation procedure, and we experimentally show that the estimation accuracy in high-dimensional cases can be improved.
Adaptive Concept Drift Detection
Abstract

Cited by 7 (1 self)
An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical decision theory offers rank-based statistics for this task. However, these statistics depend on a fixed set of characteristics of the underlying distribution. Thus, they work well whenever the change in the underlying distribution affects the properties measured by the statistic, but they perform poorly if the drift influences the characteristics captured by the test statistic only to a small degree. To address this problem, we present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand. The first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the last is based on the average zero-one or sigmoid error rate of an SVM classifier. Experiments show that the margin- and error-based tests outperform the multivariate Wald-Wolfowitz test for concept drift detection. We also show that the tests work even if the drift is gradual in nature, and that the new methods are faster than the Wald-Wolfowitz test.
Kullback-Leibler Divergence Estimation of Continuous Distributions
In Proceedings of the IEEE International Symposium on Information Theory, 2008
Abstract

Cited by 7 (0 self)
We present a method for estimating the KL divergence between continuous densities, and we prove that it converges almost surely. Divergence estimation is typically solved by estimating the densities first. Our main result shows that this intermediate step is unnecessary, and that the divergence can be estimated using either the empirical cdf or k-nearest-neighbour density estimation, which does not converge to the true measure for finite k. The convergence proof is based on describing the statistics of our estimator using waiting-time distributions, such as the exponential or Erlang. We illustrate the proposed estimators and show how they compare to existing methods based on density estimation, and we also outline how our divergence estimators can be used for solving the two-sample problem.
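The k-nearest-neighbour route described above can be sketched for k = 1: for each point of the first sample, compare its nearest-neighbour distance within its own sample to its nearest-neighbour distance into the other sample. This is an illustrative reconstruction of the standard 1-NN estimator, not necessarily the paper's exact form:

```python
import numpy as np

def kl_divergence_1nn(X, Y):
    # 1-nearest-neighbour estimate of D(P || Q) from samples X ~ P and Y ~ Q:
    #   D ≈ (d/n) * sum_i log(nu_i / rho_i) + log(m / (n - 1)),
    # where rho_i is the NN distance of x_i within X and nu_i its NN distance into Y.
    n, d = X.shape
    m = Y.shape[0]
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)          # exclude the point itself
    rho = dxx.min(axis=1)                  # NN distance within X
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    nu = dxy.min(axis=1)                   # NN distance from X into Y
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

No density estimate is ever formed; only nearest-neighbour distances enter the statistic, which is exactly the "no intermediate density estimation" point of the abstract.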
A Fast, Consistent Kernel TwoSample Test
Abstract

Cited by 6 (4 self)
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ² random variables. Prior finite-sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low-order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high-dimensional multivariate data, and text.
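A rough sketch of the spectral idea: estimate the weights of the χ² sum from the eigenvalues of the centred Gram matrix on the pooled sample, then simulate the weighted sum directly. This is a simplified illustration of the general recipe under assumed scaling conventions, not the paper's exact estimator:

```python
import numpy as np

def spectral_null_samples(K, n_samples=1000, seed=0):
    # Approximate the null distribution of the (scaled) MMD^2 statistic by
    #   sum_l lambda_l * z_l^2,   z_l ~ N(0, 2),
    # with lambda_l estimated from the centred Gram matrix on the pooled sample.
    # The exact scaling of the weights is an assumption here.
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    Kc = H @ K @ H                       # centre the Gram matrix
    lam = np.linalg.eigvalsh(Kc) / N     # empirical eigenvalue estimates
    lam = lam[lam > 1e-12]               # drop numerically zero/negative values
    z = rng.normal(0.0, np.sqrt(2.0), size=(n_samples, len(lam)))
    return (z**2 * lam).sum(axis=1)

# A test threshold at level alpha would be np.quantile(samples, 1 - alpha).
```

One eigendecomposition replaces the many statistic recomputations of the bootstrap, which is where the claimed cost saving comes from.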
Estimation of information theoretic measures for continuous random variables
 NIPS
Abstract

Cited by 6 (0 self)
We analyze the estimation of information-theoretic measures of continuous random variables such as differential entropy, mutual information, or Kullback-Leibler divergence. The objective of this paper is twofold. First, we prove that the information-theoretic measure estimates using k-nearest-neighbor density estimation with fixed k converge almost surely, even though the k-nearest-neighbor density estimation with fixed k does not converge to its true measure. Second, we show that the information-theoretic measure estimates do not converge for k growing linearly with the number of samples. Nevertheless, these non-convergent estimates can be used for solving the two-sample problem and assessing whether two random variables are independent. We show that the two-sample and independence tests based on these non-convergent estimates compare favorably with the maximum mean discrepancy test and the Hilbert-Schmidt independence criterion.
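As an illustration of the fixed-k nearest-neighbour estimators discussed above, here is a minimal k = 1 differential-entropy sketch in the Kozachenko-Leonenko style; the exact form is an assumption for illustration, not the authors' code:

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def kl_entropy(X):
    # Kozachenko-Leonenko nearest-neighbour estimate of differential entropy (k = 1):
    #   H ≈ (d/n) * sum_i log rho_i + log V_d + gamma + log(n - 1),
    # where rho_i is the nearest-neighbour distance of x_i and V_d the
    # volume of the d-dimensional unit ball.
    n, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)         # exclude each point from its own neighbourhood
    rho = dist.min(axis=1)                 # nearest-neighbour distances
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return d * np.mean(np.log(rho)) + log_vd + EULER_GAMMA + math.log(n - 1)
```

Note that the density itself is never estimated consistently for fixed k; the point of the paper is that the plug-in entropy estimate converges anyway.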
Testing for Homogeneity with Kernel Fisher Discriminant Analysis
, 2008
Abstract
We propose to investigate test statistics for testing homogeneity in reproducing kernel Hilbert spaces. The asymptotic distributions of the test statistics under the null hypothesis are derived, and consistency under fixed and local alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial data and a speaker verification task is provided.