Results 1  10
of
12
Hilbert Space Embeddings and Metrics on Probability Measures
"... A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseu ..."
Abstract

Cited by 21 (9 self)
 Add to MetaCart
A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as γk, indexed by the kernel function k that defines the inner product in the RKHS. We present three theoretical properties of γk. First, we consider the question of determining the conditions on the kernel k for which γk is a metric: such k are denoted characteristic kernels. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g., on compact domains), and are difficult to check, our conditions are straightforward and intuitive: integrally strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translationinvariant on R d, then it is characteristic if and only if the support of its Fourier transform is the entire R d.
Kernel Belief Propagation
"... We propose a nonparametric generalization of belief propagation, Kernel Belief Propagation (KBP), for pairwise Markov random fields. Messages are represented as functions in a reproducing kernel Hilbert space (RKHS), and message updates are simple linear operations in the RKHS. KBP makes none of the ..."
Abstract

Cited by 16 (6 self)
 Add to MetaCart
We propose a nonparametric generalization of belief propagation, Kernel Belief Propagation (KBP), for pairwise Markov random fields. Messages are represented as functions in a reproducing kernel Hilbert space (RKHS), and message updates are simple linear operations in the RKHS. KBP makes none of the assumptions commonly required in classical BP algorithms: the variables need not arise from a finite domain or a Gaussian distribution, nor must their relations take any particular parametric form. Rather, the relations between variables are represented implicitly, and are learned nonparametrically from training data. KBP has the advantage that it may be used on any domain where kernels are defined (Rd, strings, groups), even where explicit parametric models are not known, or closed form expressions for the BP updates do not exist. The computational cost of message updates in KBP is polynomial in the training data size. We also propose a constant time approximate message update procedure by representing messages using a small number of basis functions. In experiments, we apply KBP to image denoising, depth prediction from still images, and protein configuration prediction: KBP is faster than competing classical and nonparametric approaches (by orders of magnitude, in some cases), while providing significantly more accurate results. 1
Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
"... Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over man ..."
Abstract

Cited by 10 (7 self)
 Add to MetaCart
Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include nontranslation invariant kernels and kernels on noncompact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example. 1
A Fast, Consistent Kernel TwoSample Test
"... A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ 2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text. 1
On the relation between universality, characteristic kernels and RKHS embedding of measures
 Proc. 13 th International Conference on Artificial Intelligence and Statistics, volume 9 of Workshop and Conference Proceedings. JMLR, 2010a
"... embedding of measures ..."
Universal kernels on nonstandard input spaces
 in Advances in Neural Information Processing Systems
, 2010
"... During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of R d. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of R d. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions, if X ̸ ⊂ R d. We close this gap by providing a general technique based on Taylortype kernels to explicitly construct universal kernels on compact metric spaces which are not subset of R d. We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. 1
Ranking with kernels in Fourier space
"... In typical ranking problems the total number n of items to be ranked is relatively large, but each data instance involves only k << n items. This paper examines the structure of such partial rankings in Fourier space. Specifically, we develop a kernel–based framework for solving ranking problems, de ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
In typical ranking problems the total number n of items to be ranked is relatively large, but each data instance involves only k << n items. This paper examines the structure of such partial rankings in Fourier space. Specifically, we develop a kernel–based framework for solving ranking problems, define some canonical kernels on permutations, and show that by transforming to Fourier space, the complexity of computing the kernel between two partial rankings can be reduced from O((n−k)! 2) to O((2k) 2k+3). 1
Universality, Characteristic Kernels and RKHS Embedding of Measures
"... Over the last few years, two different notions of positive definite (pd) kernels—universal and characteristic—have been developing in parallel in machine learning: universal kernels are proposed in the context of achieving the Bayes risk by kernelbased classification/regression algorithms while cha ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Over the last few years, two different notions of positive definite (pd) kernels—universal and characteristic—have been developing in parallel in machine learning: universal kernels are proposed in the context of achieving the Bayes risk by kernelbased classification/regression algorithms while characteristic kernels are introduced in the context of distinguishing probability measures by embedding them into a reproducing kernel Hilbert space (RKHS). However, the relation between these two notions is not well understood. The main contribution of this paper is to clarify the relation between universal and characteristic kernels by presenting a unifying study relating them to RKHS embedding of measures, in addition to clarifying their relation to other common notions of strictly pd, conditionally strictly pd and integrally strictly pd kernels. For radial kernels on Rd, all these notions are shown to be equivalent.
Kernels based tests with nonasymptotic bootstrap approaches for twosample problems
, 2012
"... Considering either two independent i.i.d. samples, or two independent samples generated from a heteroscedastic regression model, or two independent Poisson processes, we address the question of testing equality of their respective distributions. We first propose single testing procedures based on a ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Considering either two independent i.i.d. samples, or two independent samples generated from a heteroscedastic regression model, or two independent Poisson processes, we address the question of testing equality of their respective distributions. We first propose single testing procedures based on a general symmetric kernel. The corresponding critical values are chosen from a wild or permutation bootstrap approach, and the obtained tests are exactly (and not just asymptotically) of level α. We then introduce an aggregation method, which enables to overcome the difficulty of choosing a kernel and/or the parameters of the kernel. We derive nonasymptotic properties for the aggregated tests, proving that they may be optimal in a classical statistical sense.
DISCUSSION OF: BROWNIAN DISTANCE COVARIANCE
"... 1. Introduction. A dependence statistic, the Brownian Distance Covariance, has been proposed for use in dependence measurement and independence testing: we refer to this contribution henceforth as SR [we also note the earlier work on this topic of Székely, Rizzo and Bakirov (2007)]. Some advantages ..."
Abstract
 Add to MetaCart
1. Introduction. A dependence statistic, the Brownian Distance Covariance, has been proposed for use in dependence measurement and independence testing: we refer to this contribution henceforth as SR [we also note the earlier work on this topic of Székely, Rizzo and Bakirov (2007)]. Some advantages of the authors’ approach are that the random variables X and Y being tested may have arbitrary dimension