Kernel Bayes' Rule
Cited by 2 (1 self)
A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces. The prior and conditional probabilities are expressed as empirical kernel mean and covariance operators, respectively, and the kernel mean of the posterior distribution is computed in the form of a weighted sample. The kernel Bayes' rule can be applied to a wide variety of Bayesian inference problems: we demonstrate Bayesian computation without likelihood, and filtering with a nonparametric state-space model. A consistency rate for the posterior estimate is established.
Spatially-Aware Comparison and Consensus for Clusterings
Cited by 1 (0 self)
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.
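The Hilbert space representation of clusters described in this abstract can be illustrated with a small sketch. It assumes a Gaussian kernel and represents each cluster by the uniform average of its points' feature maps; both choices, and the names `mean_kernel` and `cluster_distance`, are assumptions made here for illustration and are not taken from the paper.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(A, B, sigma=1.0):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    total = sum(gaussian_kernel(a, b, sigma) for a in A for b in B)
    return total / (len(A) * len(B))

def cluster_distance(A, B, sigma=1.0):
    """Hilbert-space distance between the averaged representations of A and B."""
    sq = (mean_kernel(A, A, sigma) + mean_kernel(B, B, sigma)
          - 2.0 * mean_kernel(A, B, sigma))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

# Three toy clusters in the plane (hypothetical data).
left = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
near = [(0.2, 0.1), (0.1, 0.2), (0.2, 0.2)]
far = [(3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]

print(cluster_distance(left, near), cluster_distance(left, far))
```

Because the distance depends on where the points lie rather than only on cluster labels, a nearby cluster should come out closer to `left` than a distant one.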
Gaussianity Measures for Detecting the Direction of Causal Time Series
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
We conjecture that the distribution of the time-reversed residuals of a causal linear process is closer to a Gaussian than the distribution of the noise used to generate the process in the forward direction. This property is demonstrated for causal AR(1) processes assuming that all the cumulants of the distribution of the noise are defined. Based on this observation, it is possible to design a decision rule for detecting the direction of time series that can be described as linear processes: the correct direction (forward in time) is the one in which the residuals from a linear fit to the time series are less Gaussian. A series of experiments with simulated and real-world data illustrates the superior results of the proposed rule when compared with other state-of-the-art methods based on independence tests.
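The decision rule in this abstract can be sketched for the AR(1) case. The sketch below is a minimal illustration, not the paper's procedure: it assumes a least-squares AR(1) fit and uses the absolute excess kurtosis of the residuals as the Gaussianity measure, which is a stand-in chosen here. The function names and the simulated series (coefficient 0.5, uniform noise, fixed seed) are likewise hypothetical.

```python
import random

def ar1_residuals(series):
    """Least-squares AR(1) fit x[t] ~ phi * x[t-1]; returns the residuals."""
    mean = sum(series) / len(series)
    x = [v - mean for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(v * v for v in x[:-1])
    phi = num / den
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]

def abs_excess_kurtosis(values):
    """|kurtosis - 3|: zero for a Gaussian, larger means less Gaussian."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return abs(m4 / (m2 * m2) - 3.0)

def detect_direction(series):
    """Return 'forward' if the given ordering has the less Gaussian residuals."""
    forward = abs_excess_kurtosis(ar1_residuals(series))
    backward = abs_excess_kurtosis(ar1_residuals(series[::-1]))
    return "forward" if forward > backward else "backward"

# Simulated causal AR(1) process driven by non-Gaussian (uniform) noise.
rng = random.Random(0)
series = [0.0]
for _ in range(5000):
    series.append(0.5 * series[-1] + rng.uniform(-1.0, 1.0))

print(detect_direction(series))
```

With Gaussian noise both directions would look equally Gaussian, so a rule of this kind can only be expected to work when the driving noise is non-Gaussian.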
Supplementary material for: Generalizing from Several Related Classification Tasks to a New
The function $k: \Omega \times \Omega \to \mathbb{R}$ is called a kernel on $\Omega$ if the matrix $(k(x_i, x_j))_{1 \le i,j \le n}$ is positive semidefinite for all positive integers $n$ and all $x_1, \dots, x_n \in \Omega$. It is well known that if $k$ is a kernel on $\Omega$, then there exist a Hilbert space $\tilde{H}$ and a map $\tilde{\Phi}: \Omega \to \tilde{H}$ such that $k(x, x') = \langle \tilde{\Phi}(x), \tilde{\Phi}(x') \rangle_{\tilde{H}}$. While $\tilde{H}$ and $\tilde{\Phi}$ are not uniquely determined by $k$, the Hilbert space of functions $H_k = \{ \langle v, \tilde{\Phi}(\cdot) \rangle_{\tilde{H}} : v \in \tilde{H} \}$ is uniquely determined by $k$, and is called the reproducing kernel Hilbert space (RKHS) of $k$.

One way to envision $H_k$ is as follows. Define $\Phi(x) := k(\cdot, x)$, which is called the canonical feature map associated with $k$. Then $H_k$ is the completion of the span of $\{ \Phi(x) : x \in \Omega \}$. We also recall the reproducing property, which states that $\langle f, \Phi(x) \rangle = f(x)$ for all $f \in H_k$.

A kernel $k$ on a compact metric space $\Omega$ is said to be universal when its RKHS is dense in $C(\Omega)$, the set of continuous functions on $\Omega$, with respect to the supremum norm. Universal kernels are important for establishing universal consistency of many learning algorithms. If $k$ is a kernel on $\Omega$, then $k^*(x, x') := \frac{k(x, x')}{\sqrt{k(x, x)\, k(x', x')}}$ is the associated normalized kernel. If a kernel is universal, then so is its associated normalized kernel. For example, the exponential kernel $k(x, x') = \exp(\kappa \langle x, x' \rangle_{\mathbb{R}^d})$, $\kappa > 0$, can be shown to be universal on $\mathbb{R}^d$ through a Taylor series argument. Consequently, the Gaussian kernel $k_\sigma(x, x') := \frac{\exp\left(\frac{1}{\sigma^2} \langle x, x' \rangle\right)}{\exp\left(\frac{1}{2\sigma^2} \|x\|^2\right) \exp\left(\frac{1}{2\sigma^2} \|x'\|^2\right)}$ is universal, being the normalized kernel associated with the exponential kernel with $\kappa = 1/\sigma^2$. See [1] for additional details and discussion.

2 Implementation

We describe an implementation of our methodology for the hinge loss, $\ell(t, y) = \max(0, 1 - yt)$. To make the presentation more concise, we will employ the extended feature representation $\tilde{X} = (\hat{P}_X, X)$, and we will also employ a single index on these variables and on the labels. Thus the training data are $(\tilde{X}_i, Y_i)_{1 \le i \le M}$, where $M = \sum_{i=1}^{N} n_i$, and we seek a solution to $\min_{f \in H_k} \sum_{i=1}^{M} c_i \max(0, 1 - Y_i f(\tilde{X}_i)) + 1$ ...
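The claim that the Gaussian kernel is the normalized kernel associated with the exponential kernel with κ = 1/σ² can be checked numerically. The sketch below assumes the standard normalization with a square root in the denominator, k*(x, x') = k(x, x') / sqrt(k(x, x) k(x', x')), and the usual form exp(-‖x - x'‖² / (2σ²)) of the Gaussian kernel; the function names and test points are hypothetical.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def exponential_kernel(x, y, kappa):
    """k(x, x') = exp(kappa * <x, x'>)."""
    return math.exp(kappa * dot(x, y))

def normalized(kernel, x, y, *args):
    """k*(x, x') = k(x, x') / sqrt(k(x, x) * k(x', x'))."""
    return kernel(x, y, *args) / math.sqrt(kernel(x, x, *args) * kernel(y, y, *args))

def gaussian_kernel(x, y, sigma):
    """exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Arbitrary test points and bandwidth.
x, y, sigma = (0.3, -1.2), (1.0, 0.5), 0.8
lhs = normalized(exponential_kernel, x, y, 1.0 / sigma ** 2)
rhs = gaussian_kernel(x, y, sigma)
print(lhs, rhs)
```

The two values agree up to floating-point rounding because ⟨x, x'⟩/σ² - ‖x‖²/(2σ²) - ‖x'‖²/(2σ²) = -‖x - x'‖²/(2σ²).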
Learning in Hilbert vs. Banach Spaces: A Measure Embedding Viewpoint
The goal of this paper is to investigate the advantages and disadvantages of learning in Banach spaces over Hilbert spaces. While much work has been done on generalizing Hilbert methods to Banach spaces, in this paper we consider the simple problem of learning a Parzen window classifier in a reproducing kernel Banach space (RKBS), which is closely related to the notion of embedding probability measures into an RKBS, in order to carefully understand its pros and cons over the Hilbert space classifier. We show that while this generalization yields richer distance measures on probabilities compared to its Hilbert space counterpart, it suffers from a serious computational drawback that limits its practical applicability, which demonstrates the need for developing efficient learning algorithms in Banach spaces.
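For orientation, the Hilbert space baseline mentioned in this abstract can be sketched as follows. This is only the familiar RKHS form of a Parzen window classifier, which labels a point by comparing the averaged kernel values against each class; it does not implement the Banach space version studied in the paper. The Gaussian kernel, the function names, and the toy data are assumptions.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_embedding_at(x, sample, sigma=1.0):
    """Empirical kernel mean of `sample`, evaluated at the point x."""
    return sum(gaussian_kernel(x, s, sigma) for s in sample) / len(sample)

def parzen_classify(x, positives, negatives, sigma=1.0):
    """Label x by whichever class has the larger averaged kernel value."""
    score = (mean_embedding_at(x, positives, sigma)
             - mean_embedding_at(x, negatives, sigma))
    return 1 if score >= 0 else -1

# Toy two-class data (hypothetical).
positives = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
negatives = [(-1.0, -1.0), (-0.8, -1.2), (-1.1, -0.9)]

print(parzen_classify((1.0, 0.9), positives, negatives))
print(parzen_classify((-1.0, -1.1), positives, negatives))
```

Each prediction costs one kernel evaluation per training point, which is the kind of cheap closed-form computation that the abstract suggests is harder to obtain in the Banach setting.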
Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
We consider the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach to the problem. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented.

