Results 1–10 of 19
Kernel Bayes' Rule
Abstract

Cited by 8 (3 self)
A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces. The prior and conditional probabilities are expressed as empirical kernel mean and covariance operators, respectively, and the kernel mean of the posterior distribution is computed in the form of a weighted sample. The kernel Bayes' rule can be applied to a wide variety of Bayesian inference problems: we demonstrate Bayesian computation without likelihood, and filtering with a nonparametric state-space model. A consistency rate for the posterior estimate is established.
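The weighted-sample representation the abstract describes can be illustrated with the closely related conditional mean embedding, which also expresses a conditional expectation as weights over training samples. The sketch below is an assumption-laden simplification (the full kernel Bayes' rule additionally folds in a prior via covariance operators); the kernel, regularization `eps`, and toy data are all illustrative choices, not the paper's.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    # Gram matrix of the Gaussian RBF kernel between row-samples A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def conditional_mean_weights(X, x_query, eps=1e-3, sigma=1.0):
    # Weights w such that E[f(Y) | X = x] ~ sum_i w_i f(y_i):
    # w = (G_X + n*eps*I)^{-1} k_X(x)  (regularized conditional embedding).
    n = X.shape[0]
    G = gauss_kernel(X, X, sigma)
    k = gauss_kernel(X, x_query[None, :], sigma)[:, 0]
    return np.linalg.solve(G + n * eps * np.eye(n), k)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)  # Y | X=x concentrates near x^2

w = conditional_mean_weights(X, np.array([1.0]))
post_mean = w @ Y  # weighted-sample estimate of E[Y | X = 1], true value 1.0
```

The posterior expectation comes out as a plain weighted sum over training outputs, which is the "weighted sample" form the abstract refers to.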
Modelling Transition Dynamics in MDPs with RKHS Embeddings
 In arXiv, 2012
Abstract

Cited by 7 (3 self)
We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This approach makes use of a recently developed representation of conditional distributions as embeddings in a reproducing kernel Hilbert space (RKHS). Such representations bypass the need for estimating transition probabilities or densities, and apply to any domain on which kernels can be defined. This avoids the need to calculate intractable integrals, since expectations are represented as RKHS inner products whose computation has linear complexity in the number of points used to represent the embedding. We provide guarantees for the proposed applications in MDPs: in the context of a value iteration algorithm, we prove convergence to either the optimal policy, or to the closest projection of the optimal policy in our model class (an RKHS), under reasonable assumptions. In experiments, we investigate a learning task in a typical classical control setting (the underactuated pendulum), and on a navigation problem where only images from a sensor are observed. For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation. For value estimation we also compare to the NPDP method. Our approach achieves better performance in all experiments.
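The key computational claim, that an expectation over next states reduces to an RKHS inner product with sample-linear cost, can be sketched as follows. The toy dynamics, kernel bandwidth, and regularization are all illustrative assumptions; the point is only that no transition density is ever estimated.

```python
import numpy as np

def rbf(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Transition data from deterministic toy dynamics s' = 0.9 * s.
rng = np.random.default_rng(1)
S = rng.uniform(-2, 2, size=(300, 1))
S_next = 0.9 * S

def expected_value(V, s, lam=1e-3, sigma=0.5):
    # E[V(s') | s] ~ alpha(s) @ V(S_next): once alpha is formed, the
    # expectation is an inner product, linear in the number of points.
    n = S.shape[0]
    K = rbf(S, S, sigma)
    k_s = rbf(S, s[None, :], sigma)[:, 0]
    alpha = np.linalg.solve(K + n * lam * np.eye(n), k_s)
    return float(alpha @ V(S_next[:, 0]))

est = expected_value(lambda s: s, np.array([1.0]))  # true E[V(s')|s=1] = 0.9
```

Value iteration would repeat this inner-product step for each candidate action, never touching an explicit transition model.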
Spatially-Aware Comparison and Consensus for Clusterings
Abstract

Cited by 3 (2 self)
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert-space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.
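The idea of representing a cluster by combining its points' embeddings can be made concrete with a linear kernel, where a cluster's representation is simply its centroid. The distance below is an illustrative construction in that spirit, not the paper's exact metric; note it is invariant to relabeling clusters but sensitive to where the clusters sit in space.

```python
import numpy as np

def clustering_distance(points, labels_a, labels_b):
    """Illustrative spatially-aware distance: represent each cluster by the
    mean of its points (its embedding under a linear kernel), then average,
    over points, the distance between each point's cluster representative
    under the two clusterings."""
    def rep(labels):
        cents = {c: points[labels == c].mean(axis=0) for c in set(labels)}
        return np.array([cents[c] for c in labels])
    return float(np.linalg.norm(rep(labels_a) - rep(labels_b), axis=1).mean())

pts = np.array([[0.0, 0], [0.1, 0], [5.0, 0], [5.1, 0]])
a = np.array([0, 0, 1, 1])
b = np.array([1, 1, 0, 0])  # same partition, cluster labels renamed
c = np.array([0, 1, 0, 1])  # spatially very different partition

d_same = clustering_distance(pts, a, b)  # 0.0: identical partitions
d_diff = clustering_distance(pts, a, c)  # large: clusters straddle the gap
```

A set-overlap metric like the Rand index would also call `a` and `b` identical, but it would not register how far apart the clusters in `c` are spatially, which is the gap this kind of metric fills.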
Analysis of kernel mean matching under covariate shift
 In ICML, 2012
Abstract

Cited by 3 (0 self)
In real supervised learning scenarios, it is not uncommon for the training and test samples to follow different probability distributions, making it necessary to correct the sampling bias. Focusing on a particular covariate shift problem, we derive high-probability confidence bounds for the kernel mean matching (KMM) estimator, whose convergence rate turns out to depend on a regularity measure of the regression function and on a capacity measure of the kernel. By comparing KMM with the natural plug-in estimator, we establish the superiority of the former, and thus provide concrete evidence for the effectiveness of KMM under covariate shift.
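KMM reweights training points so that their kernel mean matches the test sample's kernel mean. The sketch below solves a ridge-regularized unconstrained form of that objective as an assumption (the standard formulation is a constrained QP with box and normalization constraints); kernel bandwidth and data are illustrative.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, lam=1e-3):
    # Minimize ||(1/n) sum_i b_i phi(x_i) - (1/m) sum_j phi(x'_j)||^2 in the
    # RKHS; in this unconstrained ridge form, b = (K + lam I)^{-1} kappa.
    n, m = len(X_tr), len(X_te)
    K = rbf(X_tr, X_tr, sigma)
    kappa = (n / m) * rbf(X_tr, X_te, sigma).sum(axis=1)
    return np.linalg.solve(K + lam * np.eye(n), kappa)

rng = np.random.default_rng(2)
X_tr = rng.normal(0.0, 1, size=(400, 1))  # training distribution
X_te = rng.normal(1.0, 1, size=(400, 1))  # shifted test distribution

b = kmm_weights(X_tr, X_te)
# Training points lying where the test density is higher get larger weight.
hi = float(b[X_tr[:, 0] > 0.5].mean())
lo = float(b[X_tr[:, 0] < -0.5].mean())
```

The weights approximate the density ratio test/train without ever estimating either density, which is exactly what the confidence bounds in the paper are about.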
Fast prediction of new feature utility
 In ICML, 2012
Abstract

Cited by 2 (0 self)
We study the new feature utility prediction problem: statistically testing whether adding a feature to the data representation can improve the accuracy of a current predictor. In many applications, identifying new features is the main pathway for improving performance. However, evaluating every potential feature by retraining the predictor can be costly. The paper describes an efficient, learner-independent technique for estimating new feature utility, without retraining, based on the current predictor's outputs. The method is obtained by deriving a connection between loss reduction potential and the new feature's correlation with the loss gradient of the current predictor. This leads to a simple yet powerful hypothesis testing procedure, for which we prove consistency. Our theoretical analysis is accompanied by empirical evaluation on standard benchmarks and a large-scale industrial dataset.
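The gradient-correlation idea is easiest to see with squared loss, where the loss gradient with respect to the prediction is just the residual: a candidate feature can only reduce loss if it correlates with the residuals. The statistic below is an illustrative stand-in for the paper's full hypothesis test, with made-up data.

```python
import numpy as np

def feature_utility_stat(new_feature, residuals):
    # |Pearson correlation| between the candidate feature and the residuals
    # (the squared-loss gradient of the current predictor). No retraining.
    f = new_feature - new_feature.mean()
    r = residuals - residuals.mean()
    return abs(float(f @ r / (np.linalg.norm(f) * np.linalg.norm(r))))

rng = np.random.default_rng(3)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
y = x1 + x2
residuals = y - x1  # current predictor uses x1 only, so the residual is x2

useful = feature_utility_stat(x2, residuals)                      # near 1
useless = feature_utility_stat(rng.normal(size=1000), residuals)  # near 0
```

Screening candidate features this way costs one pass over the data per feature, versus one full retraining per feature for the naive approach.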
Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
Abstract

Cited by 1 (0 self)
We consider the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach to the problem. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented.
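One common way to build an RKHS over (task, instance) pairs is a product kernel: a kernel between the tasks' marginal distributions, via their empirical mean embeddings, multiplied by an ordinary kernel on instances. The construction below is an illustrative assumption in that spirit, not necessarily the paper's exact kernel.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    return float(np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2)))

def task_kernel(Xp, Xq, sigma=1.0):
    # Inner product of the two tasks' empirical kernel mean embeddings:
    # <mu_P, mu_Q> = mean over pairs of k(x_i, x_j).
    return float(np.mean([rbf(x, z, sigma) for x in Xp for z in Xq]))

def joint_kernel(Xp, x, Xq, z, sigma=1.0):
    # Product kernel on (distribution, instance) pairs; a classifier trained
    # with it can transfer to a new task given only that task's unlabeled sample.
    return task_kernel(Xp, Xq, sigma) * rbf(x, z, sigma)

rng = np.random.default_rng(4)
task_a = rng.normal(0, 1, size=(30, 2))
task_b = rng.normal(0, 1, size=(30, 2))  # marginal similar to task_a
task_c = rng.normal(4, 1, size=(30, 2))  # very different marginal

s_similar = task_kernel(task_a, task_b)
s_distant = task_kernel(task_a, task_c)
```

Because the task part of the kernel needs only an unlabeled sample, the new test data set can be scored without any of its labels, which is the setting the abstract describes.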
Universality, Characteristic Kernels and RKHS Embedding of Measures
Abstract

Cited by 1 (0 self)
Over the last few years, two different notions of positive definite (pd) kernels—universal and characteristic—have been developing in parallel in machine learning: universal kernels are proposed in the context of achieving the Bayes risk by kernel-based classification/regression algorithms, while characteristic kernels are introduced in the context of distinguishing probability measures by embedding them into a reproducing kernel Hilbert space (RKHS). However, the relation between these two notions is not well understood. The main contribution of this paper is to clarify the relation between universal and characteristic kernels by presenting a unifying study relating them to RKHS embedding of measures, in addition to clarifying their relation to other common notions of strictly pd, conditionally strictly pd and integrally strictly pd kernels. For radial kernels on R^d, all these notions are shown to be equivalent.
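What "distinguishing probability measures by embedding" means operationally is that, for a characteristic kernel such as the Gaussian RBF, the maximum mean discrepancy (MMD) between two embeddings is zero iff the distributions coincide. A minimal empirical sketch (biased estimator, illustrative data):

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    # Biased empirical squared MMD with a Gaussian RBF kernel:
    # ||mu_X - mu_Y||^2 = <mu_X,mu_X> + <mu_Y,mu_Y> - 2<mu_X,mu_Y>.
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return float(gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean())

rng = np.random.default_rng(5)
P = rng.normal(0, 1, size=(500, 1))
Q = rng.normal(0, 1, size=(500, 1))  # same distribution as P
R = rng.normal(2, 1, size=(500, 1))  # shifted distribution

near_zero = mmd2(P, Q)  # small: embeddings nearly coincide
positive = mmd2(P, R)   # clearly positive: embeddings distinguish P and R
```

With a non-characteristic kernel (e.g. a linear kernel and two distributions sharing the same mean), the same quantity can be zero for distinct distributions, which is exactly the distinction the paper formalizes.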
Gaussianity Measures for Detecting the Direction of Causal Time Series
 In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Abstract
We conjecture that the distribution of the time-reversed residuals of a causal linear process is closer to a Gaussian than the distribution of the noise used to generate the process in the forward direction. This property is demonstrated for causal AR(1) processes assuming that all the cumulants of the distribution of the noise are defined. Based on this observation, it is possible to design a decision rule for detecting the direction of time series that can be described as linear processes: The correct direction (forward in time) is the one in which the residuals from a linear fit to the time series are less Gaussian. A series of experiments with simulated and real-world data illustrate the superior results of the proposed rule when compared with other state-of-the-art methods based on independence tests.
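The decision rule can be sketched for the AR(1) case: fit the model in both time directions and pick the direction whose residuals are less Gaussian. Here "less Gaussian" is measured by excess kurtosis as an illustrative choice (the paper's setting is the general cumulant-based conjecture); the process parameters are made up.

```python
import numpy as np

def excess_kurtosis(r):
    r = r - r.mean()
    return float(np.mean(r ** 4) / np.mean(r ** 2) ** 2 - 3.0)

def ar1_residuals(x):
    # Least-squares AR(1) fit x[t] ~ a * x[t-1]; returns the residuals.
    a = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
    return x[1:] - a * x[:-1]

# Causal AR(1) driven by decidedly non-Gaussian (uniform) noise.
rng = np.random.default_rng(6)
e = rng.uniform(-1, 1, size=20000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.9 * x[t - 1] + e[t]

k_forward = abs(excess_kurtosis(ar1_residuals(x)))         # uniform-like noise
k_backward = abs(excess_kurtosis(ar1_residuals(x[::-1])))  # closer to Gaussian

# Decision rule: the true direction has the less Gaussian residuals.
forward_detected = k_forward > k_backward
```

Intuitively, the time-reversed residual mixes the (near-Gaussian) process value with the innovation, so its distribution is pulled toward Gaussianity, while the forward residual recovers the non-Gaussian noise itself.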