Results 1–10 of 20
Conditional mean embeddings as regressors
ICML, 2012
Abstract

Cited by 22 (9 self)
We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors. This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-valued regression methods and results to the problem of learning conditional distributions. Using this link we derive a sparse version of the embedding by considering alternative formulations. Further, by applying convergence results for vector-valued regression to the embedding problem we derive minimax convergence rates which are O(log(n)/n) – compared to current state-of-the-art rates of O(n^{-1/4}) – and are valid under milder and more intuitive assumptions. These minimax upper rates coincide with lower rates up to a logarithmic factor, showing that the embedding method achieves nearly optimal rates. We study our sparse embedding algorithm in a reinforcement learning task where the algorithm shows significant improvement in sparsity over an incomplete Cholesky decomposition.
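The regression view described in this abstract can be sketched numerically: the empirical conditional mean embedding assigns each training output a weight beta(x) = (K + n*lam*I)^{-1} k(x), and with a linear feature map on the outputs this reduces to kernel ridge regression for E[Y | X = x]. A minimal toy sketch (kernel choice, data, and all names are illustrative, not taken from the paper):

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def cme_weights(X, x_query, lam=1e-3, sigma=1.0):
    # Empirical conditional mean embedding weights beta(x) = (K + n*lam*I)^{-1} k(x);
    # the embedding of P(Y | X = x) is then sum_i beta_i(x) * phi(y_i).
    n = len(X)
    K = rbf(X, X, sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), rbf(X, x_query, sigma))

# Toy sanity check: with a linear feature map on Y, the embedding evaluates
# the regression function E[Y | X = x] (here y = sin(x) + noise).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
beta = cme_weights(X, np.array([[1.0]]))
estimate = float(beta[:, 0] @ y)   # an estimate of E[Y | X = 1]
```

The same weight vector, applied to feature maps phi(y_i) instead of raw outputs, gives the full distribution embedding the abstract discusses.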
Hilbert Space Embeddings of Predictive State Representations
Abstract

Cited by 12 (2 self)
Predictive State Representations (PSRs) are an expressive class of models for controlled stochastic processes. PSRs represent state as a set of predictions of future observable events. Because PSRs are defined entirely in terms of observable data, statistically consistent estimates of PSR parameters can be learned efficiently by manipulating moments of observed training data. Most learning algorithms for PSRs have assumed that actions and observations are finite with low cardinality. In this paper, we generalize PSRs to infinite sets of observations and actions, using the recent concept of Hilbert space embeddings of distributions. The essence is to represent the state as one or more nonparametric conditional embedding operators in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation. We show that these Hilbert space embeddings of PSRs are able to gracefully handle continuous actions and observations, and that our learned models outperform competing system identification algorithms on several prediction benchmarks.
Smooth Operators
Abstract

Cited by 8 (4 self)
We develop a generic approach to form smooth versions of basic mathematical operations like multiplication, composition, change of measure, and conditional expectation, among others. Operations which result in functions outside the reproducing kernel Hilbert space (such as the product of two RKHS functions) are approximated via a natural cost function, such that the solution is guaranteed to be in the targeted RKHS. This approximation problem is reduced to a regression problem using an adjoint trick, and solved in a vector-valued RKHS, consisting of continuous, linear, smooth operators which map from an input, real-valued RKHS to the desired target RKHS. Important constraints, such as an almost everywhere positive density, can be enforced or approximated naturally in this framework, using convex constraints on the operators. Finally, smooth operators can be composed to accomplish more complex machine learning tasks, such as the sum rule and kernelized approximate Bayesian inference, where state-of-the-art convergence rates are obtained.
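A minimal sketch of the idea for one such operation, the product of two RKHS functions: the pointwise product generally leaves the RKHS, so it is projected back by ridge regression on its values at the sample points, which guarantees the result lies in the target space. This toy code only illustrates that projection step; the kernel, data, and names are illustrative, not the paper's operator construction:

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix for 1-D inputs given as column vectors.
    return np.exp(-((A - B.T) ** 2) / (2.0 * sigma**2))

# f and g are RKHS functions given by coefficient vectors over the same points.
X = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
K = rbf(X, X)
lam = 1e-4
a = np.linalg.solve(K + lam * np.eye(50), np.sin(X[:, 0]))   # f fitted to sin
b = np.linalg.solve(K + lam * np.eye(50), np.cos(X[:, 0]))   # g fitted to cos

# Project the pointwise product f*g back into the RKHS by ridge regression
# on its values at the sample points.
fg_vals = (K @ a) * (K @ b)
c = np.linalg.solve(K + lam * np.eye(50), fg_vals)

x0 = np.array([[0.5]])
h_at_x0 = float(rbf(x0, X) @ c)   # approximates sin(0.5) * cos(0.5)
```

The abstract's adjoint trick generalizes this per-function regression to an operator acting on the whole input RKHS.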
Path Integral Control by Reproducing Kernel Hilbert Space Embedding
Abstract

Cited by 6 (1 self)
We present an embedding of stochastic optimal control problems, of the so-called path integral form, into reproducing kernel Hilbert spaces. Using consistent, sample-based estimates of the embedding leads to a model-free, nonparametric approach for calculation of an approximate solution to the control problem. This formulation admits a decomposition of the problem into an invariant and a task-dependent component. Consequently, we make much more efficient use of the sample data compared to previous sample-based approaches in this domain, e.g., by allowing sample reuse across tasks. Numerical examples on test problems, which illustrate the sample efficiency, are provided.
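The path integral form referred to here evaluates the value function as a log-expectation over sampled trajectory costs, V(x) = -lam * log E[exp(-S(tau)/lam)]. A minimal Monte Carlo sketch of that evaluation (function and variable names are illustrative, not the paper's):

```python
import numpy as np

def desirability_value(path_costs, lam=1.0):
    # Monte Carlo estimate of the path-integral value function
    # V(x) = -lam * log E[exp(-S/lam)] from sampled trajectory costs S.
    S = np.asarray(path_costs, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability.
    S0 = S.min()
    return S0 - lam * np.log(np.mean(np.exp(-(S - S0) / lam)))
```

Because of the exponential weighting, the estimate behaves like a soft minimum over sampled costs, so it never exceeds the arithmetic mean of the costs.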
Monte Carlo Filtering Using Kernel Embedding of Distributions
2014
Abstract

Cited by 2 (0 self)
Recent advances of kernel methods have yielded a framework for representing probabilities using a reproducing kernel Hilbert space, called kernel embedding of distributions. In this paper, we propose a Monte Carlo filtering algorithm based on kernel embeddings. The proposed method is applied to state-space models where sampling from the transition model is possible, while the observation model is to be learned from training samples without assuming a parametric model. As a theoretical basis of the proposed method, we prove consistency of the Monte Carlo method combined with kernel embeddings. Experimental results on synthetic models and real vision-based robot localization confirm the effectiveness of the proposed approach.
Doubly robust off-policy evaluation for reinforcement learning.
2015
Abstract

Cited by 1 (0 self)
We study the problem of evaluating a policy that is different from the one that generates data. Such a problem, known as off-policy evaluation in reinforcement learning (RL), is encountered whenever one wants to estimate the value of a new solution, based on historical data, before actually deploying it in the real system, which is a critical step of applying RL in most real-world applications. Despite the fundamental importance of the problem, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the so-called doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and has low variance, and as a point estimator, it outperforms the most popular importance-sampling estimator and its variants in most cases. We also provide theoretical results on the hardness of the problem, and show that our estimator can match the asymptotic lower bound in certain scenarios.
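The extension of the doubly robust estimator to sequential problems can be sketched as a backward recursion over a single trajectory. This is a minimal illustration under simplifying assumptions (discrete actions, one trajectory); `pi_e`, `pi_b`, and `Q_hat` are hypothetical callables, not an API from the paper:

```python
def doubly_robust(trajectory, pi_e, pi_b, Q_hat, gamma=1.0, actions=(0, 1)):
    # Backward recursion for the step-wise doubly robust estimate:
    #   v_t = V_hat(s_t) + rho_t * (r_t + gamma * v_{t+1} - Q_hat(s_t, a_t)),
    # where rho_t = pi_e(a_t|s_t) / pi_b(a_t|s_t) is the per-step importance
    # weight and V_hat averages the model Q_hat over the target policy.
    v = 0.0
    for s, a, r in reversed(trajectory):
        rho = pi_e(s, a) / pi_b(s, a)
        V_hat = sum(pi_e(s, b) * Q_hat(s, b) for b in actions)
        v = V_hat + rho * (r + gamma * v - Q_hat(s, a))
    return v

# With a zero model (Q_hat = 0) and pi_e = pi_b, the estimate reduces to the
# plain Monte Carlo return of the trajectory.
traj = [(0, 0, 1.0), (0, 1, 2.0)]
uniform = lambda s, a: 0.5
mc_return = doubly_robust(traj, uniform, uniform, lambda s, a: 0.0)
```

The model term Q_hat controls variance while the importance-weighted correction removes its bias, which is the "best of both worlds" the abstract describes.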
Abstraction Selection in Model-Based Reinforcement Learning
Abstract

Cited by 1 (1 self)
State abstractions are often used to reduce the complexity of model-based reinforcement learning when only limited quantities of data are available. However, choosing the appropriate level of abstraction is an important problem in practice. Existing approaches have theoretical guarantees only under strong assumptions on the domain or asymptotically large amounts of data, but in this paper we propose a simple algorithm based on statistical hypothesis testing that comes with a finite-sample guarantee under assumptions on candidate abstractions. Our algorithm trades off the low approximation error of finer abstractions against the low estimation error of coarser abstractions, resulting in a loss bound that depends only on the quality of the best available abstraction and is polynomial in the planning horizon.
Kernel Bayesian Inference with Posterior Regularization
Abstract
We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster and has comparable performance to the squared regularization in kernel Bayes' rule. This regularization coincides with a former thresholding approach used in kernel POMDPs whose consistency remains to be established. Our theoretical work solves this open problem and provides consistency analysis in regression settings. Based on our optimization formulation, we propose a flexible Bayesian posterior regularization framework which for the first time enables us to put regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines.
Hilbert Space Embeddings of PSRs
Abstract
Many problems in machine learning and artificial intelligence involve discrete-time partially observable nonlinear dynamical systems. If the observations are discrete, then Hidden Markov Models (HMMs) (Rabiner, 1989) or, in the control setting, Partially Observable Markov Decision Processes (POMDPs) (Sondik, 1971) can be used to represent belief as a discrete distribution over latent states.
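The discrete belief representation mentioned here is maintained by one Bayes filtering step per observation: predict through the transition matrix, then reweight by the observation likelihoods. A minimal sketch (the toy matrices are illustrative):

```python
import numpy as np

def belief_update(b, T, O, obs):
    # One HMM filtering step: push the belief b through the transition matrix
    # T[s, s'], weight by observation likelihoods O[s', obs], renormalize.
    b_pred = b @ T
    b_new = b_pred * O[:, obs]
    return b_new / b_new.sum()

T = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition probabilities
O = np.array([[0.7, 0.3], [0.1, 0.9]])   # P(obs | state)
b = belief_update(np.array([0.5, 0.5]), T, O, obs=0)
```

Hilbert space embeddings of PSRs replace this finite probability vector with a conditional embedding operator, which is what allows continuous observations and actions.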
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Abstract
We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators. We demonstrate the estimator's accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the inherent hardness of the problem, and show that our estimator can match the lower bound in certain scenarios.