Results 1 - 7 of 7
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
 Journal of Machine Learning Research
, 2006
Cited by 85 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
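As a rough illustration of item (ii) above, the sketch below builds basis functions from the eigenvectors of a graph Laplacian. It is a minimal sketch under stated assumptions, not the paper's implementation: the use of the combinatorial Laplacian L = D - A, the 20-state chain graph, and the names `proto_value_functions` and `chain_adjacency` are all choices made here for illustration.

```python
import numpy as np


def chain_adjacency(n):
    """Adjacency matrix of an undirected chain graph with n states (illustrative only)."""
    a = np.zeros((n, n))
    idx = np.arange(n - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    return a


def proto_value_functions(adjacency, k):
    """Return the k smallest eigenvalues and eigenvectors of L = D - A.

    Assumption: the combinatorial Laplacian is used; other symmetric
    diffusion operators (e.g. a normalized Laplacian) could be swapped in.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the first columns are the smoothest functions on the graph.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues[:k], eigenvectors[:, :k]


eigenvalues, basis = proto_value_functions(chain_adjacency(20), k=4)
```

Each column of `basis` can then serve as one feature in a linear value function approximator; the first column is constant on a connected graph, and later columns vary progressively faster across the state space.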
Geodesic Gaussian kernels for value function approximation
, 2007
Cited by 11 (4 self)
The least-squares policy iteration approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not allow for the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the nonlinear manifold structure induced by the Markov decision processes. The usefulness of the proposed method is successfully demonstrated in simulated robot arm control and Khepera robot navigation.
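The contrast between an ordinary Gaussian kernel and a geodesic one can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes a 4-connected grid world with unit edge costs, uses breadth-first search for shortest-path distances, and all names (`grid_graph`, `geodesic_distances`, `geodesic_gaussian`) and the 5x5 layout are hypothetical.

```python
from collections import deque

import numpy as np


def grid_graph(free):
    """Neighbour lists for the 4-connected free cells of a boolean occupancy grid."""
    rows, cols = free.shape
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            if not free[r, c]:
                continue
            candidates = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            neighbours[(r, c)] = [
                (i, j) for i, j in candidates
                if 0 <= i < rows and 0 <= j < cols and free[i, j]
            ]
    return neighbours


def geodesic_distances(neighbours, source):
    """Shortest-path lengths from source (breadth-first search, unit edge cost)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def geodesic_gaussian(neighbours, centre, sigma):
    """Gaussian kernel evaluated on graph shortest-path distance to `centre`."""
    dist = geodesic_distances(neighbours, centre)
    return {s: float(np.exp(-d ** 2 / (2.0 * sigma ** 2))) for s, d in dist.items()}


# A 5x5 room split by a wall in column 2, with a single gap in the bottom row.
free = np.ones((5, 5), dtype=bool)
free[0:4, 2] = False
graph = grid_graph(free)

centre, query, sigma = (0, 1), (0, 3), 2.0
geo_value = geodesic_gaussian(graph, centre, sigma)[query]
sq_euclid = float(np.sum((np.array(centre) - np.array(query)) ** 2))
euclid_value = float(np.exp(-sq_euclid / (2.0 * sigma ** 2)))
```

Here `centre` and `query` are two cells apart in straight-line distance but ten steps apart along the graph, so `geo_value` is far smaller than `euclid_value`: the geodesic kernel does not leak across the wall, which is the kind of discontinuity the abstract refers to.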
Approximate dynamic programming using Bellman residual elimination and Gaussian process regression
 In Proceedings of the American Control Conference
, 2009
Cited by 5 (4 self)
The overarching goal of the thesis is to devise new strategies for multi-agent planning and control problems, especially in the case where the agents are subject to random failures, maintenance needs, or other health management concerns, or in cases where the system model is not perfectly known. We argue that dynamic programming techniques, in particular Markov Decision Processes (MDPs), are a natural framework for addressing these planning problems, and present an MDP problem formulation for a persistent surveillance mission that incorporates stochastic fuel usage dynamics and the possibility of randomly occurring failures into the planning process. We show that this problem formulation and its optimal policy lead to good mission performance in a number of real-world scenarios. Furthermore, an online, adaptive solution framework is developed that allows the planning system to improve its performance over time, even in the case where the true system model is uncertain or time-varying. Motivated by the difficulty of solving the persistent mission problem exactly when the number of agents becomes large, we then develop a new family of approximate dynamic programming algorithms, called Bellman Residual Elimination (BRE) methods, which can be employed to approximately solve large-scale MDPs. We analyze these methods and prove a number of desirable theoretical properties about them, including reduction to exact policy iteration under certain conditions. Finally, we apply these BRE methods to large-scale persistent surveillance problems and show that they yield good performance, and furthermore, that they can be successfully integrated into the adaptive planning framework.
Kernel-Based Reinforcement Learning Using Bellman Residual Elimination
 Journal of Machine Learning Research
Cited by 2 (1 self)
This paper presents a class of new approximate policy iteration algorithms for solving infinite-horizon, discounted Markov decision processes (MDPs) for which a model of the system is available. The algorithms are similar in spirit to Bellman residual minimization methods. However, by exploiting kernel-based regression techniques with nondegenerate kernel functions as the underlying cost-to-go function approximation architecture, the new algorithms are able to explicitly construct cost-to-go solutions for which the Bellman residuals are identically zero at a set of chosen sample states. For this reason, we have named our approach Bellman residual elimination (BRE). Since the Bellman residuals are zero at the sample states, our BRE algorithms can be proven to reduce to exact policy iteration in the limit of sampling the entire state space. Furthermore, by exploiting knowledge of the model, the BRE algorithms eliminate the need to perform trajectory simulations and therefore do not suffer from simulation noise effects. The theoretical basis of our approach is a pair of reproducing kernel Hilbert spaces corresponding to the cost and Bellman residual function spaces, respectively. By constructing an invertible linear mapping between
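The central property described above, a cost-to-go function whose Bellman residual is exactly zero at chosen sample states, can be illustrated for a fixed policy with a small linear solve. This is a simplified sketch and not the BRE algorithm itself, which works through a Bellman residual kernel and reproducing kernel Hilbert spaces; the Gaussian kernel, the 10-state random-walk chain, the cost vector, and the function names are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D arrays of state coordinates."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def bellman_residual(P, g, gamma, J):
    """Residual J - (g + gamma * P J) of a fixed-policy Bellman equation."""
    return J - g - gamma * P @ J


def zero_residual_cost_to_go(P, g, gamma, states, sample_idx, bandwidth=1.0):
    """Fit J(s) = sum_j alpha_j k(s, s_j) with zero Bellman residual at the samples.

    Uses the known model P directly, so no trajectory simulation is needed.
    """
    K = gaussian_kernel(states, states[sample_idx], bandwidth)  # shape (n, m)
    # Row i encodes: J(s_i) - gamma * sum_s' P[s_i, s'] J(s') = g(s_i)
    M = K[sample_idx] - gamma * P[sample_idx] @ K
    alpha = np.linalg.solve(M, g[sample_idx])
    return K @ alpha


# Hypothetical 10-state chain under a fixed policy: step left or right with
# probability 0.5, staying in place when a step would leave the chain.
n, gamma = 10, 0.9
states = np.arange(n, dtype=float)
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5
g = (states - 4.0) ** 2 / 10.0  # per-stage cost (made up for the example)

sample_idx = np.array([0, 2, 4, 6, 8])
J_partial = zero_residual_cost_to_go(P, g, gamma, states, sample_idx)
J_full = zero_residual_cost_to_go(P, g, gamma, states, np.arange(n))
J_exact = np.linalg.solve(np.eye(n) - gamma * P, g)
```

With five sample states the residual of `J_partial` vanishes at those states only; when every state is sampled, `J_full` coincides with the exact policy evaluation `J_exact`, mirroring the limiting behaviour claimed in the abstract.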
BASIS CONSTRUCTION AND UTILIZATION FOR MARKOV DECISION PROCESSES USING GRAPHS
, 2010
Cited by 2 (0 self)
The ease or difficulty in solving a problem strongly depends on the way it is represented.
For example, consider the task of multiplying the numbers 12 and 24. Now imagine multiplying
XII and XXIV. Both tasks can be solved, but it is clearly more difficult to use
the Roman numeral representations of twelve and twenty-four. Humans excel at finding
appropriate representations for solving complex problems. This is not true for artificial
systems, which have largely relied on humans to provide appropriate representations. The
ability to autonomously construct useful representations and to efficiently exploit them is
an important challenge for artificial intelligence.
This dissertation builds on a recently introduced graph-based approach to learning representations
for sequential decision-making problems modeled as Markov decision processes
(MDPs). Representations, or basis functions, for MDPs are abstractions of the problem’s
state space and are used to approximate value functions, which quantify the expected
long-term utility obtained by following a policy. The graph-based approach generates basis
functions capturing the structure of the environment. Handling large environments requires
efficiently constructing and utilizing these functions. We address two issues with
this approach: (1) scaling basis construction and value function approximation to large
graphs/data sets, and (2) tailoring the approximation to a specific policy’s value function.
We introduce two algorithms for computing basis functions from large graphs. Both
algorithms work by decomposing the basis construction problem into smaller, more manageable
subproblems. One method determines the subproblems by enforcing block structure,
or groupings of states. The other method uses recursion to solve subproblems which
are then used for approximating the original problem. Both algorithms result in a set of basis
functions from which we employ basis selection algorithms. The selection algorithms
represent the value function with as few basis functions as possible, thereby reducing the
computational complexity of value function approximation and preventing overfitting.
The use of basis selection algorithms not only addresses the scaling problem but also
allows for tailoring the approximation to a specific policy. This results in a more accurate
representation than obtained when using the same subset of basis functions irrespective of
the policy being evaluated. To make effective use of the data, we develop a hybrid least-squares
algorithm for setting basis function coefficients. This algorithm is a parametric
combination of two common least-squares methods used for MDPs. We provide a geometric
and analytical interpretation of these methods and demonstrate the hybrid algorithm’s
ability to discover improved policies. We also show how the algorithm can include graph-based
regularization to help with sparse samples from stochastic environments.
This work investigates all aspects of linear value function approximation: constructing
a dictionary of basis functions, selecting a subset of basis functions from the dictionary,
and setting the coefficients on the selected basis functions. We empirically evaluate each
of these contributions in isolation and in one combined architecture.
Latent Kullback Leibler Control for Continuous-State Systems using Probabilistic Graphical Models
Kullback Leibler (KL) control problems allow for efficient computation of optimal control by solving a principal eigenvector problem. However, direct applicability of such a framework to continuous state-action systems is limited. In this paper, we propose to embed a KL control problem in a probabilistic graphical model where observed variables correspond to the continuous (possibly high-dimensional) state of the system and latent variables correspond to a discrete (low-dimensional) representation of the state amenable to KL control computation. We present two examples of this approach. The first one uses standard hidden Markov models (HMMs) and computes exact optimal control, but is only applicable to low-dimensional systems. The second one uses factorial HMMs; it is scalable to higher-dimensional problems, but control computation is approximate. We illustrate both examples in several robot motor control tasks.
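The principal eigenvector computation mentioned at the start of this abstract can be sketched for a small discrete problem. The sketch assumes the standard discrete KL control setup (uncontrolled transition matrix P, state cost q, desirability vector z solving lambda * z = diag(exp(-q)) P z, and optimal transitions proportional to P[x, x'] * z[x']); it does not reproduce the paper's latent-variable models. The ring-shaped chain, the cost values, and the function name `kl_control` are made up for illustration.

```python
import numpy as np


def kl_control(P, q, iterations=1000, tol=1e-12):
    """Power iteration for the principal eigenpair of G = diag(exp(-q)) P.

    Returns (eigenvalue, desirability z, controlled transition matrix).
    Assumes G is primitive so that power iteration converges.
    """
    G = np.exp(-q)[:, None] * P
    z = np.ones(len(q)) / np.sqrt(len(q))
    for _ in range(iterations):
        z_new = G @ z
        z_new /= np.linalg.norm(z_new)
        converged = np.linalg.norm(z_new - z) < tol
        z = z_new
        if converged:
            break
    eigenvalue = np.linalg.norm(G @ z)  # valid because z has unit norm
    # Optimal controlled dynamics reweight passive transitions by desirability.
    policy = P * z[None, :]
    policy /= policy.sum(axis=1, keepdims=True)
    return eigenvalue, z, policy


# Hypothetical lazy random walk on a ring of 6 states (self-loops keep it aperiodic).
n = 6
P = np.zeros((n, n))
for s in range(n):
    P[s, s] = 0.5
    P[s, (s - 1) % n] = 0.25
    P[s, (s + 1) % n] = 0.25
q = np.array([0.0, 0.5, 1.0, 1.5, 1.0, 0.5])  # state costs, cheapest at state 0

eigenvalue, z, policy = kl_control(P, q)
G = np.exp(-q)[:, None] * P
```

The resulting `policy` shifts probability mass toward neighbours with higher desirability while keeping the same support as the passive dynamics `P`.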