Results 1 - 5 of 5
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
Abstract

Cited by 66 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach in which global basis functions called proto-value functions (PVFs) are formed from the eigenvectors of the graph Laplacian on an undirected graph built from the state transitions induced by the MDP; (iii) a three-phase procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain in scaling the proposed framework to large MDPs, and several elaborations of the framework are briefly summarized at the end.
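The Laplacian eigenvector construction in step (ii) can be sketched in a few lines. This is a minimal illustration on a small, hand-built state graph, not the paper's implementation; the function names are ours:

```python
import numpy as np

def combinatorial_laplacian(W):
    """L = D - W for a symmetric adjacency matrix W (D = diagonal degree matrix)."""
    return np.diag(W.sum(axis=1)) - W

def proto_value_functions(W, k):
    """Return the k smoothest Laplacian eigenvectors as basis functions over states."""
    eigvals, eigvecs = np.linalg.eigh(combinatorial_laplacian(W))  # ascending eigenvalues
    return eigvecs[:, :k]  # each column is one basis function

# Tiny 4-state chain: 0 - 1 - 2 - 3
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Phi = proto_value_functions(W, k=2)
# The first eigenvector (eigenvalue 0) is constant on a connected graph; the
# second varies smoothly from one end of the chain to the other.
```

Value functions are then approximated as linear combinations of the columns of `Phi`, which is what the RPI parameter estimation phase fits.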
Geodesic Gaussian kernels for value function approximation
, 2007
Abstract

Cited by 9 (3 self)
The least-squares policy iteration approach works efficiently for value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not accommodate the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the nonlinear manifold structure induced by the Markov decision process. The usefulness of the proposed method is successfully demonstrated in simulated robot arm control and Khepera robot navigation.
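The core idea, replacing the Euclidean distance inside a Gaussian kernel with a shortest-path (geodesic) distance on the state graph, can be sketched as follows. This is an illustration under that reading of the abstract, with names of our own choosing, using a dense Floyd-Warshall pass for brevity:

```python
import numpy as np

def geodesic_distances(W):
    """All-pairs shortest-path distances on a weighted graph (0 = no edge)."""
    n = W.shape[0]
    D = np.where(W > 0, W, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):  # Floyd-Warshall relaxation via broadcasting
        D = np.minimum(D, D[:, k:k+1] + D[k:k+1, :])
    return D

def geodesic_gaussian_kernel(W, centers, sigma):
    """K[s, c] = exp(-d(s, center_c)^2 / (2 sigma^2)), with geodesic distance d."""
    D = geodesic_distances(W)
    return np.exp(-D[:, centers] ** 2 / (2.0 * sigma ** 2))

# 4-state chain; kernels centered on the two end states.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K = geodesic_gaussian_kernel(W, centers=[0, 3], sigma=1.0)
```

Because distances follow the graph rather than straight lines through the embedding space, a kernel centered on one side of a wall in a maze does not bleed through to states on the other side.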
BASIS CONSTRUCTION AND UTILIZATION FOR MARKOV DECISION PROCESSES USING GRAPHS
, 2010
Abstract

Cited by 2 (0 self)
The ease or difficulty of solving a problem strongly depends on the way it is represented. For example, consider the task of multiplying the numbers 12 and 24. Now imagine multiplying XII and XXIV. Both tasks can be solved, but it is clearly more difficult to use the Roman numeral representations of twelve and twenty-four. Humans excel at finding appropriate representations for solving complex problems. This is not true of artificial systems, which have largely relied on humans to provide appropriate representations. The ability to autonomously construct useful representations, and to exploit them efficiently, is an important challenge for artificial intelligence.
This dissertation builds on a recently introduced graph-based approach to learning representations for sequential decision-making problems modeled as Markov decision processes (MDPs). Representations, or basis functions, for MDPs are abstractions of the problem's state space and are used to approximate value functions, which quantify the expected long-term utility obtained by following a policy. The graph-based approach generates basis functions that capture the structure of the environment. Handling large environments requires constructing and utilizing these functions efficiently. We address two issues with this approach: (1) scaling basis construction and value function approximation to large graphs and data sets, and (2) tailoring the approximation to a specific policy's value function.
We introduce two algorithms for computing basis functions from large graphs. Both algorithms work by decomposing the basis construction problem into smaller, more manageable subproblems. One method determines the subproblems by enforcing block structure, or groupings of states. The other method uses recursion to solve subproblems whose solutions are then used to approximate the original problem. Both algorithms produce a set of basis functions to which we apply basis selection algorithms. The selection algorithms represent the value function with as few basis functions as possible, thereby reducing the computational complexity of value function approximation and preventing overfitting.
The use of basis selection algorithms not only addresses the scaling problem but also allows the approximation to be tailored to a specific policy. This results in a more accurate representation than is obtained when using the same subset of basis functions irrespective of the policy being evaluated. To make effective use of the data, we develop a hybrid least-squares algorithm for setting basis function coefficients. This algorithm is a parametric combination of two common least-squares methods used for MDPs. We provide a geometric and analytical interpretation of these methods and demonstrate the hybrid algorithm's ability to discover improved policies. We also show how the algorithm can incorporate graph-based regularization to help with sparse samples from stochastic environments.
This work investigates all aspects of linear value function approximation: constructing a dictionary of basis functions, selecting a subset of basis functions from the dictionary, and setting the coefficients on the selected basis functions. We empirically evaluate each of these contributions in isolation and in one combined architecture.
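The hybrid least-squares idea can be illustrated with a simple parametric blend of the two standard normal-equation systems for linear value function approximation: the fixed-point (LSTD-style) system and the Bellman-residual system. This particular parameterization is our assumption for illustration, not necessarily the dissertation's exact formulation:

```python
import numpy as np

def hybrid_ls_weights(Phi, P, R, gamma, h):
    """Blend of the fixed-point (h = 1) and Bellman-residual (h = 0)
    least-squares solutions for linear value function approximation."""
    C = Phi - gamma * P @ Phi            # (I - gamma * P) @ Phi
    A_fp, b_fp = Phi.T @ C, Phi.T @ R    # fixed-point (LSTD-style) system
    A_br, b_br = C.T @ C, C.T @ R        # Bellman-residual system
    return np.linalg.solve(h * A_fp + (1 - h) * A_br,
                           h * b_fp + (1 - h) * b_br)

# Two-state example with the full (identity) basis, where any h in [0, 1]
# recovers the exact value function V = (I - gamma * P)^-1 R.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([1.0, 0.0])
Phi = np.eye(2)
w = hybrid_ls_weights(Phi, P, R, gamma=0.9, h=0.5)
```

With a complete basis, both component methods and every blend coincide with the exact value function, which makes the example easy to sanity-check; the methods differ only when the basis is restricted.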
Kernel-Based Reinforcement Learning Using Bellman Residual Elimination
Journal of Machine Learning Research
Abstract

Cited by 1 (0 self)
This paper presents a class of new approximate policy iteration algorithms for solving infinite-horizon, discounted Markov decision processes (MDPs) for which a model of the system is available. The algorithms are similar in spirit to Bellman residual minimization methods. However, by exploiting kernel-based regression techniques with non-degenerate kernel functions as the underlying cost-to-go function approximation architecture, the new algorithms are able to explicitly construct cost-to-go solutions for which the Bellman residuals are identically zero at a set of chosen sample states. For this reason, we have named our approach Bellman residual elimination (BRE). Since the Bellman residuals are zero at the sample states, our BRE algorithms can be proven to reduce to exact policy iteration in the limit of sampling the entire state space. Furthermore, by exploiting knowledge of the model, the BRE algorithms eliminate the need to perform trajectory simulations and therefore do not suffer from simulation noise effects. The theoretical basis of our approach is a pair of reproducing kernel Hilbert spaces corresponding to the cost and Bellman residual function spaces, respectively. By constructing an invertible linear mapping between
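On a small, fully sampled state space, the zero-residual property described in the abstract can be reproduced directly: with a non-degenerate (invertible) kernel matrix K, choosing coefficients alpha so that K alpha = R + gamma P K alpha makes the Bellman residual vanish at every sample state. This is a hedged sketch under that reading, not the paper's RKHS machinery:

```python
import numpy as np

def bre_coefficients(K, P, R, gamma):
    """Kernel coefficients alpha with zero Bellman residual at the sample
    states: solve (I - gamma P) K alpha = R, i.e. K alpha = R + gamma P K alpha."""
    return np.linalg.solve(K - gamma * P @ K, R)

# Gaussian kernel over 3 states on a line, with a simple cyclic transition model.
states = np.array([0.0, 1.0, 2.0])
K = np.exp(-(states[:, None] - states[None, :]) ** 2)  # non-degenerate kernel matrix
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
R = np.array([1.0, 0.0, 0.0])
gamma = 0.9
alpha = bre_coefficients(K, P, R, gamma)
V = K @ alpha
residual = V - (R + gamma * P @ V)  # identically zero at the sample states
```

Because the residual is eliminated exactly at every sampled state, sampling the whole state space yields the exact value function, which is the sense in which BRE reduces to exact policy iteration.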