Analyzing feature generation for value-function approximation
- In Proceedings of the 24th International Conference on Machine Learning
, 2007
Cited by 30 (4 self)
We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
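The following is a minimal sketch of one reading of the Bellman-error idea in the abstract above, not the authors' implementation. It assumes a known transition matrix `P` and reward vector `R` for a fixed policy, a single starting feature `phi`, and an unweighted inner product; all names and the three-state example are hypothetical. It solves the one-feature linear fixed point, takes the Bellman error of that approximation as the next basis function, and checks that the new function is orthogonal to the old one.

```python
def matvec(P, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def dot(u, v):
    """Unweighted inner product (an assumption; a weighted one is also possible)."""
    return sum(a * b for a, b in zip(u, v))

def bellman_error_feature(P, R, phi, gamma):
    """Return (w, bebf) for a single feature vector phi.

    w solves the linear fixed point  phi^T (phi - gamma P phi) w = phi^T R,
    and bebf is the Bellman error  R + gamma P (phi w) - phi w.
    Assumes the scalar phi^T (phi - gamma P phi) is nonzero.
    """
    P_phi = matvec(P, phi)
    a = dot(phi, [f - gamma * pf for f, pf in zip(phi, P_phi)])
    w = dot(phi, R) / a
    bebf = [r + gamma * pf * w - f * w for r, pf, f in zip(R, P_phi, phi)]
    return w, bebf

# Hypothetical 3-state chain: drift to the right, reward only in the last state.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
R = [0.0, 0.0, 1.0]
phi = [1.0, 1.0, 1.0]  # constant starting feature

w, bebf = bellman_error_feature(P, R, phi, gamma=0.9)
print("weight:", w)
print("new basis function:", bebf)
print("inner product with old feature:", dot(phi, bebf))
```

With more than one starting feature, the scalar solve would become a small linear system, but the orthogonality check stays the same.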
Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 27 (3 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Value function approximation in reinforcement learning using the Fourier basis
, 2008
Cited by 18 (11 self)
We describe the Fourier Basis, a linear value function approximation scheme based on the Fourier Series. We empirically evaluate its properties, and demonstrate that it performs well compared to Radial Basis Functions and the Polynomial Basis, the two most popular fixed bases for linear value function approximation, and is competitive with learned Proto-Value Functions even though no extra experience or computation is required.
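As a rough illustration of the scheme this abstract names, the sketch below builds cosine features of the form cos(pi * c . x) for every integer coefficient vector c with entries in 0..order, and combines them linearly. It assumes the state has already been rescaled to the unit hypercube; the function names and the example state are hypothetical and not taken from the paper.

```python
import itertools
import math

def fourier_features(state, order):
    """Return cos(pi * c . state) for every integer vector c in {0..order}^d.

    `state` is assumed to be rescaled so each component lies in [0, 1].
    """
    d = len(state)
    features = []
    for c in itertools.product(range(order + 1), repeat=d):
        inner = sum(ci * xi for ci, xi in zip(c, state))
        features.append(math.cos(math.pi * inner))
    return features

def linear_value(weights, state, order):
    """Linear value estimate: weighted sum of the Fourier features."""
    return sum(w * f for w, f in zip(weights, fourier_features(state, order)))

# Hypothetical 2-dimensional state, order-3 basis: (3 + 1) ** 2 features.
features = fourier_features([0.2, 0.7], order=3)
print(len(features))
```

The number of features grows as (order + 1) ** d, so a full basis of this kind is only practical for low-dimensional states.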
LQR-Trees: Feedback motion planning on sparse randomized trees
- In Proceedings of Robotics: Science and Systems (RSS)
Cited by 17 (4 self)
Recent advances in the direct computation of Lyapunov functions using convex optimization make it possible to efficiently evaluate regions of stability for smooth nonlinear systems. Here we present a feedback motion planning algorithm which uses these results to efficiently combine locally valid linear quadratic regulator (LQR) controllers into a nonlinear feedback policy which probabilistically covers the reachable area of a (bounded) state space with a region of stability, certifying that all initial conditions that are capable of reaching the goal will stabilize to the goal. We investigate the properties of this systematic nonlinear feedback control design algorithm on simple underactuated systems and discuss the potential for control of more complicated control problems like bipedal walking.
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning
Cited by 17 (3 self)
We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
A unifying framework for computational reinforcement learning theory
, 2009
Cited by 13 (6 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploration, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and the sciences. For example, a Backgammon program strives to make good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies
Approximate Dynamic Programming with Applications in Multi-Agent Systems
, 2007
Cited by 8 (1 self)
This thesis presents the development and implementation of approximate dynamic programming methods used to manage multi-agent systems. The purpose of this thesis is to develop an architectural framework and theoretical methods that enable an autonomous mission system to manage real-time multi-agent operations. To meet this goal, we begin by discussing aspects of the real-time multi-agent mission problem. Next, we formulate this problem as a Markov Decision Process (MDP) and present a system architecture designed to improve mission-level functional reliability through system self-awareness and adaptive mission planning. Since most multi-agent mission problems are computationally difficult to solve in real-time, approximation techniques are needed to find policies for these large-scale problems. Thus, we have developed
Linear Complementarity for Regularized Policy Evaluation and Improvement
, 2010
Cited by 8 (0 self)
Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1-regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
Towards Feature Selection In Actor-Critic Algorithms
Cited by 4 (2 self)
Choosing features for the critic in actor-critic algorithms with function approximation is known to be a challenge. Too few critic features can lead to degeneracy of the actor gradient, and too many features may lead to slower convergence of the learner. In this paper, we show that a well-studied class of actor policies satisfies the known requirements for convergence when the actor features are selected carefully. We demonstrate that two popular representations for value methods, the barycentric interpolators and the graph Laplacian proto-value functions, can be used to represent the actor in order to satisfy these conditions. A consequence of this work is a generalization of the proto-value function methods to the continuous action actor-critic domain. Finally, we analyze the performance of this approach using a simulation of a torque-limited inverted pendulum.
Sparse Approximate Policy Evaluation using Graph-based Basis Functions
Cited by 4 (3 self)
Proto-value functions and diffusion wavelets are graph-based basis functions that capture topological structure of the MDP state space. A subset of these basis functions must be selected when approximating value functions in order to maintain computational efficiency and prevent overfitting. We evaluated four basis selection algorithms for performing this task. This is an enhancement over the previously used heuristic of always selecting the most global, or smoothest, subset of basis functions regardless of the policy being evaluated. We analyzed two schemes, one direct and one indirect, for combining basis selection and approximate policy evaluation. The indirect scheme requires more computation than the direct scheme, but gains flexibility in the manner in which basis functions are selected. The coefficients applied to the basis functions were set using least-squares methods. We also described how least-squares methods can be altered to include regularization. Laplacian-based regularization provides a bias toward smoother approximate value functions which can prevent overfitting and can be useful in stochastic domains. A thorough set of experiments was conducted on a simple chain MDP to understand how basis selection and the different least-squares policy evaluation algorithms impact one another. Although the experiments used graph-based basis functions, the algorithms described in this paper can be applied to any set of basis functions.

