Results 1-10 of 93
Analyzing feature generation for value-function approximation
In Proceedings of the 24th International Conference on Machine Learning, 2007
Cited by 56 (5 self)
We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
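The orthogonality claim in the abstract above can be illustrated with a small sketch. This is not the authors' code: it assumes a known model, a uniformly weighted linear fixed-point solution, and a hypothetical 4-state symmetric chain (`P`, `R`, `gamma`, and the helper `bebf_step` are all made up for illustration). Each step solves for the weights, computes the Bellman error of the resulting value estimate, and appends that error vector as the next basis function.

```python
import numpy as np

def bebf_step(Phi, P, R, gamma):
    """Solve the linear fixed point for features Phi; return (w, Bellman error)."""
    # Linear fixed point (uniform weighting assumed): Phi^T (Phi - gamma P Phi) w = Phi^T R
    A = Phi.T @ (Phi - gamma * P @ Phi)
    b = Phi.T @ R
    w = np.linalg.solve(A, b)
    V_hat = Phi @ w
    # Bellman error of the current approximation: R + gamma P V_hat - V_hat
    bellman_error = R + gamma * P @ V_hat - V_hat
    return w, bellman_error

# Hypothetical 4-state chain with a symmetric transition matrix (illustration only).
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
R = np.array([0.0, 0.0, 0.0, 1.0])
gamma = 0.9

Phi = np.ones((4, 1))  # start from a single constant feature
inner_products = []
for _ in range(2):
    w, err = bebf_step(Phi, P, R, gamma)
    # Largest absolute inner product between the new feature and the existing ones
    inner_products.append(np.abs(Phi.T @ err).max())
    Phi = np.column_stack([Phi, err])  # add the Bellman error as a new basis function
```

In this exact, noise-free setting each new column is orthogonal (up to floating-point error) to the columns already in `Phi`; with sampled data the Bellman error would have to be estimated, and this property would hold only approximately.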
LQR-Trees: Feedback motion planning on sparse randomized trees
In Proceedings of Robotics: Science and Systems (RSS)
Cited by 54 (9 self)
Recent advances in the direct computation of Lyapunov functions using convex optimization make it possible to efficiently evaluate regions of stability for smooth nonlinear systems. Here we present a feedback motion planning algorithm which uses these results to efficiently combine locally valid linear quadratic regulator (LQR) controllers into a nonlinear feedback policy which probabilistically covers the reachable area of a (bounded) state space with a region of stability, certifying that all initial conditions that are capable of reaching the goal will stabilize to the goal. We investigate the properties of this systematic nonlinear feedback control design algorithm on simple underactuated systems and discuss the potential for control of more complicated control problems like bipedal walking.
Transfer Learning for Reinforcement Learning Domains: A Survey
Cited by 49 (6 self)
The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning
Cited by 44 (3 self)
We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
Value function approximation in reinforcement learning using the Fourier basis
2008
Cited by 44 (14 self)
We describe the Fourier Basis, a linear value function approximation scheme based on the Fourier Series. We empirically evaluate its properties, and demonstrate that it performs well compared to Radial Basis Functions and the Polynomial Basis, the two most popular fixed bases for linear value function approximation, and is competitive with learned Proto-Value Functions even though no extra experience or computation is required.
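As a rough illustration of the kind of feature map the abstract above describes, the sketch below builds cosine features from a multivariate Fourier series. It assumes the state has been scaled to the unit hypercube and uses features of the form cos(pi * c . x) for all integer coefficient vectors c with entries from 0 to `order`; the function name `fourier_features` and this exact parameterization are assumptions for illustration, not taken from the listing.

```python
import itertools
import numpy as np

def fourier_features(x, order):
    """Cosine Fourier features for a state x assumed to lie in [0, 1]^d."""
    x = np.asarray(x, dtype=float)
    # One coefficient vector per feature: every c in {0, ..., order}^d
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=x.size)))
    # Feature i is cos(pi * c_i . x)
    return np.cos(np.pi * coeffs @ x)

# A linear value estimate is then a dot product with a weight vector.
features = fourier_features([0.25, 0.5], order=3)
weights = np.zeros(features.shape[0])  # placeholder weights; learning is not shown
value_estimate = float(weights @ features)
```

A full order-n basis over d state variables has (n + 1)^d features, so the feature count grows quickly with dimension; this sketch makes no attempt to prune coefficient vectors.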
A unifying framework for computational reinforcement learning theory
2009
Cited by 23 (7 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem that may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies
Linear Complementarity for Regularized Policy Evaluation and Improvement
2010
Cited by 22 (3 self)
Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
Reinforcement Learning for Dialog Management using Least-Squares Policy Iteration and Fast Feature Selection
Cited by 20 (2 self)
Reinforcement learning (RL) is a promising technique for creating a dialog manager. RL accepts features of the current dialog state and seeks to find the best action given those features. Although it is often easy to posit a large set of potentially useful features, in practice, it is difficult to find the subset which is large enough to contain useful information yet compact enough to reliably learn a good policy. In this paper, we propose a method for RL optimization which automatically performs feature selection. The algorithm is based on least-squares policy iteration, a state-of-the-art RL algorithm which is highly sample-efficient and can learn from a static corpus or online. Experiments in dialog simulation show it is more stable than a baseline RL algorithm taken from a working dialog system.
Error Propagation for Approximate Policy and Value Iteration
Cited by 15 (0 self)
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than its supremum, as opposed to what has been suggested by the previous results. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, and the effect of an error term in the earlier iterations decays exponentially fast.
Approximate Dynamic Programming with Applications in Multi-Agent Systems
2007
Cited by 15 (1 self)
This thesis presents the development and implementation of approximate dynamic programming methods used to manage multi-agent systems. The purpose of this thesis is to develop an architectural framework and theoretical methods that enable an autonomous mission system to manage real-time multi-agent operations. To meet this goal, we begin by discussing aspects of the real-time multi-agent mission problem. Next, we formulate this problem as a Markov Decision Process (MDP) and present a system architecture designed to improve mission-level functional reliability through system self-awareness and adaptive mission planning. Since most multi-agent mission problems are computationally difficult to solve in real-time, approximation techniques are needed to find policies for these large-scale problems. Thus, we have developed