Policy gradient methods for robotics
- In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006
Cited by 52 (14 self)
The acquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots are ever to leave precisely pre-structured environments. However, to date only a few existing reinforcement learning methods have been scaled into the domains of high-dimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview of learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.
Using Local Trajectory Optimizers To Speed Up Global Optimization In Dynamic Programming
- Advances in Neural Information Processing Systems, 1994
Cited by 41 (7 self)
Dynamic programming provides a methodology to develop planners and controllers for nonlinear systems. However, general dynamic programming is computationally intractable. We have developed procedures that allow more complex planning and control problems to be solved. We use second-order local trajectory optimization to generate locally optimal plans and local models of the value function and its derivatives. We maintain global consistency of the local models of the value function, guaranteeing that our locally optimal plans are actually globally optimal, up to the resolution of our search procedures. Learning to do the right thing at each instant in situations that evolve over time is difficult, as the future cost of actions chosen now may not be obvious immediately, and may only become clear with time. Value functions are a representational tool that makes the consequences of actions explicit. Value functions are difficult to learn directly, but they can be built up from learned mode...
Minimax differential dynamic programming: An application to robust biped walking
- In Advances in Neural Information Processing Systems 14, 2002
Cited by 39 (8 self)
We have developed a robust control policy design method for high-dimensional state spaces by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five-link biped robot. The results show lower joint torques using the optimal control policy compared to torques generated by a hand-tuned PD servo controller. Results also show that the simulated biped robot can successfully walk with unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Learning to compensate for modeling error and previously unknown disturbances in conjunction with robust control design is also demonstrated. We also applied the proposed method to a real biped robot for optimizing swing leg trajectories.
Assessing the quality of learned local models
- Advances in Neural Information Processing Systems 6, 1994
Cited by 36 (13 self)
An approach is presented to learning high-dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a “center of exploration” and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.
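For readers unfamiliar with locally weighted regression, the following is a minimal one-dimensional sketch, not the authors' implementation. It fits a weighted straight line around each query point using a Gaussian kernel. The function name `lwr_predict` and the fixed `bandwidth` parameter are assumptions made for illustration; the abstract describes distance metrics that adapt to the query point, which this sketch does not attempt.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Predict y at `query` with a locally weighted linear fit (1-D inputs).

    Illustrative sketch only: `bandwidth` is a fixed, global kernel width,
    an assumption that the paper's localized distance metrics would replace.
    """
    # Gaussian kernel weights: samples near the query dominate the fit.
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    total = sum(ws)

    # Weighted means of inputs and targets.
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total

    # Weighted variance and covariance give the slope of the local line.
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = cov / var if var > 1e-12 else 0.0

    # Evaluate the local line at the query point.
    return mean_y + slope * (query - mean_x)


if __name__ == "__main__":
    xs = [0.1 * i for i in range(64)]
    ys = [math.sin(x) for x in xs]
    print(lwr_predict(xs, ys, 1.0, bandwidth=0.3))
```

A smaller `bandwidth` tracks curvature more closely but relies on fewer effective samples, which is one reason a per-query choice of metric, as the abstract proposes, can help.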
Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
- In NIPS 15, 2003
Cited by 23 (3 self)
A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality.
Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning
- 2000
Cited by 21 (1 self)
Locally weighted learning (LWL) is a class of techniques from nonparametric statistics that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL: memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional belief that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested on up to 90-dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole balancing by a humanoid robot arm, and inverse-dynamics learning for a seven- and a 30-degree-of-freedom robot. In all these examples, the application of our statistical n...
Optimal Control of Switched Autonomous Systems
- IEEE Conference on Decision and Control, Las Vegas, NV, 2002
Cited by 13 (1 self)
In this paper, optimal control problems for switched autonomous systems are studied. In particular, we focus on problems in which a prespecified sequence of active subsystems is given and propose an approach to finding the optimal switching instants. The approach derives the derivatives of the cost with respect to the switching instants and uses nonlinear optimization techniques to locate the optimal switching instants. The approach is then applied to general quadratic problems for switched linear autonomous systems and to reachability problems. Examples illustrate the results.
Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning
- In Neural Information Processing Systems 9, 1996
Cited by 12 (2 self)
Model learning combined with dynamic programming has been shown to be effective for learning control of continuous-state dynamic systems. The simplest method assumes the learned model is correct and applies dynamic programming to it, but many approximators provide uncertainty estimates on the fit. How can they be exploited? This paper addresses the case where the system must be prevented from having catastrophic failures during learning. We propose a new algorithm adapted from the dual control literature and use Bayesian locally weighted regression models with stochastic dynamic programming. A common reinforcement learning assumption is that aggressive exploration should be encouraged. This paper addresses the converse case in which the system has to rein in exploration. The algorithm is illustrated on a 4-dimensional simulated control problem. Reinforcement learning and related grid-based dynamic programming techniques are increasingly being applied to dynamic system...
Optimal Control of Switched Systems via Nonlinear Optimization Based on Direct Differentiations of Value Functions
- 2001
Cited by 12 (2 self)
This paper presents an approach for solving optimal control problems of switched systems. In general, in

