Forward models: Supervised learning with a distal teacher
Cognitive Science, 1992
Cited by 410 (8 self)
Internal models of the environment have an important role to play in adaptive systems in general and are of particular importance for the supervised learning paradigm. In this paper we demonstrate that certain classical problems associated with the notion of the "teacher" in supervised learning can be solved by judicious use of learned internal models as components of the adaptive system. In particular, we show how supervised learning algorithms can be utilized in cases in which an unknown dynamical system intervenes between actions and desired outcomes. Our approach applies to any supervised learning algorithm that is capable of learning in multilayer networks.
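As a rough illustration of the distal-teacher idea in this abstract, the sketch below first fits a forward model of an unknown plant and then trains a controller by passing the outcome error back through that model. Everything here is an assumption made for the example: the linear plant `environment`, the linear forward model and controller, and the learning rates are hypothetical and are not taken from the paper.

```python
import random


def environment(u):
    # Hypothetical "unknown" plant: the learner only observes inputs and outputs.
    return 2.0 * u + 1.0


def train_forward_model(steps=2000, lr=0.05, seed=0):
    """Fit y_hat = a*u + b to the plant from randomly sampled actions (LMS rule)."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        err = environment(u) - (a * u + b)  # prediction error of the forward model
        a += lr * err * u
        b += lr * err
    return a, b


def train_controller(a, steps=2000, lr=0.05, seed=1):
    """Train u = w*target + c using the outcome error sent back through the model."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0
    for _ in range(steps):
        target = rng.uniform(-1.0, 1.0)
        u = w * target + c
        err = target - environment(u)  # distal error, measured on the real outcome
        grad_u = err * a               # forward-model slope stands in for dy/du
        w += lr * grad_u * target
        c += lr * grad_u
    return w, c


a, b = train_forward_model()
w, c = train_controller(a)
```

The controller never sees a target action; it only receives the outcome error scaled by the learned slope `a`, which is the role the forward model plays in this toy setting.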
Policy search for motor primitives in robotics
Advances in Neural Information Processing Systems 22 (NIPS 2008), 2009
Cited by 117 (24 self)
Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm.
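To give a feel for what an EM-inspired episodic policy update can look like, here is a minimal sketch of reward-weighted averaging over parameter perturbations. It is a simplified stand-in and not the algorithm from the paper: the Gaussian-shaped `reward`, the fixed exploration width `sigma`, and the two-dimensional parameter vector are all assumptions chosen for the example.

```python
import math
import random


def reward(theta, goal):
    # Hypothetical episodic return: positive, and largest when theta equals goal.
    return math.exp(-sum((t - g) ** 2 for t, g in zip(theta, goal)))


def reward_weighted_search(goal, iterations=100, rollouts=50, sigma=0.3, seed=0):
    """Move the parameters to the reward-weighted mean of the explored perturbations."""
    rng = random.Random(seed)
    theta = [0.0] * len(goal)
    for _ in range(iterations):
        noises, returns = [], []
        for _ in range(rollouts):
            eps = [rng.gauss(0.0, sigma) for _ in theta]  # exploration in parameter space
            noises.append(eps)
            returns.append(reward([t + e for t, e in zip(theta, eps)], goal))
        total = sum(returns)
        # Perturbations that earned more reward pull the parameters harder.
        theta = [
            t + sum(r * eps[i] for r, eps in zip(returns, noises)) / total
            for i, t in enumerate(theta)
        ]
    return theta


goal = [1.0, -0.5]
theta = reward_weighted_search(goal)
```

Unlike a gradient step, this update needs no learning rate; the step size is set implicitly by the exploration noise and the spread of the returns.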
Development of an optimal vehicle-to-grid aggregator for frequency regulation
IEEE Trans. Smart Grid, 2010
Cited by 53 (1 self)
For vehicle-to-grid (V2G) frequency regulation services, we propose an aggregator that makes efficient use of the distributed power of electric vehicles to produce the desired grid-scale power. The cost arising from the battery charging and the revenue obtained by providing the regulation are investigated and represented mathematically. Some design considerations of the aggregator are also discussed together with practical constraints such as the energy restriction of the batteries. The cost function with constraints enables us to construct an optimization problem. Based on the developed optimization problem, we apply the dynamic programming algorithm to compute the optimal charging control for each vehicle. Finally, simulations are provided to illustrate the optimality of the proposed charging control strategy with variations of parameters. Index Terms: Aggregator, battery, dynamic programming, electric vehicle, plug-in hybrid electric vehicle (PHEV), regulation, vehicle-to-grid (V2G).
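The dynamic programming step mentioned in this abstract can be pictured with a much-reduced toy problem. The sketch below assumes a single vehicle, an integer state of charge, three actions per step (discharge, idle, charge), a known price per step, and a minimum final charge; these modelling choices and the function name `optimal_charging` are hypothetical and do not reproduce the paper's cost function.

```python
def optimal_charging(prices, soc0, soc_target, capacity):
    """Backward dynamic programming over (time step, state of charge).

    Actions are -1 (discharge, earns the price), 0 (idle), +1 (charge, pays the price).
    The final state of charge must be at least soc_target.
    """
    inf = float("inf")
    horizon = len(prices)
    cost_to_go = [[inf] * (capacity + 1) for _ in range(horizon + 1)]
    best_action = [[0] * (capacity + 1) for _ in range(horizon)]
    for s in range(soc_target, capacity + 1):
        cost_to_go[horizon][s] = 0.0  # any sufficiently charged end state is acceptable
    for t in range(horizon - 1, -1, -1):
        for s in range(capacity + 1):
            for action in (-1, 0, 1):
                nxt = s + action
                if 0 <= nxt <= capacity:  # battery energy limits
                    cost = prices[t] * action + cost_to_go[t + 1][nxt]
                    if cost < cost_to_go[t][s]:
                        cost_to_go[t][s] = cost
                        best_action[t][s] = action
    if cost_to_go[0][soc0] == inf:
        raise ValueError("target state of charge is unreachable")
    # Roll the stored decisions forward to recover the charging plan.
    plan, s = [], soc0
    for t in range(horizon):
        plan.append(best_action[t][s])
        s += plan[-1]
    return cost_to_go[0][soc0], plan


cost, plan = optimal_charging([3, 1, 2, 5], soc0=1, soc_target=2, capacity=3)
```

In this small instance the plan idles at the first price, charges during the two cheap steps, and sells one unit at the highest price while still ending at the required charge.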
Reinforcement Learning in Robotics: A Survey
Cited by 38 (2 self)
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value function-based and policy search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and
Approximate Solutions to the Time-Invariant Hamilton-Jacobi-Bellman Equation
1998
Cited by 27 (4 self)
In this paper we develop a new method to approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation which arises in optimal control when the plant is modeled by nonlinear dynamics. The approximation consists of two steps. First, successive approximation is used to reduce the HJB equation to a sequence of linear partial differential equations. These equations are then approximated via Galerkin's spectral method. The resulting algorithm has several important advantages over previously reported methods. Namely, the resulting control is in feedback form and its associated region of attraction is well defined. In addition, all computations are performed offline and the control can be made arbitrarily close to optimal. Accordingly, this paper presents a new tool for designing nonlinear control systems that adhere to a prescribed integral performance criterion. Key Words: Nonlinear control, optimal control, Hamilton-Jacobi-Bellman equation, feedback synthesis, successive approxi...
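The successive-approximation step described in this abstract can be seen in a special case where no Galerkin projection is needed. The sketch below assumes a scalar linear plant dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2, a quadratic value function V(x) = p*x^2, and a stabilizing initial gain `k0`; under those assumptions each linear equation in the sequence has a closed-form solution. It is an illustration of the iteration only, not the paper's nonlinear algorithm.

```python
import math


def successive_approximation(a, b, q, r, k0, iterations=20):
    """Alternate between evaluating the control u = -k*x and improving it.

    Evaluation solves the linear equation 2*p*(a - b*k) + q + r*k**2 = 0 for p.
    Improvement sets k = b*p/r, the minimizer of the Hamiltonian for V = p*x**2.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must stabilize the plant")
    k, history = k0, []
    for _ in range(iterations):
        closed_loop = a - b * k                  # negative while the loop is stable
        p = (q + r * k * k) / (-2.0 * closed_loop)
        history.append(p)
        k = b * p / r
    return p, k, history


p, k, history = successive_approximation(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
# Closed-form solution of the scalar Riccati equation, for comparison.
p_exact = 1.0 * (1.0 + math.sqrt(1.0 + 1.0)) / 1.0
```

Each iterate is a stabilizing feedback law with a cost no worse than the previous one, which mirrors the "arbitrarily close to optimal" property claimed in the abstract for this simple case.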
Improving the Performance of Stabilizing Controls for Nonlinear Systems
Control Systems Magazine, 1996
Cited by 21 (14 self)
There are a variety of tools for computing stabilizing feedback control laws for nonlinear systems. The difficulty is that these tools usually do not take into account the performance of the control and therefore, systematic improvement of an arbitrary stabilizing control law is extremely difficult and often impossible. The objective of this paper is to present a design algorithm that addresses this problem. The algorithm that we present iteratively computes a sequence of control laws with increasingly improved performance. We also consider implementation issues and discuss some of the successes and difficulties that we have encountered. Finally, we present a number of illustrative examples and compare our algorithm with perturbation methods. Keywords: Nonlinear Control, Suboptimal Design Methodology, Galerkin's Spectral Method, Feedback Synthesis. Correspondence should be sent to Randy Beard, 444 CB BYU, Provo, Utah 84602, beard@ee.byu.edu. 1 Introduction If a system is modele...
An Approach to Rough Terrain Autonomous Mobility
In International Conference on Mobile Planetary Robots, 1998
Cited by 18 (0 self)
Off-road autonomous navigation is one of the most difficult automation challenges from the point of view of constraints on mobility, speed of motion, lack of environmental structure, density of hazards, and typical lack of prior information. This paper describes an autonomous navigation software system for outdoor vehicles which includes perception, mapping, obstacle detection and avoidance, and goal seeking. It has been used on several vehicle testbeds including autonomous HMMWVs and planetary rover prototypes. To date, it has achieved speeds of 15 km/hr and excursions of 15 km. We introduce algorithms for optimal processing and computational stabilization of range imagery for terrain mapping purposes. We formulate the problem of trajectory generation as one of predictive control searching trajectories in command space. We also formulate the problem of goal arbitration in local autonomous mobility as an optimal control problem. We emphasize the modeling of vehicles in state space ...
Robot planning in the space of feasible actions: Two examples
In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 1996
Cited by 14 (0 self)
Several researchers in robotics and artificial intelligence have found that the commonly used method of planning in a state (configuration) space is intractable in certain domains. This may be because the C-space has very high dimensionality, the "C-space obstacles" are too difficult to compute, or because a mapping between desired states and actions is not straightforward. Instead of using an inverse model that relates a desired state to an action to be executed by a robot, we have used a methodology that selects between the feasible actions that a robot might execute, in effect circumventing many of the problems faced by configuration space planners. In this paper we discuss the implications of such a method and present two examples of working systems that employ this methodology. One system drives an autonomous cross-country vehicle while the other controls a robotic excavator performing a trenching operation.
Proper Orthogonal Decomposition in Optimal Control of Fluids
Int. J. Numer. Meth. Fluids, 1999
Cited by 11 (0 self)
In this article, we present a reduced order modeling approach suitable for active control of fluid dynamical systems based on proper orthogonal decomposition (POD). The rationale behind the reduced order modeling is that numerical simulation of the Navier-Stokes equations is still too costly for the purpose of optimization and control of unsteady flows. We examine the possibility of obtaining reduced order models that reduce the computational complexity associated with the Navier-Stokes equations while capturing the essential dynamics by using the POD. The POD allows extraction of a certain optimal set of basis functions (perhaps few) from a computational or experimental database through an eigenvalue analysis. The solution is then obtained as a linear combination of this optimal set of basis functions by means of Galerkin projection. This makes it attractive for optimal control and estimation of systems governed by partial differential equations. We here use it in active control of fluid flows ...
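The eigenvalue analysis mentioned in this abstract can be sketched for the leading basis function alone. The example below assumes the method of snapshots with a plain Euclidean inner product and uses power iteration to find the dominant eigenvector of the snapshot correlation matrix; the function name `dominant_pod_mode`, the starting vector, and the tiny synthetic data set are hypothetical, and the Galerkin projection step is not shown.

```python
import math


def dominant_pod_mode(snapshots, iterations=200):
    """Return the leading POD mode and the fraction of snapshot energy it captures.

    snapshots is a list of m state vectors of equal length n.
    """
    m = len(snapshots)
    # Snapshot correlation matrix: corr[i][j] is the inner product of snapshots i and j.
    corr = [[sum(x * y for x, y in zip(si, sj)) for sj in snapshots] for si in snapshots]
    v = [1.0 / (i + 1) for i in range(m)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("snapshots carry no energy along the starting vector")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(m) for j in range(m))
    # The spatial mode is the eigenvector-weighted combination of snapshots, normalized.
    n = len(snapshots[0])
    mode = [sum(v[j] * snapshots[j][i] for j in range(m)) for i in range(n)]
    scale = math.sqrt(sum(x * x for x in mode))
    mode = [x / scale for x in mode]
    energy_fraction = eigenvalue / sum(corr[i][i] for i in range(m))
    return mode, energy_fraction


phi = [0.6, 0.8, 0.0]
snapshots = [[c * p for p in phi] for c in (2.0, -1.0, 0.5)]
mode, energy_fraction = dominant_pod_mode(snapshots)
```

Because the synthetic snapshots are all multiples of one shape, a single mode captures all of the energy; real flow data would need several modes, each obtained from a further eigenvector.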
On the Behrens-Fisher Problem: A Review
Journal of Educational and Behavioral Statistics
Cited by 11 (0 self)
The problem of Aiming Control is analyzed using a residence probability measure along with the associated notion of (D, T)-stability. Key Words: Linear systems; stochastic control; large deviations; stability criteria; approximation theory. Abstract: In this paper, the problem of aiming control is formulated and analyzed in terms of the residence probability measure. Specifically, the notion of residence probability in a domain is introduced and its asymptotic expression is derived for linear systems with small, additive white noise. The associated notion of (D, T)-stability, which characterizes the performance of stochastic systems with no equilibrium points, is introduced and investigated. Finally, the controllability of residence probability is studied and the necessary and sufficient conditions for (D, T)-stabilizability are derived. The development is based on the asymptotic large deviations theory.