Results 1 - 10 of 35
Reinforcement learning: a survey
 Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1690 (26 self)
This paper surveys the field of reinforcement learning from a computer science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
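The trial-and-error learning and exploration/exploitation trade-off surveyed above can be sketched with tabular Q-learning under an epsilon-greedy policy. The two-state MDP below is a hypothetical toy problem, not an example from the paper.

```python
import random

# Hypothetical 2-state, 2-action MDP: TRANSITIONS[state][action] = (next_state, reward).
# Action 1 moves toward state 1, where repeating action 1 earns reward.
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 0.0), 1: (1, 1.0)},
}

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(10):  # short episodes
            # epsilon-greedy: explore occasionally, otherwise exploit
            if rng.random() < epsilon:
                a = rng.choice((0, 1))
            else:
                a = max(q[s], key=q[s].get)
            s2, r = TRANSITIONS[s][a]
            # one-step temporal-difference backup (learning from delayed reinforcement)
            q[s][a] += alpha * (r + gamma * max(q[s2].values()) - q[s][a])
            s = s2
    return q

q = q_learning()
```

The learned greedy policy prefers action 1 in both states, since only the action that leads to and stays in state 1 earns reward.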
Spacetime Constraints
 Computer Graphics
, 1988
"... Spacetime constraints are a new method for creating character animation. The animator specifies what the character has to do, for instance, "jump from here to there, clearing a hurdle in between;" how the motion should be performed, for instance "don't waste energy," or &quo ..."
Abstract

Cited by 378 (6 self)
Spacetime constraints are a new method for creating character animation. The animator specifies what the character has to do, for instance, "jump from here to there, clearing a hurdle in between;" how the motion should be performed, for instance "don't waste energy," or "come down hard enough to splatter whatever you land on;" the character's physical structure: the geometry, mass, connectivity, etc. of the parts; and the physical resources available to the character to accomplish the motion, for instance the character's muscles, a floor to push off from, etc. The requirements contained in this description, together with Newton's laws, comprise a problem of constrained optimization. The solution to this problem is a physically valid motion satisfying the "what" constraints and optimizing the "how" criteria. We present as examples a Luxo lamp performing a variety of coordinated motions. These realistic motions conform to such principles of traditional animation as anticipation, squas...
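The formulation above, "what" constraints plus an energy-like "how" objective over a whole motion, can be illustrated in one dimension: pin a few positions of a discretized trajectory and minimize total squared acceleration by gradient descent. This is a minimal sketch of the idea, not the paper's constrained-optimization solver or its Luxo lamp examples.

```python
# Pin positions ("what" constraints) and minimize total squared
# acceleration (a crude "don't waste energy" objective) over a
# discretized 1-D trajectory. Illustrative sketch only.

def optimize_trajectory(pins, n=21, iters=12000, step=0.03):
    """pins: dict {time index: required position}."""
    x = [0.0] * n
    for i, v in pins.items():
        x[i] = v
    for _ in range(iters):
        # discrete accelerations a_i = x[i-1] - 2 x[i] + x[i+1], i = 1..n-2
        a = [x[i - 1] - 2 * x[i] + x[i + 1] for i in range(1, n - 1)]
        for j in range(n):
            if j in pins:
                continue  # pinned by a "what" constraint
            g = 0.0  # gradient of sum(a_i^2) with respect to x[j]
            for i, coef in ((j - 1, 1.0), (j, -2.0), (j + 1, 1.0)):
                if 1 <= i <= n - 2:
                    g += 2.0 * coef * a[i - 1]
            x[j] -= step * g
    return x

# "jump from here to there, clearing a hurdle in between":
# start at 0, clear height 1 at the midpoint, land back at 0
traj = optimize_trajectory({0: 0.0, 10: 1.0, 20: 0.0})
```

The minimizer is a smooth arch through the three pins, the discrete analogue of a natural spline.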
Locally Weighted Learning for Control
, 1996
"... Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We ex ..."
Abstract

Cited by 197 (19 self)
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
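Locally weighted learning in its simplest form: remember every training point, and at query time fit a linear model with weights from a kernel centred on the query. The sketch below uses a Gaussian kernel and closed-form 1-D weighted least squares on hypothetical data; the paper covers far richer variants for control.

```python
import math

def lwr_predict(xs, ys, xq, bandwidth=0.3):
    # Gaussian kernel weights centred on the query point xq
    w = [math.exp(-((x - xq) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    # closed-form weighted least squares for y ~ b0 + b1 * x
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    denom = sw * sxx - sx * sx
    b1 = (sw * sxy - sx * sy) / denom
    b0 = (sy - b1 * sx) / sw
    return b0 + b1 * xq

# the "training" data are simply remembered, never compiled into a global model
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) for x in xs]
```

Each query refits the local model from scratch, which is exactly the "explicitly remembering all previous experiences" the discussion section examines.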
Robot Trajectory Optimization using Approximate Inference
"... The general stochastic optimal control (SOC) problem in robotics scenarios is often too complex to be solved exactly and in near real time. A classical approximate solution is to first compute an optimal (deterministic) trajectory and then solve a local linearquadraticgaussian (LQG) perturbation m ..."
Abstract

Cited by 68 (16 self)
The general stochastic optimal control (SOC) problem in robotics scenarios is often too complex to be solved exactly and in near real time. A classical approximate solution is to first compute an optimal (deterministic) trajectory and then solve a local linear-quadratic-Gaussian (LQG) perturbation model to handle the system stochasticity. We present a new algorithm for this approach which improves upon previous algorithms like iLQG. We consider a probabilistic model for which the maximum likelihood (ML) trajectory coincides with the optimal trajectory and which, in the LQG case, reproduces the classical SOC solution. The algorithm then utilizes approximate inference methods (similar to expectation propagation) that efficiently generalize to non-LQG systems. We demonstrate the algorithm on a simulated 39-DoF humanoid robot.
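The LQG perturbation model mentioned above rests on the classical finite-horizon LQR backward pass; the scalar sketch below shows that Riccati recursion. It is generic background on the classical solution, not the paper's inference-based algorithm.

```python
def lqr_backward(a, b, q, r, qT, horizon):
    """Scalar system x' = a*x + b*u, cost sum(q*x^2 + r*u^2) plus terminal
    qT*x^2. Returns time-indexed gains k_t with u_t = -k_t * x_t."""
    s = qT            # value-function curvature at the final time
    gains = []
    for _ in range(horizon):
        k = (b * s * a) / (r + b * s * b)  # minimizing feedback gain at this step
        s = q + a * s * (a - b * k)        # Riccati backward recursion
        gains.append(k)
    gains.reverse()   # return gains in forward time order
    return gains

# far from the horizon the gain approaches its stationary value
gains = lqr_backward(a=1.0, b=1.0, q=1.0, r=1.0, qT=1.0, horizon=50)
```

For a = b = q = r = 1 the stationary curvature is the golden ratio and the stationary gain is (sqrt(5) - 1)/2 ≈ 0.618.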
Dual Dynamics: Designing Behavior Systems for Autonomous Robots
 Artificial Life and Robotics
, 1998
"... This paper describes the #dual dynamics" #DD# formal scheme for robotic behavior control systems. Behaviors are designed as dynamical systems which are speci#ed in ordinary di#erential equations. A key idea for the DD scheme is that a robotic agent can work in di#erent #modes", which lead ..."
Abstract

Cited by 48 (5 self)
This paper describes the "dual dynamics" (DD) formal scheme for robotic behavior control systems. Behaviors are designed as dynamical systems which are specified in ordinary differential equations. A key idea for the DD scheme is that a robotic agent can work in different "modes", which lead to qualitatively different behavioral patterns. Intuitively, modes can be likened to "moods". Mathematically, transitions between modes are bifurcations in the control system. Key words: behavior-based robotics, control, dynamical systems. 1 Introduction The "behavior based" approach to designing mobile robots [3]-[8] has been very fertile. However, the field suffers somewhat from a certain lack of formal theory, which in turn hampers the understanding and the design of robots with increasingly complex behavioral repertoires. This paper introduces a formal model of complete behavior control architectures for mobile robots, the "dual dynamics" (DD) scheme. Behaviors are construed as dynamical systems,...
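The "mode transitions as bifurcations" idea can be sketched with a single ODE whose qualitative behavior flips as a mode parameter mu crosses zero (a pitchfork bifurcation): for mu < 0 the activation decays to zero (behavior off), for mu > 0 it settles at a nonzero level (behavior on). This is a generic dynamical-systems illustration, not the DD architecture itself.

```python
def settle(mu, x0=0.1, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = mu*x - x**3 and return the settled state."""
    x = x0
    for _ in range(steps):
        x += dt * (mu * x - x ** 3)
    return x

off = settle(mu=-1.0)  # mode with a single stable fixed point at 0
on = settle(mu=1.0)    # past the bifurcation: stable fixed point near sqrt(mu)
```

A continuous parameter change thus produces a discrete, qualitative switch in behavior, which is the mathematical content of a "mode" transition.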
Stochastic Plans for Robotic Manipulation
, 1990
"... Geometric uncertainty is unavoidable when programming robots for physical applications. We propose a stochastic framework for manipulation planning where plans are ranked on the basis of expected cost. That is, we express the desirability of states and actions with a cost function and describe uncer ..."
Abstract

Cited by 37 (9 self)
Geometric uncertainty is unavoidable when programming robots for physical applications. We propose a stochastic framework for manipulation planning where plans are ranked on the basis of expected cost. That is, we express the desirability of states and actions with a cost function and describe uncertainty with probability distributions. We illustrate the approach with a new design for a programmable parts feeder, a mechanism that orients two-dimensional parts using a sequence of open-loop mechanical motions. We present a planning algorithm that accepts an n-sided polygonal part as input and, in time O(n²), generates a stochastically optimal plan for orienting the part.
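Ranking plans by expected cost, as the abstract describes, reduces to integrating a cost function against a probability distribution over outcomes. The plans and numbers below are hypothetical.

```python
def expected_cost(outcomes):
    """outcomes: list of (probability, cost) pairs for one plan."""
    total_p = sum(p for p, _ in outcomes)
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * c for p, c in outcomes)

# plan A: cheap when it works, but fails 30% of the time (recovery costs 10)
plan_a = [(0.7, 1.0), (0.3, 10.0)]
# plan B: always succeeds at moderate cost
plan_b = [(1.0, 3.0)]

# the stochastically better plan is the one with lower expected cost
best = min([("A", plan_a), ("B", plan_b)], key=lambda kv: expected_cost(kv[1]))
```

Here plan A's expected cost is 0.7·1 + 0.3·10 = 3.7, so the reliable plan B (cost 3.0) wins despite being more expensive in the success case.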
Nonlinear State Dynamics: Computational Methods and Manufacturing Application
 International Journal of Control
, 1999
"... : Stochastic optimal control problems are considered that are nonlinear in the state dynamics, but otherwise are an LQGP problem in the control, i.e. the dynamics are linear in the control vector and the costs are quadratic in the control. In addition the system is randomly perturbed by both continu ..."
Abstract

Cited by 13 (11 self)
Stochastic optimal control problems are considered that are nonlinear in the state dynamics, but otherwise are an LQGP problem in the control, i.e. the dynamics are linear in the control vector and the costs are quadratic in the control. In addition the system is randomly perturbed by both continuous Gaussian (G) and discontinuous Poisson (P) noise. The approach to the solution is by way of computational stochastic dynamic programming using a new enhancement with a least squares equivalent LQGP problem in the state to accelerate the iterative convergence, without adding to the state space computational complexity since the LQGP coefficient equations are independent of the state. General Gauss statistics quadratures are developed to numerically handle Poisson jump integrals. The methods are illustrated for a multistage manufacturing system (MMS) with sufficient realism in an uncertain environment, together with implementation procedures needed to modify the formal general theory. ...
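The Poisson jump terms in such problems lead to dynamic programming equations that contain integrals of the value function against the jump-amplitude distribution. The sketch below approximates one such expectation with a composite trapezoid rule for a uniform jump density; it is a generic quadrature illustration, not the paper's specialized Gauss-statistics rules.

```python
def jump_expectation(V, x, a, b, nodes=101):
    """Approximate E[V(x + h)] for jump amplitude h ~ Uniform(a, b)
    with the composite trapezoid rule."""
    step = (b - a) / (nodes - 1)
    total = 0.0
    for i in range(nodes):
        h = a + i * step
        w = 0.5 if i in (0, nodes - 1) else 1.0  # trapezoid end weights
        total += w * V(x + h)
    return total * step / (b - a)  # uniform density is 1/(b - a)

# for V(z) = z^2 and h ~ Uniform(-1, 1), the exact value is x^2 + 1/3
approx = jump_expectation(lambda z: z * z, 2.0, -1.0, 1.0)
```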
Design of affine controllers via convex optimization
, 2008
"... Abstract—We consider a discretetime timevarying linear dynamical system, perturbed by process noise, with linear noise corrupted measurements, over a finite horizon. We address the problem of designing a general affine causal controller, in which the control input is an affine function of all prev ..."
Abstract

Cited by 13 (1 self)
Abstract—We consider a discrete-time time-varying linear dynamical system, perturbed by process noise, with linear noise-corrupted measurements, over a finite horizon. We address the problem of designing a general affine causal controller, in which the control input is an affine function of all previous measurements, in order to minimize a convex objective, in either a stochastic or worst-case setting. This controller design problem is not convex in its natural form, but can be transformed to an equivalent convex optimization problem by a nonlinear change of variables, which allows us to efficiently solve the problem. Our method is related to the classical design procedure for time-invariant, infinite-horizon linear controller design, and the more recent purified output control method. We illustrate the method with applications to supply chain optimization and dynamic portfolio optimization, and show the method can be combined with model predictive control techniques when perfect state information is available. Index Terms—Affine controller, dynamical system, dynamic linear programming (DLP), linear exponential quadratic Gaussian (LEQG), linear quadratic Gaussian (LQG), model predictive control (MPC), proportional-integral-derivative (PID).
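The controller class considered makes each input an affine function of all measurements so far, u_t = k0[t] + sum_s K[t][s]·y_s. The scalar simulation below evaluates one hand-picked causal affine controller on a noisy integrator; the paper's actual contribution, choosing the coefficients by convex optimization after a change of variables, is not reproduced here.

```python
import random

def run_affine_controller(K, k0, horizon, seed=0):
    """Simulate x_{t+1} = x_t + u_t + w_t with noisy measurements y_t,
    applying the causal affine law u_t = k0[t] + sum_s K[t][s] * y_s.
    Returns the accumulated convex stage cost plus terminal cost."""
    rng = random.Random(seed)
    x, cost, ys = 1.0, 0.0, []
    for t in range(horizon):
        y = x + 0.01 * rng.gauss(0, 1)            # noise-corrupted measurement
        ys.append(y)
        u = k0[t] + sum(K[t][s] * ys[s] for s in range(t + 1))
        cost += x * x + 0.1 * u * u               # convex stage cost
        x = x + u + 0.01 * rng.gauss(0, 1)        # process noise
    return cost + x * x

H = 10
# hand-picked causal gains: feedback on the latest measurement only
K = [[0.0] * H for _ in range(H)]
for t in range(H):
    K[t][t] = -0.9
k0 = [0.0] * H
closed = run_affine_controller(K, k0, H)
open_loop = run_affine_controller([[0.0] * H for _ in range(H)], k0, H)
```

Even this crude affine feedback beats the open-loop (all-zero) controller on the same noise realization; the paper's method would instead search the whole affine class for the cost-minimizing coefficients.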
Techniques in Computational Stochastic Dynamic Programming
 in Control and Dynamic Systems
, 1996
"... INTRODUCTION When Bellman introduced dynamic programming in his original monograph [8], computers were not as powerful as current personal computers. Hence, his description of the extreme computational demands as the Curse of Dimensionality [9] would not have had the super and massively parallel p ..."
Abstract

Cited by 12 (8 self)
INTRODUCTION When Bellman introduced dynamic programming in his original monograph [8], computers were not as powerful as current personal computers. Hence, his description of the extreme computational demands as the Curse of Dimensionality [9] would not have had the super and massively parallel processors of today in mind. However, massive and super computers cannot overcome the Curse of Dimensionality alone, but parallel and vector computation can permit the solution of problems of higher dimension than was previously possible and thus permit more realistic dynamic programming applications. Today such large problems are called Grand and National Challenge problems [45, 46] in high performance computing. Today's availability of high performance vector supercomputers and massively parallel processors has made it possible to compute optimal policies and values of control systems for much larger dimensions than was possible earlier. Advance...
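Computational stochastic dynamic programming in its simplest form is a backward Bellman recursion over a discretized state space. The tiny random-walk problem below is illustrative only; the chapter's point is that realistic state dimensions make exactly this loop explode in cost (the Curse of Dimensionality), motivating vector and parallel implementations.

```python
def backward_dp(n=5, horizon=50, p=0.8):
    """Expected time to reach state 0 when each commanded move of +/-1
    succeeds only with probability p (otherwise the state is unchanged)."""
    V = [0.0] * n                          # terminal cost-to-go
    for _ in range(horizon):
        newV = [0.0]                       # state 0 is the absorbing goal
        for s in range(1, n):
            costs = []
            for a in (-1, 1):              # commanded move, clipped to the grid
                s2 = min(max(s + a, 0), n - 1)
                costs.append(1.0 + p * V[s2] + (1 - p) * V[s])
            newV.append(min(costs))        # Bellman backup over actions
        V = newV
    return V

V = backward_dp()
```

With p = 0.8 the converged values are s/p steps from state s (e.g. 1.25 from state 1); the nested state-action-noise loops are what grow exponentially with state dimension.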
The NLQGP Problem: Application to a Multistage Manufacturing System
 Proceedings of the 1998 American Control Conference,vol
, 1998
"... The Nonlinear Quadratic Gaussian Poisson (NLQGP) problem denotes an optimal control problem with nonlinear dynamics and quadratic costs with both Gaussian and Poisson noise disturbances. The NLQGP problem provides a comprehensive model for many applications since the noises considered are quite robu ..."
Abstract

Cited by 8 (8 self)
The Nonlinear Quadratic Gaussian Poisson (NLQGP) problem denotes an optimal control problem with nonlinear dynamics and quadratic costs with both Gaussian and Poisson noise disturbances. The NLQGP problem provides a comprehensive model for many applications since the noises considered are quite robust and add extra realism to physical models. The problem is examined and is illustrated with an application to a multistage manufacturing system (MMS) in an uncertain environment. 1 Introduction The nonlinear dynamics, quadratic performance, Gaussian noise and Poisson noise or NLQGP problem, has its dynamics governed by the stochastic differential equation (SDE)

dX(t) = [F_0(X(t), t) + F_1(X(t), t) U(t)] dt + G(t) dW(t) + H_0(X(t), t) dP_0(t) + [H_1(X(t), t) U(t)] dP_1(t)   (1)

for general Markov processes in continuous time, with m × 1 state vector X(t), n × 1 control vector U(t), r × 1 Gaussian noise vector dW(t), and q_ℓ × 1 space-time Poisson noise vectors dP_ℓ(t), ...
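Dynamics of the form of Eq. (1) can be simulated, in the scalar case, with an Euler scheme that draws a Gaussian increment for dW and Bernoulli approximations of the Poisson increments dP_0, dP_1 at each step. All coefficients and jump rates below are hypothetical; this only illustrates the noise structure, not the paper's NLQGP solution method.

```python
import random

def simulate(x0=1.0, u=0.0, T=1.0, dt=0.001, lam0=2.0, lam1=1.0, seed=0):
    """Scalar Euler simulation of a dynamics with the Eq. (1) structure:
    drift affine in the control, plus Gaussian diffusion and Poisson
    jumps in both the state and control channels."""
    rng = random.Random(seed)
    x = x0
    for _ in range(int(T / dt)):
        f0 = -x          # F_0: state drift (hypothetical choice)
        f1 = 1.0         # F_1: control channel
        g = 0.1          # G: Gaussian diffusion coefficient
        h0 = 0.2 * x     # H_0: state-dependent jump amplitude
        h1 = 0.1         # H_1: control-jump amplitude
        dW = rng.gauss(0.0, dt ** 0.5)                # Brownian increment
        dP0 = 1 if rng.random() < lam0 * dt else 0    # Poisson increments,
        dP1 = 1 if rng.random() < lam1 * dt else 0    # rates lam0 and lam1
        x += (f0 + f1 * u) * dt + g * dW + h0 * dP0 + h1 * u * dP1
    return x
```

The discontinuous Poisson terms are what distinguish this model from a plain diffusion: the sample path decays smoothly between randomly timed jumps.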