Results 1–10 of 19
Reinforcement learning: a survey
Journal of Artificial Intelligence Research, 1996
Abstract

Cited by 1298 (23 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
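The exploration/exploitation trade-off and delayed reinforcement that the survey names can be sketched with a minimal tabular Q-learning loop. Everything below (the 5-state chain environment, its reward, and the hyperparameters) is invented for illustration, not taken from the paper:

```python
import random

# Toy sketch (not from the survey): tabular Q-learning with an
# epsilon-greedy policy on a 5-state chain; reaching the last state pays +1.
# States, rewards, and hyperparameters are all invented for this example.
N, ACTIONS = 5, (-1, +1)                 # states 0..4, move left/right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

def argmax_random_tie(vals):
    m = max(vals)
    return random.choice([i for i, v in enumerate(vals) if v == m])

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N)]
for _ in range(500):                     # trial-and-error episodes
    s, done = 0, False
    while not done:
        # explore with probability EPS, otherwise exploit the estimate
        a = random.randrange(2) if random.random() < EPS else argmax_random_tie(Q[s])
        s2, r, done = step(s, ACTIONS[a])
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

policy = [argmax_random_tie(Q[s]) for s in range(N - 1)]
print(policy)                            # greedy action in states 0..3
```

After training, the greedy policy moves right in every non-terminal state, the delayed +1 having been propagated backward through the Q-table.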
Spacetime Constraints
Computer Graphics, 1988
Abstract

Cited by 315 (6 self)
Spacetime constraints are a new method for creating character animation. The animator specifies what the character has to do, for instance, "jump from here to there, clearing a hurdle in between;" how the motion should be performed, for instance "don't waste energy," or "come down hard enough to splatter whatever you land on;" the character's physical structure: the geometry, mass, connectivity, etc. of the parts; and the physical resources available to the character to accomplish the motion, for instance the character's muscles, a floor to push off from, etc. The requirements contained in this description, together with Newton's laws, comprise a problem of constrained optimization. The solution to this problem is a physically valid motion satisfying the "what" constraints and optimizing the "how" criteria. We present as examples a Luxo lamp performing a variety of coordinated motions. These realistic motions conform to such principles of traditional animation as anticipation, squas...
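The "what constraints plus physics, optimize the how" formulation can be shown on a tiny assumed instance: a discretized point mass that must start and end at rest while covering unit distance, spending minimum control effort. The closed-form minimum-norm solution below is a generic equality-constrained least-squares sketch, not the paper's solver:

```python
# Invented toy spacetime problem: pick accelerations u[0..T-1] for a point
# mass (unit timestep, x'' = u) so that it starts at rest at x = 0, ends at
# rest at x = 1, and minimizes sum(u^2) ("don't waste energy").
# With x_T = sum (T-k) u_k and v_T = sum u_k, the minimum-norm solution of
# the two equality constraints is u = A^T (A A^T)^{-1} [1, 0]^T.
T = 20
S1 = sum(T - k for k in range(T))            # = T(T+1)/2
S2 = sum((T - k) ** 2 for k in range(T))     # = T(T+1)(2T+1)/6
det = S2 * T - S1 * S1                       # 2x2 Gram determinant
alpha, beta = T / det, -S1 / det             # Gram^{-1} @ [1, 0]
u = [alpha * (T - k) + beta for k in range(T)]

# verify the motion: starts and ends at rest, covers unit distance
x = v = 0.0
for a in u:
    v += a
    x += v
print(round(x, 6), round(v, 6))
```

The resulting accelerations decrease linearly over time (accelerate early, brake late), which is the classic minimum-effort profile for a double integrator.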
Locally Weighted Learning for Control
1996
Abstract

Cited by 159 (17 self)
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
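A minimal sketch of the core lazy-learning idea: remember every experience and, at query time, answer with a locally weighted combination of nearby points. The Gaussian kernel and bandwidth below are assumptions of this sketch (the paper's methods fit local models, of which this kernel average is only the simplest member):

```python
import math

# Minimal locally weighted (kernel-weighted) prediction: all training
# points are kept, and each query is answered by a distance-weighted
# average. Bandwidth h and the toy data are invented for illustration.
def lwr_predict(query, xs, ys, h=0.5):
    # Gaussian kernel weight for each remembered experience
    ws = [math.exp(-((x - query) ** 2) / (2 * h * h)) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)

# remembered experiences: samples of y = x^2
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [x * x for x in xs]
print(round(lwr_predict(1.0, xs, ys), 3))
```

Note the prediction at x = 1 is pulled above the true value 1.0 by the convex neighbors, the familiar bias of kernel averaging that local linear fits (as used in the paper's line of work) correct.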
Robot Trajectory Optimization using Approximate Inference
Abstract

Cited by 41 (14 self)
The general stochastic optimal control (SOC) problem in robotics scenarios is often too complex to be solved exactly and in near real time. A classical approximate solution is to first compute an optimal (deterministic) trajectory and then solve a local linear-quadratic-Gaussian (LQG) perturbation model to handle the system stochasticity. We present a new algorithm for this approach which improves upon previous algorithms like iLQG. We consider a probabilistic model for which the maximum likelihood (ML) trajectory coincides with the optimal trajectory and which, in the LQG case, reproduces the classical SOC solution. The algorithm then utilizes approximate inference methods (similar to expectation propagation) that efficiently generalize to non-LQG systems. We demonstrate the algorithm on a simulated 39-DoF humanoid robot.
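The LQG machinery such methods linearize around reduces, in the scalar case, to the finite-horizon Riccati backward recursion below. The system and cost numbers are invented, and this is the classical backward pass that iLQG-style algorithms build on, not the paper's inference algorithm:

```python
# Scalar finite-horizon LQR sketch (assumed example): dynamics x' = a x + b u,
# stage cost q x^2 + r u^2. The backward Riccati recursion yields the
# optimal linear feedback u = -K x at each stage.
a, b, q, r, T = 1.0, 1.0, 1.0, 1.0, 50

P = q                       # terminal value-function curvature
gains = []
for _ in range(T):
    K = (b * P * a) / (r + b * P * b)   # optimal gain at this stage
    P = q + a * P * (a - b * K)         # Riccati backward update
    gains.append(K)

print(round(gains[-1], 4))  # approaches the infinite-horizon gain
```

For these unit coefficients the recursion converges to P = (1 + sqrt(5))/2 and gain K = (sqrt(5) - 1)/2, a quick sanity check on the backward pass.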
Stochastic Plans for Robotic Manipulation
1990
Abstract

Cited by 35 (7 self)
Geometric uncertainty is unavoidable when programming robots for physical applications. We propose a stochastic framework for manipulation planning where plans are ranked on the basis of expected cost. That is, we express the desirability of states and actions with a cost function and describe uncertainty with probability distributions. We illustrate the approach with a new design for a programmable parts feeder, a mechanism that orients two-dimensional parts using a sequence of open-loop mechanical motions. We present a planning algorithm that accepts an n-sided polygonal part as input and, in time O(n²), generates a stochastically optimal plan for orienting the part.
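The ranking criterion can be shown with an invented two-plan example: each plan induces a probability distribution over outcome costs, and plans are ordered by expected cost. Plan names, probabilities, and costs below are all hypothetical:

```python
# Hypothetical illustration of expected-cost plan ranking. Each plan maps
# to a list of (probability, cost) outcomes; the planner prefers the plan
# whose cost distribution has the smaller mean.
plans = {
    "push-then-grasp": [(0.8, 1.0), (0.2, 10.0)],    # (probability, cost)
    "sense-then-grasp": [(0.95, 2.0), (0.05, 10.0)],
}

def expected_cost(outcomes):
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9   # valid distribution
    return sum(p * c for p, c in outcomes)

best = min(plans, key=lambda name: expected_cost(plans[name]))
print(best, round(expected_cost(plans[best]), 2))
```

Note that the cheaper-on-success plan loses here: its 20% failure mode makes its expectation worse, which is exactly the trade-off the expected-cost ranking captures.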
Nonlinear State Dynamics: Computational Methods and Manufacturing Application
International Journal of Control, 1999
Abstract

Cited by 13 (11 self)
Stochastic optimal control problems are considered that are nonlinear in the state dynamics, but otherwise are an LQGP problem in the control, i.e. the dynamics are linear in the control vector and the costs are quadratic in the control. In addition, the system is randomly perturbed by both continuous Gaussian (G) and discontinuous Poisson (P) noise. The approach to the solution is by way of computational stochastic dynamic programming, using a new enhancement with a least-squares equivalent LQGP problem in the state to accelerate the iterative convergence, without adding to the state-space computational complexity, since the LQGP coefficient equations are independent of the state. General Gauss-statistics quadratures are developed to numerically handle Poisson jump integrals. The methods are illustrated for a multistage manufacturing system (MMS) with sufficient realism in an uncertain environment, together with implementation procedures needed to modify the formal general theory.
Techniques in Computational Stochastic Dynamic Programming
in Control and Dynamic Systems, 1996
Abstract

Cited by 12 (8 self)
When Bellman introduced dynamic programming in his original monograph [8], computers were not as powerful as current personal computers. Hence, his description of the extreme computational demands as the Curse of Dimensionality [9] could not have had today's super and massively parallel processors in mind. However, massively parallel machines and supercomputers cannot overcome the Curse of Dimensionality alone, but parallel and vector computation can permit the solution of problems of higher dimension than was previously possible, and thus permit more realistic dynamic programming applications. Today such large problems are called Grand and National Challenge problems [45, 46] in high-performance computing. Today's availability of high-performance vector supercomputers and massively parallel processors has made it possible to compute optimal policies and values of control systems for much larger dimensions than was possible earlier. Advance ...
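A toy backward dynamic program makes the table-size issue concrete: value iteration stores one number per discretized state, so a grid with n points per dimension needs n**d entries, and cost multiplies (not adds) with each dimension. The chain problem below is invented for illustration:

```python
# Invented example (not from the chapter): backward value iteration
# V_t(s) = min_a sum_{s'} p(s'|s,a) [cost(s) + V_{t+1}(s')] on a small
# stochastic chain. The V table here has N entries; a d-dimensional grid
# with N points per axis would need N**d entries -- the Curse of
# Dimensionality that parallel and vector computation attacks.
N, T, GOAL = 11, 20, 5
P_OK = 0.8                               # chance the intended move happens

def succ(s, a):                          # list of (probability, next state)
    intended = min(max(s + a, 0), N - 1)
    return [(P_OK, intended), (1 - P_OK, s)]

V = [float(abs(s - GOAL)) for s in range(N)]      # terminal cost
for _ in range(T):                       # one Bellman backup per stage
    V = [min(sum(p * (abs(s - GOAL) + V[s2]) for p, s2 in succ(s, a))
             for a in (-1, 0, 1))
         for s in range(N)]

print(V[GOAL], V[0] > V[4])              # staying at the goal costs nothing
```

Each sweep touches every state once; with N**d states, that single sweep is what the chapter's vectorized and parallel implementations distribute across processors.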
The NLQGP Problem: Application to a Multistage Manufacturing System
Proceedings of the 1998 American Control Conference, vol, 1998
Abstract

Cited by 8 (8 self)
The Nonlinear Quadratic Gaussian Poisson (NLQGP) problem denotes an optimal control problem with nonlinear dynamics and quadratic costs, with both Gaussian and Poisson noise disturbances. The NLQGP problem provides a comprehensive model for many applications, since the noises considered are quite robust and add extra realism to physical models. The problem is examined and is illustrated with an application to a multistage manufacturing system (MMS) in an uncertain environment.

1 Introduction

The nonlinear dynamics, quadratic performance, Gaussian noise and Poisson noise, or NLQGP, problem has its dynamics governed by the stochastic differential equation (SDE)

dX(t) = [F0(X(t), t) + F1(X(t), t) U(t)] dt + G(t) dW(t) + H0(X(t), t) dP0(t) + [H1(X(t), t) U(t)] dP1(t)   (1)

for general Markov processes in continuous time, with m × 1 state vector X(t), n × 1 control vector U(t), r × 1 Gaussian noise vector dW(t), and q_ℓ × 1 space-time Poisson noise vectors dP_ℓ(t), f...
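A jump-diffusion of this form can be simulated with a standard Euler-Maruyama step for the drift and Gaussian term, plus Bernoulli thinning (jump probability ≈ rate × dt per step) for the Poisson counts. The scalar coefficients, constant control, and jump rate below are all invented for the sketch:

```python
import math
import random

# Hypothetical scalar instance of an SDE of the NLQGP form
# dX = [F0(X) + F1(X) U] dt + G dW + H0 dP: here F0(x) = -0.5 x (mean
# reversion), G = 0.1, and each Poisson event adds a jump of size 0.2.
# All numbers are invented; this only illustrates simulating Eq. (1).
random.seed(1)
DT, STEPS = 0.01, 2000
RATE = 2.0                       # Poisson jump intensity (events per unit time)
x, u = 1.0, 0.0                  # initial state and constant control
for _ in range(STEPS):
    dw = random.gauss(0.0, math.sqrt(DT))         # Gaussian increment dW
    dp = 1 if random.random() < RATE * DT else 0  # thinned Poisson increment dP
    x += (-0.5 * x + u) * DT + 0.1 * dw + 0.2 * dp
print(round(x, 3))
```

With these coefficients the process hovers near the balance point where the mean-reverting drift offsets the average jump input (about 0.2 × RATE / 0.5 = 0.8), illustrating how the discontinuous Poisson term shifts the statistics away from the purely Gaussian case.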
Design of affine controllers via convex optimization
2008
Abstract

Cited by 6 (1 self)
We consider a discrete-time, time-varying linear dynamical system, perturbed by process noise, with linear noise-corrupted measurements, over a finite horizon. We address the problem of designing a general affine causal controller, in which the control input is an affine function of all previous measurements, in order to minimize a convex objective, in either a stochastic or worst-case setting. This controller design problem is not convex in its natural form, but can be transformed to an equivalent convex optimization problem by a nonlinear change of variables, which allows us to solve the problem efficiently. Our method is related to the classical design procedure for time-invariant, infinite-horizon linear controller design, and the more recent purified output control method. We illustrate the method with applications to supply chain optimization and dynamic portfolio optimization, and show the method can be combined with model predictive control techniques when perfect state information is available.
Index Terms: Affine controller, dynamical system, dynamic linear programming (DLP), linear exponential quadratic Gaussian (LEQG), linear quadratic Gaussian (LQG), model predictive control (MPC), proportional-integral-derivative (PID).
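A hypothetical one-step instance shows the affine-controller objective being convex in the controller coefficients: the control u = K·y is an affine (here linear, offset zero) function of a noisy measurement, and the expected quadratic cost is minimized over K. The closed-form minimizer for this toy case is derived by hand and checked against a grid search using common random numbers; all noise levels are invented:

```python
import random

# Toy one-step affine control (not the paper's construction): measurement
# y = x0 + v, control u = K * y, next state x1 = x0 + u + w, and we pick K
# to minimize E[x1^2 + u^2]. Expanding the expectation gives the quadratic
# J(K) = (1+K)^2 sx^2 + K^2 sv^2 + sw^2 + K^2 (sx^2 + sv^2), whose
# minimizer is K* = -sx^2 / (2 (sx^2 + sv^2)).
random.seed(2)
SX, SV, SW = 1.0, 0.5, 0.3               # invented std devs of x0, v, w
samples = [(random.gauss(0, SX), random.gauss(0, SV), random.gauss(0, SW))
           for _ in range(20000)]

def cost(K):
    total = 0.0
    for x0, v, w in samples:             # common random numbers across K
        u = K * (x0 + v)                 # affine controller (offset k = 0)
        x1 = x0 + u + w
        total += x1 * x1 + u * u
    return total / len(samples)

K_star = -SX ** 2 / (2 * (SX ** 2 + SV ** 2))   # closed-form minimizer
K_best = min([k / 20 for k in range(-20, 1)], key=cost)
print(round(K_star, 2), K_best)
```

In the paper's multi-step setting the control at each time is affine in *all* past measurements and the convexification needs the nonlinear change of variables; this one-step case is already convex, which is what makes it a safe illustration.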
Computational Stochastic Dynamic Programming Problems: Groundwater Quality Remediation
Proc. 33rd IEEE Conference on Decision and Control, 1994
Abstract

Cited by 4 (4 self)
The general objective is the development of supercomputing algorithms for the optimal feedback control of large-scale, continuous-time, nonlinear, Markov stochastic dynamical systems. The numerical procedures are based on PDE methods, such as the finite element or finite difference methods for stochastic dynamic programming, as well as other advanced numerical methods. The algorithms have been implemented on Cray vector multiprocessors and massively parallel Connection Machines. These implementations have utilized advanced supercomputing techniques such as parallelization, vectorization, and data structures and decompositions. Problems in 5 state-space dimensions have been solved. Large dimensions are required by some applications, such as groundwater remediation and resource management. The principal application focus here is the remediation of groundwater quality through the control of pumping policies when the groundwater is subject to uncertain introduction of contaminants.