Reinforcement Learning I: Introduction
, 1998
Cited by 2829 (76 self)
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve long-term goals.
Learning and Sequential Decision Making
- LEARNING AND COMPUTATIONAL NEUROSCIENCE
, 1989
Cited by 185 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
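As an illustration of the kind of temporal-difference update the abstract above refers to, here is a minimal sketch of tabular TD(0) value prediction. It is not taken from the report: the deterministic chain task, the function name `td0_chain`, and the parameter values (`gamma`, `alpha`, `episodes`) are all assumptions chosen to keep the example small.

```python
def td0_chain(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) prediction on a hypothetical deterministic chain.

    States 0 .. n_states-1 are non-terminal; the agent always moves one
    step to the right, and index n_states is the terminal state. The only
    reward (1.0) is received on the transition into the terminal state.
    """
    # One value estimate per state; the terminal entry is never updated.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        state = 0
        while state < n_states:
            next_state = state + 1
            reward = 1.0 if next_state == n_states else 0.0
            # TD error: difference between the one-step bootstrapped
            # target and the current prediction for this state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values


if __name__ == "__main__":
    for s, v in enumerate(td0_chain()):
        print(s, round(v, 4))
```

On this assumed task the exact discounted values are `gamma ** (n_states - 1 - s)` for each non-terminal state `s`, so the learned estimates can be checked against that closed form.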
NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models
, 1998
Cited by 78 (3 self)
Animation through the numerical simulation of physics-based graphics models offers unsurpassed realism, but it can be computationally demanding. Likewise, finding controllers that enable physics-based models to produce desired animations usually entails formidable computational cost. This paper demonstrates the possibility of replacing the numerical simulation and control of model dynamics with a dramatically more efficient alternative. In particular, we propose the NeuroAnimator, a novel approach to creating physically realistic animation that exploits neural networks. NeuroAnimators are automatically trained off-line to emulate physical dynamics through the observation of physics-based models in action. Depending on the model, its neural network emulator can yield physically realistic animation one or two orders of magnitude faster than conventional numerical simulation. Furthermore, by exploiting the network structure of the NeuroAnimator, we introduce a fast algorithm for learning controllers that enables either physics-based models or their neural network emulators to synthesize motions satisfying prescribed animation goals. We demonstrate NeuroAnimators for passive and active (actuated) rigid body, articulated, and deformable physics-based models.
Strategy Learning with Multilayer Connectionist Representations
- In Proceedings of the Fourth International Workshop on Machine Learning
, 1987
Cited by 65 (4 self)
Results are presented that demonstrate the learning and fine-tuning of search strategies using connectionist mechanisms. Previous studies of strategy learning within the symbolic, production-rule formalism have not addressed fine-tuning behavior. Here a two-layer connectionist system is presented that develops its search from a weak to a task-specific strategy and fine-tunes its performance. The system is applied to a simulated, real-time, balance-control task. We compare the performance of one-layer and two-layer networks, showing that the ability of the two-layer network to discover new features and thus enhance the original representation is critical to solving the balancing task.
Learning and Problem Solving with Multilayer Connectionist Systems
, 1986
Cited by 49 (1 self)
Learning and Problem Solving with Multilayer Connectionist Systems. September 1986. Charles William Anderson (B.S., University of Nebraska; M.S., University of Massachusetts; Ph.D., University of Massachusetts). Directed by: Professor Andrew G. Barto.
The difficulties of learning in multilayered networks of computational units have limited the use of connectionist systems in complex domains. This dissertation elucidates the issues of learning in a network's hidden units, and reviews methods for addressing these issues that have been developed through the years. Issues of learning in hidden units are shown to be analogous to learning issues for multilayer systems employing symbolic representations.
Reinforcement Learning And Its Application To Control
, 1992
Cited by 49 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Reinforcement learning is direct adaptive optimal control
- In Proceedings of the American Control Conference
, 1991
Cited by 39 (4 self)
... optimal controls are estimated directly more attractive. We view reinforcement learning methods as a computationally simple, direct approach to the adaptive optimal control of nonlinear systems. For concreteness, we focus on one reinforcement learning method (Q-learning) and on its analytically proven capabilities for one class of adaptive optimal control problems (Markov decision problems with unknown transition probabilities).
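To make the Q-learning method named in the abstract above concrete, the following is a minimal sketch of tabular Q-learning in which the learner never sees the transition rule and only samples it. It is not the paper's experiment: the four-state chain, the uniformly random behavior policy, the function name `q_learning_chain`, and all parameter values are assumptions for illustration.

```python
import random


def q_learning_chain(n_states=4, gamma=0.9, alpha=0.5, episodes=500, seed=0):
    """Tabular Q-learning on a hypothetical chain with a goal at the right end.

    Actions: 0 = left (blocked at state 0), 1 = right. Entering the goal
    state yields reward 1.0 and ends the episode; all other rewards are 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action]; the goal row is never updated and stays at zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    def step(state, action):
        # Environment dynamics, hidden from the learning update below.
        next_state = max(state - 1, 0) if action == 0 else state + 1
        reward = 1.0 if next_state == goal else 0.0
        return next_state, reward

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Uniformly random exploration; Q-learning is off-policy, so
            # the estimates still target the greedy policy's values.
            action = rng.randrange(2)
            next_state, reward = step(state, action)
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning_chain()):
        print(s, [round(x, 3) for x in row])
```

Because the assumed environment is deterministic, the estimates should approach the exact optimal action values, and the greedy action in every non-terminal state should be "right".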
A New Approach for Multi-label Classification Based on Default Hierarchies and Organizational Learning
Cited by 2 (2 self)
Learning Classifier Systems (LCSs) are a class of expert systems that use a knowledge base of decision rules and a genetic algorithm (GA) [9] as a discovery mechanism. The set of decision rules allows the LCS to represent and learn control strategies, while the robust search ability of the GA allows it to search for new rules based on the performance of existing rules. LCSs were first designed to solve machine learning problems, especially classification problems. Classification problems are problems where instances of a data set belong to a set of classes, and the system needs to infer, based on past experience, the correct class (or classes) of new, previously unseen, instances. However, the features of LCSs are also very useful for solving reinforcement learning problems, a class of problems where the system should learn
Approximate Dynamic Programming- I: Modeling
Cited by 1 (1 self)
The first step in solving a stochastic optimization problem is providing a mathematical model. How the problem is modeled can impact the solution strategy. In this chapter, we provide a flexible modeling framework that uses a classic control-theoretic framework, avoiding devices such as one-step transition matrices. We describe the five fundamental elements of any stochastic, dynamic program. Different notational conventions are introduced, and the types of policies that can be used to guide decisions are described in detail. This discussion puts approximate dynamic programming in the context of a variety of other algorithmic strategies by using the modeling framework to describe ...
Stochastic optimization problems pose unique challenges in how they are represented mathematically. These problems arise in a number of different communities, often in the context of problems which introduce specific computational characteristics. As a result, a number of contrasting notational styles have evolved which complicate our ability to communicate research across communities. This
Learning to Control Dynamic Systems Via Associative Reinforcement Learning
... this paper. The internal critic network has 8 input units, a hidden layer of 10 back-propagation units, and a single temporal difference output unit. The controller has 4 input units and a single action unit. In simulations of the supervised learning method, a "noisy" linear unit was used as the action unit, while in simulations of the reinforcement learning method, a stochastic real-valued (SRV) unit [25] was used.

