Results 1 - 10 of 23
Reinforcement learning: a survey
Journal of Artificial Intelligence Research, 1996
Cited by 1134 (21 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
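The exploration/exploitation trade-off named in this abstract can be illustrated with a minimal sketch. The code below is not taken from the survey; it is a generic epsilon-greedy strategy on a multi-armed bandit, and the function name `epsilon_greedy_bandit`, the Gaussian reward model, and all parameter values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (hypothetical setup).

    true_means: assumed mean reward of each arm (unknown to the agent).
    Returns the agent's reward estimates and per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: unit-variance Gaussian around the true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    est, pulls = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print("estimates:", est)
    print("pulls:", pulls)
```

A larger `epsilon` spends more pulls gathering information about every arm, while a smaller one commits sooner to whichever arm currently looks best; the survey discusses more principled alternatives to this simple rule.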
A Feedback Control Structure for On-line Learning Tasks
Robotics and Autonomous Systems, 1997
Cited by 57 (19 self)
This paper addresses adaptive control architectures for systems that respond autonomously to changing tasks. Such systems often have many sensory and motor alternatives and behavior drawn from these produces varying quality solutions. The objective is then to ground behavior in control laws which, combined with resources, enumerate closed-loop behavioral alternatives. Use of such controllers leads to analyzable and predictable composite systems, permitting the construction of abstract behavioral models. Here, discrete event system and reinforcement learning techniques are employed to constrain the behavioral alternatives and to synthesize behavior on-line. To illustrate this, a quadruped robot learning a turning gait subject to safety and kinematic constraints is presented. Keywords: Control Composition, DEDS, Reinforcement Learning, Walking. 1 Introduction Behavior generation in complex sensorimotor systems can be viewed as a scheduling problem in which a policy for engaging resour...
Spatial Cognition and Neuro-Mimetic Navigation: A Model of Hippocampal Place Cell Activity
2000
Cited by 52 (13 self)
A computational model of hippocampal activity during spatial cognition and navigation tasks is presented. The spatial representation in our model of the rat hippocampus is built on-line during exploration via two processing streams. An allothetic vision-based representation is built by unsupervised Hebbian learning extracting spatio-temporal properties of the environment from visual input. An idiothetic representation is learned based on internal movement-related information provided by path integration. On the level of the hippocampus, allothetic and idiothetic representations are integrated to yield a stable representation of the environment by a population of localized overlapping CA3-CA1 place fields. The hippocampal spatial representation is used as a basis for goal-oriented spatial behavior. We focus on the neural pathway connecting the hippocampus to the nucleus accumbens. Place cells drive a population of locomotor action neurons in the nucleus accumbens. Reward-based learnin...
Shaping Robot Behavior Using Principles from Instrumental Conditioning
1997
Cited by 36 (1 self)
Shaping by successive approximations is an important animal training technique in which behavior is gradually adjusted in response to strategically timed reinforcements. We describe a computational model of this shaping process and its implementation on a mobile robot. Innate behaviors in our model are sequences of actions and enabling conditions, and shaping is a behavior editing process realized by multiple editing mechanisms. The model replicates some fundamental phenomena associated with instrumental learning in animals, and allows an RWI B21 robot to learn several distinct tasks derived from the same innate behavior. 1. Introduction Service dogs trained to assist a disabled person will respond to over 60 verbal commands to, for example, turn on lights, open a refrigerator door, or retrieve a dropped object [9]. Chicks can be taught to play a toy piano (peck out a key sequence until a reinforcement is received at the end of the tune) [6], and rats have been conditioned to perform c...
A Hybrid Architecture for Adaptive Robot Control
2000
Cited by 29 (0 self)
The autonomous operation of robot systems in an uncertain environment poses many challenges to their control architecture. Such systems must be reactive with respect to local disturbances and uncertainties and have to adapt to more persistent changes in environmental conditions and task requirements. In autonomous systems, this adaptation must often occur without outside intervention and within a single trial while avoiding catastrophic failure. This dissertation
Rapid Unsupervised Connectionist Learning for Backing a Robot with Two Trailers
In IEEE Int'l Conf. on Robotics and Automation
Cited by 11 (2 self)
This paper presents an application of a connectionist control-learning system designed for use on an autonomous mini-robot. This system was formerly shown to form useful two-dimensional mappings rapidly when applied to backing a car with a single trailer. In the current paper the learning system is extended to three dimensions and applied to a similar but significantly more difficult problem. The system is shown to be capable of rapid unsupervised learning of output responses in temporal domains through the use of eligibility traces and inter-neural cooperation within topologically defined neighborhoods. 1 Introduction Connectionist control-learning systems have recently received much attention; numerous papers and several books have been published on this topic in the last few years (e.g. [13, 17]). An overview of many such systems as they have been applied to robot control is given by Prabhu and Garg [16]. Most of these works, however, have concentrated on simulated systems and the...
Learning from innate behaviors: A Quantitative Evaluation of Neural Network Controllers
Machine Learning, 1998
Cited by 10 (2 self)
this paper, only the front seven sonar and seven IR sensors were used. These covered approximately 157
A hybrid architecture for learning robot control tasks
Systems. Stanford University, 1999
Learning Robot Control -- Using Control Policies as Abstract Actions
1998
Cited by 8 (1 self)
Autonomous robot systems operating in an uncertain environment have to be able to cope with new situations and task requirements. Important properties of the control architecture of such systems are thus that it is reactive, allows for flexible responses to novel situations, and that it adapts to longer-lasting changes in the environment or the task requirements. In the extreme case, this learning has to occur without the direct influence of an outside teacher, making the reinforcement learning paradigm an attractive option since it allows sequences of behavior to be learned from simple reinforcement signals [1, 17]. However, while these techniques have been applied to simple robot systems and in simulation [2, 5, 7, 10, 11, 12, 6],...
Structure-Adaptable Neurocontrollers: A Hardware-Friendly Approach
In Proceedings of the International Work-Conference on Artificial and Natural Neural Networks IWANN97, 1997
Cited by 7 (4 self)
This paper presents a hardware-friendly approach for adapting the structure of a reinforcement-learning-based neurocontroller. An unsupervised clustering algorithm is used to partition the state space of a system and to adapt the size of its reinforcement module. In the well-known inverted pendulum problem, the system has proven to be much faster than previous neurocontroller approaches. We are currently working on an implementation of the system using field-programmable logic devices. 1 Introduction A major problem in nonlinear control is the tuning and adaptation of the controller. For this purpose a model of the process is usually developed. Then, following an approximation of the inverse relation between the desired outputs and the control actions, the controller is adjusted. However, for many real-world problems there is no available quantitative data regarding input-output relations, rendering analytical modeling very difficult [6]; furthermore, errors in the model can lead to...

