Reinforcement learning: a survey
- Journal of Artificial Intelligence Research, 1996
Cited by 1134 (21 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
An On-Line Method to Evolve Behavior and to Control a Miniature Robot in Real Time with Genetic Programming
- Adaptive Behavior, 1997
Cited by 31 (5 self)
We present a novel evolutionary approach to robotic control of a real robot based on genetic programming (GP). Our approach uses genetic programming techniques that manipulate machine code to evolve control programs for robots. This variant of GP has several advantages over a conventional GP system, such as higher speed, lower memory requirements and better real time properties. Previous attempts to apply GP in robotics use simulations to evaluate control programs and have difficulties with learning tasks involving a real robot. We present an on-line control method that is evaluated in two different physical environments and applied to two tasks using the Khepera robot platform: obstacle avoidance and object following. The results show fast learning and good generalization.
Learning from History for Behavior-Based Mobile Robots in Non-Stationary Conditions
1998
Cited by 27 (10 self)
Learning in the mobile robot domain is a very challenging task, especially in non-stationary conditions. The behavior-based approach has proven to be useful in making mobile robots work in real-world situations. Since the behaviors are responsible for managing the interactions between the robot and its environment, observing their use can be exploited to model these interactions. In our approach, the robot is initially given a set of "behavior-producing" modules to choose from, and the algorithm provides a memory-based approach to dynamically adapt the selection of these behaviors according to the history of their use. The approach is validated using a vision- and sonar-based Pioneer I robot in non-stationary conditions, in the context of a multirobot foraging task. Results show the effectiveness of the approach in taking advantage of any regularities experienced in the world, leading to fast and adaptable specialization for the learning robot. Keywords: Multi-robot learning, histor...
A Consideration of the Biological and Psychological Foundations of Autonomous Robotics
1998
Cited by 20 (9 self)
The new wave of robotics aims to provide robots with the capacity to learn, develop and evolve in interaction with their environments using biologically inspired techniques. This work is placed in perspective by considering its biological and psychological basis with reference to some of the grand theorists of living systems. In particular, we examine what it means to have a body by outlining theories of the mechanisms of bodily integration in multicellular organisms and their means of solidarity with the environment. We consider the implications of not having a living body for current ideas on robot learning, evolution, and cognition and issue words of caution about wishful attributions that can smuggle more into observations of robot behaviour than is scientifically supportable. To round off the arguments we take an obligatory swipe at ungrounded artificial intelligence but quickly move on to assess physical grounding and embodiment in terms of the rooted cognition of the living.
Neural reinforcement learning for behaviour synthesis
- Robotics and Autonomous Systems, 1997
Cited by 17 (2 self)
We present the results of research aimed at improving the Q-learning method through the use of artificial neural networks. Neural implementations are interesting due to their generalisation ability. Two implementations are proposed: one with a competitive multilayer perceptron and the other with a self-organising map. Results obtained on a task of learning an obstacle avoidance behaviour for the miniature mobile robot Khepera show that the self-organising map implementation is very effective, learning more than 40 times faster than the basic Q-learning implementation. These neural implementations are also compared with several Q-learning enhancements, such as Q-learning with Hamming distance, Q-learning with statistical clustering, and Dyna-Q. Key Words: Neural Q-learning, reinforcement learning, obstacle avoidance behaviour, self-organising map, autonomous robotics.
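For orientation, the "basic Q-learning implementation" used as the baseline above can be sketched as a single tabular update. This is a minimal illustration of the standard Q-learning rule, not the paper's code: the state names, the action set, and the values of `alpha` and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(state, action) toward reward + gamma * max_a' Q(next_state, a').
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical discretised sensor states and actions for an obstacle-avoidance task.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right", "forward"]
new_value = q_update(q, "obstacle_left", "right", 1.0, "clear", actions)
```

A table like this stores one value per state-action pair and does not generalise between similar states, which is the limitation the neural implementations in the abstract are meant to address.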
Operant conditioning in skinnerbots
- Adaptive Behavior, 1997
Cited by 13 (1 self)
Instrumental (or operant) conditioning, a form of animal learning, is similar to reinforcement learning (Watkins, 1989) in that it allows an agent to adapt its actions to gain maximally from the environment while only being rewarded for correct performance. But animals learn much more complicated behaviors through instrumental conditioning than robots presently acquire through reinforcement learning. We describe a new computational model of the conditioning process that attempts to capture some of the aspects that are missing from simple reinforcement learning: conditioned reinforcers, shifting reinforcement contingencies, explicit action sequencing, and state space refinement. We apply our model to a task commonly used to study working memory in rats and monkeys: the DMTS (Delayed Match to Sample) task. Animals learn this task in stages. In simulation, our model also acquires the task in stages, in a similar manner. We have used the model to train an RWI B21 robot.
Learning by Biasing
- Proceedings of the 1998 IEEE International Conference on Robotics and Automation, 1998
Cited by 4 (2 self)
In the quest for machines that are able to learn, reinforcement learning (RL) is found to be an appealing learning methodology. A known problem in this learning method, however, is that it takes too long before the robot learns to associate suitable situation-action pairs. Due to this problem, RL has remained applicable only to simple tasks and discrete environments. To accelerate the learning process to a level required by real robot tasks, the traditional learning architecture has to be modified. We propose a modified reinforcement-based robot skill acquisition and adaptation architecture. The architecture has two components: a bias component and a learning component. The bias component imparts to the learner coarse a priori knowledge about the task. Subsequently, the learner refines the acquired actions through reinforcement learning. We have validated the architecture and the learning algorithm on a simulated TRC mobile robot for a goal reaching task.
1 Introduction
Programming an autono...
On amount and quality of bias in reinforcement learning
- IEEE International Conference on Systems, Man and Cybernetics, 1999
Cited by 4 (0 self)
Reinforcement learning is widely regarded as elegant in theory but hopelessly slow in practice. This is because it is often studied under the assumption that there is little or no prior information about the task at hand. This assumption, however, is not the defining characteristic of learning. Learning involves the incorporation of prior knowledge or bias that can greatly accelerate or otherwise improve the learning process. In this paper we address the influence of the amount and quality of bias on the speed of reinforcement learning. For a chosen class of learning problems, different forms of bias are initially identified. Some of the biases are extracted from knowledge of the environment, others from the task, and a few from both. Belief matrices, which reset Q-tables before learning commences, encode the biases. The average number of interactions between the agent and the environment is used to quantify the biases. Based on this performance measure, the biases are graded and some new results are reported. In addition, the paper compares continual learning to learning from scratch and presents results that clearly demonstrate the advantages of the former. Key Words: reinforcement learning, bias, continual learning.
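The idea of a belief matrix that sets the Q-table before learning, scored by the average number of agent-environment interactions, can be illustrated with a toy example. The sketch below is only a guess at the general shape of such an experiment: the 1-D corridor task, the reward scheme, the belief values, and every function name (`init_q`, `run_episode`, `average_interactions`) are assumptions made for illustration, not the paper's setup.

```python
import random

ACTIONS = (-1, +1)  # move left / move right in a 1-D corridor (toy task, assumed)

def init_q(n_states, belief=None):
    """Return a Q-table; copy initial values from `belief` if given, else zeros."""
    if belief is None:
        return [[0.0, 0.0] for _ in range(n_states)]
    return [list(row) for row in belief]

def run_episode(q, n_states, rng, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=1000):
    """Run one Q-learning episode from state 0; return the number of interactions."""
    state, steps = 0, 0
    goal = n_states - 1
    while state != goal and steps < max_steps:
        if rng.random() < epsilon:
            a = rng.randrange(2)  # explore
        else:
            best = max(q[state])  # exploit, breaking ties at random
            a = rng.choice([i for i in range(2) if q[state][i] == best])
        nxt = min(max(state + ACTIONS[a], 0), goal)
        reward = 1.0 if nxt == goal else 0.0
        q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
        state, steps = nxt, steps + 1
    return steps

def average_interactions(belief, n_states=6, episodes=20, seed=0):
    """Average interactions per episode, the performance measure named above."""
    rng = random.Random(seed)
    q = init_q(n_states, belief)
    return sum(run_episode(q, n_states, rng) for _ in range(episodes)) / episodes

# A hypothetical belief matrix that weakly prefers "move right" in every state.
right_bias = [[0.0, 0.1] for _ in range(6)]
unbiased_avg = average_interactions(None)
biased_avg = average_interactions(right_bias)
```

Comparing `unbiased_avg` with `biased_avg` gives one number per bias, which is the kind of grading the abstract describes. Reusing the same `q` across episodes, as `average_interactions` does, loosely corresponds to continual learning rather than learning from scratch.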
Neural Network Learning of Variable Grid-Based Maps for the Autonomous Navigation of Robots
1997
Cited by 3 (2 self)
This paper presents a map learning method that integrates the geometrical and topological paradigms. The geometrical component consists of a feed-forward neural network that interprets the robot's sensor readings efficiently. The topological map is created by learning a variable resolution partitioning of the world. Every partition corresponds to a perceptually homogeneous region. The efficiency of the learning process is based on the use of local memory-based techniques for partitioning and of active learning techniques for selecting the most appropriate region to be explored next. Finally, the paper reports experimental results obtained with the autonomous mobile robot TESEO.
1 Introduction
If it were possible to classify all the situations a robot will face in a given environment, we could provide the robot with a priori knowledge to manage each of them effectively. However, the non-determinism of the interactions between a robot and its environment makes this approach infeasib...
Analysis and Design of Robot's Behavior: Towards a Methodology
- Proceedings of Sixth European Workshop on Learning Robot, 1997
Cited by 3 (0 self)
We introduce a methodology to design reinforcement-based control architectures for autonomous robots. It aims at systematizing the behavior analysis and the controller design. The methodology has to be seen as a conceptual framework in which a number of methods are to be defined. In this paper we use some more or less known methods to show the feasibility of the methodology. The postman-robot case study illustrates how the proposed methodology is applied.
1 Introduction
An autonomous robot is defined as a physical device which performs a predefined task in a dynamic and unknown environment without any external help. It has the ability to sense the state of the environment using sensors, to perform physical actions in the environment like object-grasping or locomotion, and has a control architecture which determines the action to be performed given a sensed state. Designing and then programming such an autonomous robot is very hard. The main reason is that we do not have to design an...

