Modeling Adaptive Autonomous Agents
Artificial Life, 1994. Cited by 257 (2 self).
One category of researchers in artificial life is concerned with modeling and building so-called adaptive autonomous agents. Autonomous agents are systems that inhabit a dynamic, unpredictable environment in which they try to satisfy a set of time-dependent goals or motivations. Agents are said to be adaptive if they improve their competence at dealing with these goals based on experience. Autonomous agents constitute a new approach to the study of artificial intelligence (AI) which is highly inspired by biology, in particular ethology, the study of animal behavior. Research in autonomous agents has brought about a new wave of excitement into the field of AI. This paper reflects on the state of the art of this new approach.
Purposive behavior acquisition on a real robot by vision-based reinforcement learning
Machine Learning, 1996. Cited by 130 (30 self).
This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying reinforcement learning to a real robot with a vision sensor, through which the robot obtains information about changes in its environment. First, we construct a state space in terms of the size, position, and orientation of the ball and the goal in the image, and an action space in terms of the commands sent to the left and right motors of a mobile robot. This causes a “state-action deviation” problem in constructing state and action spaces that reflect the outputs of physical sensors and actuators, respectively. To deal with this issue, the action set is constructed so that one action consists of the same action primitive executed repeatedly until the current state changes. Next, to speed up learning, a mechanism called Learning from Easy Missions (LEM) is implemented; LEM reduces the learning time from exponential to almost linear in the size of the state space. Results of computer simulations and real robot experiments are given.
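The core trick here, treating one action as the same primitive repeated until the perceived state changes, is concrete enough to sketch. Below is a minimal Python illustration of that construction; the environment and discretization interfaces are assumptions, not the paper's implementation.

```python
# Sketch: one learning-level action = one motor primitive repeated until
# the discretized visual state changes (the fix for the "state-action
# deviation" problem described above). The env/discretize interfaces are
# hypothetical, not the paper's implementation.

def execute_macro_action(env, discretize, primitive, max_steps=100):
    """Repeat `primitive` until the discretized state changes."""
    s0 = discretize(env.observe())
    for _ in range(max_steps):
        env.step(primitive)              # resend the same motor command
        s1 = discretize(env.observe())
        if s1 != s0:                     # a visible state transition occurred
            return s1
    return s0                            # timed out with no visible change
```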
Learning from Observation Using Primitives
2004. Cited by 70 (5 self).
This paper describes the use of task primitives in robot learning from observation. A framework has been developed that uses observed data to learn a task initially; the agent then increases its performance through repeated task performance (learning from practice). Data collected while a human performs a task is parsed into small parts of the task called primitives. A module is created for each primitive that encodes the movements required during its performance, as well as when and where the primitive is performed. The feasibility of this method is being tested with agents that learn to play a virtual and an actual air hockey game.
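A minimal sketch of the primitive-module idea follows, assuming a hypothetical PrimitiveModule class; the paper's actual framework differs in detail.

```python
# Sketch: observed data is segmented into primitives, and each
# primitive's module encodes when/where it applies and how it is
# performed. All names are illustrative, not the paper's.

class PrimitiveModule:
    def __init__(self, name, precondition, controller):
        self.name = name
        self.precondition = precondition   # state -> bool ("when/where")
        self.controller = controller       # state -> command ("how")

def select_primitive(modules, state):
    """Dispatch to the first module whose learned precondition holds."""
    for module in modules:
        if module.precondition(state):
            return module
    return None

# Toy usage with two air-hockey-flavored primitives:
hit = PrimitiveModule("hit", lambda s: s["puck_close"], lambda s: "swing")
idle = PrimitiveModule("idle", lambda s: True, lambda s: "hold")
print(select_primitive([hit, idle], {"puck_close": False}).name)  # -> idle
```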
Temporal Abstraction in Reinforcement Learning
2000. Cited by 65 (2 self).
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory, and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning: a general framework for prediction, control, and learning at multiple temporal scales.
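The framework developed here is the options framework, in which a temporally extended action is specified by an initiation set, an internal policy, and a termination condition. A minimal sketch under that conventional formulation; the environment interface is assumed.

```python
# Sketch of a temporally extended action ("option") as an
# (initiation set, policy, termination condition) triple, the usual
# formulation in this framework. The env interface is hypothetical.
import random

class Option:
    def __init__(self, initiation, policy, termination):
        self.initiation = initiation     # state -> bool: where it can start
        self.policy = policy             # state -> primitive action
        self.termination = termination   # state -> prob. of stopping

    def run(self, env, state):
        """Follow the internal policy until stochastic termination."""
        assert self.initiation(state), "option not available in this state"
        while True:
            state = env.step(self.policy(state))
            if random.random() < self.termination(state):
                return state
```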
Vision-Based Reinforcement Learning for Purposive Behavior Acquisition.
Proc. of the IEEE Int. Conf. on Robotics and Automation, 1995.
Coordination Of Multiple Behaviors Acquired By A Vision-Based Reinforcement Learning
In Proc. of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (IROS '94), 1994. Cited by 32 (5 self).
A method is proposed that accomplishes a whole task consisting of several subtasks by coordinating multiple behaviors acquired by vision-based reinforcement learning. First, individual behaviors that achieve the corresponding subtasks are independently acquired by Q-learning, a widely used reinforcement learning method. Each learned behavior can be represented by an action-value function in terms of the state of the environment and the robot's action. Next, three kinds of coordination of multiple behaviors are considered: simple summation of the different action-value functions, switching between action-value functions according to the situation, and learning with previously obtained action-value functions as initial values of a new action-value function. A task of shooting a ball into the goal while avoiding collisions with an enemy is examined. The task can be decomposed into a ball-shooting subtask and a collision-avoiding subtask. These subtasks should be accomplished simultaneously, but they are not independent...
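The three coordination schemes map directly onto code over tabular action-value functions. A sketch, assuming dict-based Q-tables rather than the paper's representation:

```python
# Sketch of the three coordination schemes over two learned Q-tables,
# q_shoot and q_avoid, each mapping (state, action) -> value. The
# dict representation and the switching rule are illustrative.

def act_by_summation(q_shoot, q_avoid, s, actions):
    """(1) Simple summation of the two action-value functions."""
    return max(actions, key=lambda a: q_shoot[(s, a)] + q_avoid[(s, a)])

def act_by_switching(q_shoot, q_avoid, s, actions, enemy_near):
    """(2) Switch action-value functions according to the situation."""
    q = q_avoid if enemy_near else q_shoot
    return max(actions, key=lambda a: q[(s, a)])

def init_from_subtask(q_shoot):
    """(3) Use a previously learned Q-table as the initial values of a
    new action-value function, then continue Q-learning on the whole task."""
    return dict(q_shoot)
```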
Vision-based behavior acquisition for a shooting robot by using a reinforcement learning
In Proc. of the IAPR/IEEE Workshop on Visual Behaviors, 1994. Cited by 29 (8 self).
We propose a method by which a mobile robot acquires a purposive behavior, shooting a ball into the goal, using vision-based reinforcement learning. The robot (an agent) does not need to know any parameters of the 3-D environment or of its own kinematics/dynamics; its only information about changes in the environment is the image captured by a single TV camera mounted on the robot. An action-value function over states is learned, with the image positions of the ball and the goal used as the state variables that show the effect of a previously taken action. After learning, the robot tries to carry the ball near the goal and shoot it. Both computer simulation and real robot experiments are shown, and the role of vision in the context of vision-based reinforcement learning is discussed.
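The underlying learning rule is standard Q-learning over a discretized visual state. A minimal sketch, with the state encoding and constants assumed rather than taken from the paper:

```python
# Sketch: tabular Q-learning where the state is the discretized image
# position of the ball and the goal. The discretization, action set,
# and constants are illustrative choices, not the paper's.
from collections import defaultdict
import random

Q = defaultdict(float)                    # (state, action) -> value
ACTIONS = ["forward", "back", "left", "right"]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def q_update(s, a, reward, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

def choose_action(s):
    if random.random() < EPS:             # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```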
Multitimescale nexting in a reinforcement learning robot
In Proceedings of the International Conference on Simulation of Adaptive Behaviour, 2012. Cited by 28 (17 self).
The term “nexting” has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to “next” constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, predicting thousands of features of the world's state, including all sensory inputs, at timescales from 0.1 to 8 seconds. This was achieved by treating each state feature as a reward-like target and applying temporal-difference methods to learn a corresponding value function with a discount rate corresponding to the timescale. We show that two thousand predictions, each dependent on six thousand state features, can be learned and updated online at better than 10 Hz on a laptop computer, using the standard TD(λ) algorithm with linear function approximation. We show that this approach is efficient enough to be practical, with most of the learning complete within 30 minutes.
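The mechanism, one TD(λ) learner per timescale with the discount rate set from that timescale, can be sketched directly. The discount conversion (time constant T = dt/(1 − γ)) and the constants below are plausible assumptions, not the authors' code:

```python
# Sketch: "nexting" as many parallel TD(lambda) predictions with linear
# function approximation. Each prediction treats one sensor/feature
# signal as a reward-like target; gamma is set from the desired
# timescale via T = dt / (1 - gamma). Illustrative, not the authors' code.
import numpy as np

class TDLambdaPredictor:
    def __init__(self, n_features, timescale_s, dt=0.1, lam=0.9, alpha=0.1):
        self.gamma = 1.0 - dt / timescale_s   # e.g. 8 s -> gamma ~= 0.9875
        self.lam, self.alpha = lam, alpha
        self.w = np.zeros(n_features)         # linear value-function weights
        self.z = np.zeros(n_features)         # eligibility trace

    def update(self, x, target, x_next):
        """One TD(lambda) step; `target` is the reward-like feature signal."""
        delta = target + self.gamma * self.w @ x_next - self.w @ x
        self.z = self.gamma * self.lam * self.z + x
        self.w += self.alpha * delta * self.z

# Thousands of such predictors, one per (signal, timescale) pair, can
# share the same feature vector x and be updated in a single pass.
```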
Using Options for Knowledge Transfer in Reinforcement Learning
1999. Cited by 23 (0 self).
One of the original motivations for the use of temporally extended actions, or options, in reinforcement learning was to enable the transfer of learned value functions or policies to new problems. Many experimenters have used options to speed learning on single problems, but options have not been studied in depth as a tool for transfer. In this paper we introduce a formal model of a learning problem as a distribution of Markov Decision Problems (MDPs). Each MDP represents a task the agent will have to solve. Our model can also be viewed as a partially observable Markov decision problem (POMDP) with a special structure that we describe. We study two learning algorithms: one keeps a single value function that generalizes across tasks, while an incremental POMDP-inspired method maintains separate value functions for each task. We evaluate the learning algorithms on an extension of the Mountain Car domain, in terms of both learning speed and asymptotic performance. Empi...
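The two algorithms contrasted above differ chiefly in how value functions are keyed. A toy sketch of that distinction, with all data structures assumed:

```python
# Sketch contrasting the two algorithms described above: a single
# value function shared across tasks drawn from the distribution of
# MDPs, versus separate per-task value functions. Illustrative only.
from collections import defaultdict

shared_q = defaultdict(float)                          # (state, option) -> value
per_task_q = defaultdict(lambda: defaultdict(float))   # task id -> own Q-table

def table_for(task_id, share_across_tasks):
    """Pick the table the agent learns from and acts on for this task."""
    return shared_q if share_across_tasks else per_task_q[task_id]
```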
CHILD: A First Step Towards Continual Learning
Machine Learning, 1997. Cited by 18 (0 self).
Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.