Self-improving reactive agents based on reinforcement learning, planning and teaching
- Machine Learning, 1992
- Cited by 256 (2 self)
Abstract. To date, reinforcement learning has mostly been studied solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents empirical evaluation of the frameworks.
Planning Under Time Constraints in Stochastic Domains
- Artificial Intelligence, 1993
- Cited by 150 (17 self)
We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods require time at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Using this restricted set of states, the planner can generate more or less complete ...
Hierarchical Learning in Stochastic Domains: Preliminary Results
- In Proceedings of the Tenth International Conference on Machine Learning, 1993
- Cited by 94 (7 self)
This paper presents the HDG learning algorithm, which uses a hierarchical decomposition of the state space to make learning to achieve goals more efficient with a small penalty in path quality. Special care must be taken when performing hierarchical planning and learning in stochastic domains, because macro-operators cannot be executed ballistically. The HDG algorithm, which is a descendant of Watkins' Q-learning algorithm, is described here and preliminary empirical results are presented. 1 INTRODUCTION Reinforcement learning is a general tool for deriving strategies that optimize a fixed reinforcement function in a stochastic environment. A crucial problem in reinforcement learning is temporal credit assignment: how to choose actions based on good results that happen after (perhaps long after) the action is taken. This problem is solved well in the general case by temporal difference methods, such as Watkins' Q learning [Barto et al., 1989, Watkins, 1989] and Sutton's TD ...
Learning to Solve Multiple Goals
- 1997
- Cited by 21 (0 self)
In many domains, the task can be decomposed into a set of independent subgoals. Often, such tasks are too complex to be learned using standard techniques such as Reinforcement Learning. The complexity is caused by the learning system having to keep track of the status of all sub-goals concurrently. Thus, if the solution to one sub-goal is known when another sub-goal is in some given state, the known solution must be relearned when the status of the other sub-goal changes. This dissertation presents a modular approach to reinforcement learning that takes advantage of task decomposition to avoid unnecessary relearning. In the modular approach, modules are created to learn each sub-goal. Each module receives only those inputs relevant to its associated sub-goal, and can therefore learn without being affected by the state of other sub-goals. Furthermore, each module searches a much smaller space than that defined by all inputs considered together, thereby greatly reducing learning time. Si...
Automated Learning of Load-Balancing Strategies For A Distributed Computer System
- 1992
- Cited by 17 (4 self)
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.
Figure 3. The load-balancing policy considered in this thesis
SENDER-SIDE RULES (s)
    Possible-destinations = { site: Load(site) - Reference(s) < d(s) }
    Destination = Random(Possible-destinations)
    IF Load(s) - Reference(s) > q1(s) THEN Send
RECEIVER-SIDE RULES (r)
    IF Load(r) < q2(r) THEN Receive
The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters (d, q1, and q2) take non-negative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
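The sender-side and receiver-side rules quoted in this abstract can be sketched in Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `choose_destination` and `accepts_task`, the dictionary of per-site loads, the use of scalar thresholds in place of the per-site parameters d(s), q1(s), and q2(r), and the exclusion of the arrival site from the candidate set are all hypothetical. What happens after a destination is chosen is cut off in the excerpt, so the two rules are kept as separate functions rather than combined.

```python
import random
from typing import Dict, Optional


def choose_destination(
    loads: Dict[str, float],
    s: str,
    d: float,
    q1: float,
    use_min_load: bool = True,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Apply the sender-side rules at arrival site s.

    Returns a remote site name, or None when the task should run locally
    (the rule for sending fails, or no candidate destination exists).
    """
    rng = rng or random.Random()
    # Reference is either MinLoad (least of all Load values) or 0.
    reference = min(loads.values()) if use_min_load else 0.0

    # Rule for sending: IF Load(s) - Reference > q1 THEN Send.
    if not (loads[s] - reference > q1):
        return None

    # Possible-destinations = { site : Load(site) - Reference < d }.
    # Assumption: the arrival site itself is not a "remote" destination.
    candidates = sorted(
        site for site, load in loads.items()
        if site != s and load - reference < d
    )
    if not candidates:
        return None

    # Destination = Random(Possible-destinations).
    return rng.choice(candidates)


def accepts_task(loads: Dict[str, float], r: str, q2: float) -> bool:
    """Receiver-side rule: IF Load(r) < q2 THEN Receive."""
    return loads[r] < q2
```

For example, with loads `{"a": 5.0, "b": 1.0, "c": 1.2}`, d = 0.5 and q1 = 2.0, a task arriving at `"a"` is offered to either `"b"` or `"c"`; raising q1 to 10.0 makes the rule for sending fail, so the task stays at `"a"`.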
Off-line Model-free and On-line Model-based Evolution for Tracking Navigation using Evolvable Hardware
- 1998
- Cited by 7 (1 self)
Recently there has been great interest in the idea that evolvable systems based on the principles of Artificial Life can be used to continuously and autonomously adapt the behavior of physically embedded systems such as mobile robots, plants and intelligent home devices. At the same time, we have seen the introduction of evolvable hardware (EHW): new integrated circuits that are able to adapt their hardware autonomously and almost continuously to changes in the environment [11]. This paper describes how a navigation system for a physical mobile robot can be evolved using a Boolean function approach implemented on evolvable hardware. The task of the mobile robot is to track a moving target represented by a colored ball, while avoiding obstacles during its motion. Our results show that a dynamic Boolean function approach is sufficient to produce this navigation behavior. Although the classical model-free evolution method is often infeasible in the real world due to the number of possib...
Evolvable Reactive Execution System using Reconfigurable Hardware: a Robot Navigation System Case Study
- Cited by 1 (0 self)
Recently there has been great interest in the design and study of evolvable systems in order to control the behavior of physically embedded systems. Due to the complexity of their architecture and their interaction with the environment, a Model-based Autonomous System approach was proposed by Williams to integrate a priori knowledge and reasoning methods of different kinds (Williams & Nayak 1996a). But the difficulty of precomputing all possible interactions also obliges the autonomous system to configure itself by modifying its own structure, and to model itself by adapting or even building a model using, for example, sensor information. This paper examines the architecture of a self-configuration component of a model-based autonomous system using evolvable hardware (EHW). The self-configuration component is applied to a navigation system for a mobile robot. It uses a Boolean function approach implemented on gate-level reconfigurable hardware. The task of the mobile robot is to r...
Off-line Evolution for a Robot Navigation System based on a Gate-Level Evolvable Hardware
- 1997
- Cited by 1 (0 self)
Recently there has been great interest in the design and study of evolvable systems based on Artificial Life principles in order to control the behavior of physically embedded systems such as a mobile robot. This paper studies an evolutionary navigation system for a mobile robot using a Boolean function approach implemented on gate-level evolvable hardware (EHW). The task of the mobile robot is to reach a goal represented by a colored light while avoiding obstacles during its motion. Using the evolution principles to build the desired behaviors, we show that the Boolean function approach using gate-level evolvable hardware is sufficient. We demonstrate the effectiveness of the generalization ability of EHW by generating off-line the robot behavior. The results show that the evolvable hardware system is able to obtain the desired behaviors and to generate a robust robot behavior insensitive to the gap between the real and simulated world. 1 Introduction Robotics has until recently de...
Learning to Achieve Goals
- In Proc. of IJCAI-93, 1993
Temporal difference methods solve the temporal credit assignment problem for reinforcement learning. An important subproblem of general reinforcement learning is learning to achieve dynamic goals. Although existing temporal difference methods, such as Q learning, can be applied to this problem, they do not take advantage of its special structure. This paper presents the DG-learning algorithm, which learns efficiently to achieve dynamically changing goals and exhibits good knowledge transfer between goals. In addition, this paper shows how traditional relaxation techniques can be applied to the problem. Finally, experimental results are given that demonstrate the superiority of DG learning over Q learning in a moderately large, synthetic, non-deterministic domain. 1 Introduction Reinforcement learning is a general tool for deriving strategies that optimize a fixed reinforcement function in a probabilistic environment. A crucial problem in reinforcement learning is temporal credit assi...

