Dyna, an Integrated Architecture for Learning, Planning, and Reacting
- WORKING NOTES OF THE 1991 AAAI SPRING SYMPOSIUM
, 1991
Cited by 427 (13 self)
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples -- these are among the basic building blocks making up the architecture -- yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
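The three roles named in this abstract (learning from real experience, updating a world model, and incremental planning over that model, with purely reactive action selection) can be illustrated with a minimal sketch. It assumes tabular Q-learning as the learning method and a deterministic last-seen-outcome model; the abstract states that Dyna is not tied to any particular method, so these choices, along with the names `dyna_q_step`, `act`, `alpha`, `gamma`, and `n_planning`, are illustrative assumptions rather than the paper's own formulation.

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, a, r, s2, actions,
                alpha=0.5, gamma=0.9, n_planning=5, rng=random):
    """One Dyna-style step: learn from a real transition, update the
    model, then run a few planning backups on remembered transitions.

    Assumptions (not from the source): tabular Q-values and a
    deterministic model that stores the last observed outcome.
    """
    def backup(s, a, r, s2):
        # One-step Q-learning backup toward r + gamma * max_b Q(s2, b).
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    backup(s, a, r, s2)            # learning from real experience
    model[(s, a)] = (r, s2)        # model learning
    for _ in range(n_planning):    # incremental planning with the model
        ps, pa = rng.choice(list(model))
        pr, ps2 = model[(ps, pa)]
        backup(ps, pa, pr, ps2)


def act(Q, s, actions):
    """Reactive execution: a table lookup, with no planning at decision time."""
    return max(actions, key=lambda a: Q[(s, a)])


if __name__ == "__main__":
    Q, model = defaultdict(float), {}
    dyna_q_step(Q, model, 0, "right", 1.0, 1, ["left", "right"])
    print(act(Q, 0, ["left", "right"]))
```

In this sketch the planning loop reuses the same backup as real experience, so a learned but possibly incorrect model influences behavior only through the value table that `act` reads.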
Self-improving reactive agents based on reinforcement learning, planning and teaching
- Machine Learning
, 1992
Cited by 256 (2 self)
To date, reinforcement learning has mostly been studied solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents empirical evaluation of the frameworks.
Efficient Exploration In Reinforcement Learning
, 1992
Cited by 115 (4 self)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. While the former family is closely related to random walk exploration, directed exploration techniques memorize exploration-specific knowledge which is used for guiding the exploration search. In many finite deterministic domains, any learning technique based on undirected exploration is inefficient in terms of learning time, i.e. learning time is expected to scale exponentially with the size of the state space (Whitehead, 1991b). We prove that for all these domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of e...
Control of Selective Perception Using Bayes Nets and Decision Theory
, 1993
Cited by 87 (1 self)
A selective vision system sequentially collects evidence to support a specified hypothesis about a scene, as long as the additional evidence is worth the effort of obtaining it. Efficiency comes from processing the scene only where necessary, to the level of detail necessary, and with only the necessary operators. Knowledge representation and sequential decision-making are central issues for selective vision, which takes advantage of prior knowledge of a domain's abstract and geometrical structure and models for the expected performance and cost of visual operators. The TEA-1 selective vision system uses Bayes nets for representation and benefit-cost analysis for control of visual and non-visual actions. It is the high-level control for an active vision system, enabling purposive behavior, the use of qualitative vision modules and a pointable multiresolution sensor. TEA-1 demonstrates that Bayes nets and decision theoretic techniques provide a general, re-usable framework for constructi...
Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State
- In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
Cited by 84 (1 self)
We present Utile Suffix Memory, a reinforcement learning algorithm that uses short-term memory to overcome the state aliasing that results from hidden state. By combining the advantages of previous work in instance-based (or "memory-based") learning and previous work with statistical tests for separating noise from task structure, the method learns quickly, creates only as much memory as needed for the task at hand, and handles noise well. Utile Suffix Memory uses a tree-structured representation, and is related to work on Prediction Suffix Trees [Ron et al., 1994], Parti-game [Moore, 1993], G-algorithm [Chapman and Kaelbling, 1991], and Variable Resolution Dynamic Programming [Moore, 1991].
1 INTRODUCTION
The sensory systems of embedded agents are inherently limited. When a reinforcement learning agent's sensory limitations hide features of the environment from the agent, we say that the agent suffers from hidden state. There are many reasons why important features can be hidden...
Memory Approaches To Reinforcement Learning In Non-Markovian Domains
, 1992
Cited by 59 (3 self)
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a mapping from states and actions to their utilities. An important assumption of Q-learning is the Markovian environment assumption, meaning that any information needed to determine the optimal actions is reflected in the agent's state representation. Consider an agent whose state representation is based solely on its immediate perceptual sensations. When its sensors are not able to make essential distinctions among world states, the Markov assumption is violated, causing a problem called perceptual aliasing. For example, when facing a closed box, an agent based on its current visual sensation cannot act optimally if the optimal action depends on the contents of the box. There are two basic approaches to addressing this problem: using more sensors or using history to figure out the curren...
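For reference, the mapping from states and actions to utilities mentioned in this abstract is usually learned with the standard one-step Q-learning update. The abstract itself does not give the rule; the symbols below (step size $\alpha$, discount factor $\gamma$, reward $r_{t+1}$) follow the common textbook notation and are assumptions here.

```latex
% One-step Q-learning update after observing (s_t, a_t, r_{t+1}, s_{t+1}).
% Assumed notation: \alpha in (0, 1] is the step size, \gamma in [0, 1) the discount.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The target $r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a')$ depends only on the current state representation, which is where the Markov assumption enters: if two distinct world states produce the same $s_t$, they are forced to share one utility estimate.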
Robot Shaping: Developing Situated Agents through Learning
, 1993
Cited by 48 (1 self)
Learning plays a vital role in the development of situated agents. In this paper, we explore the use of reinforcement learning to "shape" a robot to perform a predefined target behavior. We connect both simulated and real robots to ALECSYS, a parallel implementation of a learning classifier system with an extended genetic algorithm. After classifying different kinds of Animat-like behaviors, we explore the effects on learning of different types of agent architecture (monolithic, flat and hierarchical) and of training strategies. In particular, hierarchical architecture requires the agent to learn how to coordinate basic learned responses. We show that the best results are achieved when both the agent's architecture and the training strategy match the structure of the behavior pattern to be learned. We report the results of a number of experiments carried out both in simulated and in real environments, and show that the results of simulations carry smoothly to real robots. While most o...
Lifelong Robot Learning
- Robotics and Autonomous Systems
, 1993
Cited by 47 (4 self)
Learning provides a useful tool for the automatic design of autonomous robots. Recent research on learning robot control has predominantly focussed on learning single tasks that were studied in isolation. If robots encounter a multitude of control learning tasks over their entire lifetime, however, there is an opportunity to transfer knowledge between them. In order to do so, robots may learn the invariants of the individual tasks and environments. This task-independent knowledge can be employed to bias generalization when learning control, which reduces the need for real-world experimentation. We argue that knowledge transfer is essential if robots are to learn control with moderate learning times in complex scenarios. Two approaches to lifelong robot learning which both capture invariant knowledge about the robot and its environments are presented. Both approaches have been evaluated using a HERO2000 mobile robot. Learning tasks included navigation in unknown indoor environments an...
Reinforcement learning is direct adaptive optimal control
- In Proceedings of the American Control Conference
, 1991
Cited by 39 (4 self)
... optimal controls are estimated directly more attractive. We view reinforcement learning methods as a computationally simple, direct approach to the adaptive optimal control of nonlinear systems. For concreteness, we focus on one reinforcement learning method (Q-learning) and on its analytically proven capabilities for one class of adaptive optimal control problems (Markov decision problems with unknown transition probabilities).
The agent-based approach: A new direction for computational models of development
- Developmental Review
, 2001
Cited by 36 (7 self)
The agent-based approach emphasizes the importance of learning through organism-environment interaction. This approach is part of a recent trend in computational models of learning and development toward studying autonomous organisms that are embedded in virtual or real environments. In this paper we introduce the concepts of online and offline sampling and highlight the role of online sampling in agent-based models. After comparing the strengths of each approach for modeling particular developmental phenomena and research questions, we describe a recent agent-based model of infant causal perception. We conclude by discussing some of the present limitations of agent-based models and suggesting how these challenges may be addressed.
© 2001 Academic Press
Computational models of learning and development are playing an increasingly critical role in child development research (Cassidy, 1990;

