Dyna, an Integrated Architecture for Learning, Planning, and Reacting
Working Notes of the 1991 AAAI Spring Symposium, 1991
Cited by 427 (13 self)
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples -- these are among the basic building blocks making up the architecture -- yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
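The learn/plan/act loop described above can be illustrated with a minimal tabular sketch in the style of Dyna-Q. It rests on stated assumptions: a hypothetical six-state deterministic corridor stands in for the world, the learned model is a lookup table of observed transitions, and one-step Q-learning serves both for learning from real experience and for planning on simulated experience. Names and parameters such as `corridor_step` and `n_planning` are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

def corridor_step(state, action, length=6):
    """Hypothetical deterministic corridor: action 0 moves left, action 1 moves right.
    Reaching the right-hand end gives reward 1 and ends the episode."""
    nxt = max(0, min(length - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == length - 1 else 0.0
    return nxt, reward, nxt == length - 1

def dyna_q(episodes=50, n_planning=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)   # Q[(state, action)] -> estimated return
    model = {}               # learned world model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def backup(s, a, r, s2, done):
        # One-step Q-learning backup, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Reactive execution: the action comes straight from the Q table, no search.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = corridor_step(s, a)
            backup(s, a, r, s2, done)        # learning from real experience
            model[(s, a)] = (r, s2, done)    # updating the world model
            # Incremental planning: extra backups on transitions replayed from the model.
            for _ in range(n_planning):
                ps, pa = rng.choice(list(model))
                backup(ps, pa, *model[(ps, pa)])
            s = s2
    return q

q = dyna_q()
print([max((0, 1), key=lambda a: q[(s, a)]) for s in range(5)])
```

With the fixed seed, the printed greedy policy should be `[1, 1, 1, 1, 1]` (move right in every non-terminal state). Raising `n_planning` spends more computation per real step on planning; setting it to 0 reduces the sketch to plain Q-learning.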
Self-improving reactive agents based on reinforcement learning, planning and teaching
Machine Learning, 1992
Cited by 256 (2 self)
Abstract. To date, reinforcement learning has mostly been studied on simple learning tasks, and the reinforcement learning methods studied so far typically converge slowly. The purpose of this work is thus twofold: (1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and (2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to each of these two basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of the different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents an empirical evaluation of the frameworks.
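Of the three speed-up extensions, experience replay is the simplest to illustrate. The sketch below is a minimal version under assumptions: it uses a lookup table instead of the connectionist networks the paper used, and the buffer size, number of replays per step, and the class name `ReplayQLearner` are hypothetical choices.

```python
import random
from collections import defaultdict, deque

class ReplayQLearner:
    """Tabular Q-learning with an experience-replay buffer. Illustrative only: the
    paper's agents used connectionist networks rather than a lookup table."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, capacity=1000, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)              # Q[(state, action)]
        self.buffer = deque(maxlen=capacity)     # oldest experiences are dropped first
        self.rng = random.Random(seed)

    def update(self, s, a, r, s2, done):
        target = r if done else r + self.gamma * max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def observe(self, s, a, r, s2, done, replays=20):
        # Learn once from the new experience, store it, then replay stored experiences.
        self.update(s, a, r, s2, done)
        self.buffer.append((s, a, r, s2, done))
        for _ in range(replays):
            self.update(*self.rng.choice(self.buffer))

agent = ReplayQLearner(actions=[0, 1])
agent.observe("B", 1, 1.0, "goal", True)   # rewarded step
agent.observe("A", 1, 0.0, "B", False)     # step leading into it, seen only once
print(round(agent.q[("A", 1)], 3))
```

The printed value should approach γ × 1.0 = 0.9: replaying the stored rewarded transition alongside the new one propagates the reward back one step without any further interaction with the environment.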
Reinforcement Learning in the Multi-Robot Domain
Autonomous Robots, 1997
Cited by 121 (19 self)
This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as the complex, concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
1 Introduction
Developing effective methods for real-time learning has been an ongoing challenge in autonomous agent research and is being explored in the mobile robot domain. In the last decade, reinforcement learning (RL), a class of approaches in which the agent learns based on the reward and punishment it receives from the environment, has become the methodology of choice for learning in a variety of domains, including robotics. In this paper we describe a formulat...
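How event-driven reinforcement and progress estimators might combine into one shaped signal can be sketched as follows. Everything here is a hypothetical illustration rather than the paper's formulation: the event names, weights, and the distance-to-home progress estimator are assumptions chosen to fit a foraging task.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Hypothetical per-step summary of a foraging robot's sensor readings."""
    picked_up: bool = False         # just picked up a puck
    dropped_at_home: bool = False   # just dropped a puck in the home region
    collided: bool = False          # just bumped into an obstacle or another robot
    carrying: bool = False          # currently holding a puck
    dist_home: float = 0.0          # estimated distance to the home region

# Heterogeneous, event-driven reinforcement terms (weights are illustrative).
EVENT_REWARDS = {"picked_up": 1.0, "dropped_at_home": 3.0, "collided": -1.0}

def progress_estimate(prev, cur, scale=0.1):
    """Progress estimator: a small, continuous signal between sparse events.
    While carrying a puck, moving toward home earns +scale, moving away earns -scale."""
    if not cur.carrying:
        return 0.0
    delta = prev.dist_home - cur.dist_home
    return scale if delta > 0 else (-scale if delta < 0 else 0.0)

def shaped_reward(prev, cur):
    """Sum of all event rewards that fired this step plus the progress estimate."""
    events = sum(w for name, w in EVENT_REWARDS.items() if getattr(cur, name))
    return events + progress_estimate(prev, cur)

prev = Observation(carrying=True, dist_home=5.0)
cur = Observation(carrying=True, dist_home=4.2)
print(shaped_reward(prev, cur))   # 0.1: no event fired, but the robot moved toward home
```

The event terms alone are sparse; the progress estimator supplies feedback on the many intermediate steps between events, which is one way shaped reinforcement can ease credit assignment.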
Input generalization in delayed reinforcement learning: An algorithm and performance comparisons
1991
Cited by 117 (3 self)
Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains. However, a number of hurdles must be overcome before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. This algorithm recursively splits the state space based on statistical measures of differences in the reinforcements received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning. We compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previous successful use of backpropagation can be explained by the linearity of the application domain. We found that in another domain, G reliably found the optimal policy, whereas none of a set of backpropagation runs, with many combinations of parameters, did.
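The split decision at the heart of the G algorithm can be sketched as follows, under assumptions: inputs are binary bit vectors, and the test for "differences in reinforcements received" is approximated here by a Welch t statistic compared against a fixed threshold. The paper's exact statistic, thresholds, and tree bookkeeping are not reproduced; `choose_split`, `threshold`, and `min_count` are hypothetical.

```python
import math
import random
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's t statistic for the difference between two sample means."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    if se == 0:
        return 0.0 if mean(xs) == mean(ys) else math.inf
    return (mean(xs) - mean(ys)) / se

def choose_split(samples, n_bits, threshold=3.0, min_count=5):
    """samples: (input_bits, reinforcement) pairs observed in one region of the state space.
    Returns the input bit whose value most significantly changes the reinforcement
    received, or None if no bit passes the (assumed) significance threshold."""
    best_bit, best_t = None, threshold
    for bit in range(n_bits):
        on = [r for bits, r in samples if bits[bit]]
        off = [r for bits, r in samples if not bits[bit]]
        if len(on) < min_count or len(off) < min_count:
            continue   # too few samples to judge this bit yet
        t = abs(welch_t(on, off))
        if t > best_t:
            best_bit, best_t = bit, t
    return best_bit

# Hypothetical data: bit 1 determines the reinforcement, bit 0 is irrelevant noise.
rng = random.Random(1)
samples = []
for _ in range(200):
    bits = (rng.random() < 0.5, rng.random() < 0.5)
    samples.append((bits, (1.0 if bits[1] else 0.0) + rng.gauss(0, 0.1)))
print(choose_split(samples, n_bits=2))   # 1
```

In a full implementation this test would be applied again within each resulting region, which is what makes the splitting recursive.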
Efficient Exploration In Reinforcement Learning
1992
Cited by 115 (4 self)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. While the former family is closely related to random walk exploration, directed exploration techniques memorize exploration-specific knowledge that is used to guide the exploration search. In many finite deterministic domains, any learning technique based on undirected exploration is inefficient in terms of learning time, i.e., learning time is expected to scale exponentially with the size of the state space (Whitehead, 1991b). We prove that for all these domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of e...
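The contrast between undirected and directed exploration can be sketched on a hypothetical "reset chain" (action 1 advances one state, action 0 returns to the start), a domain in which a uniform random walk needs 2^(n+1) - 2 steps in expectation to reach state n. The directed explorer below is a simplified counter-based rule chosen for illustration, not the paper's exact technique: it tries untried actions first and otherwise moves toward the least-visited remembered successor. All function names are hypothetical.

```python
import random
from collections import defaultdict

def reset_chain(state, action):
    """Hypothetical test domain: action 1 advances one state, action 0 resets to state 0."""
    return state + 1 if action == 1 else 0

def directed_explore(goal, actions=(0, 1)):
    """Simplified counter-based directed exploration: remember visit counts and observed
    transitions; try untried actions first, else head for the least-visited successor."""
    visits = defaultdict(int)
    model = {}                       # remembered deterministic transitions
    s, steps = 0, 0
    visits[s] += 1
    while s != goal:
        untried = [a for a in actions if (s, a) not in model]
        if untried:
            a = untried[0]
        else:
            a = min(actions, key=lambda b: visits[model[(s, b)]])
        s2 = reset_chain(s, a)
        model[(s, a)] = s2
        visits[s2] += 1
        s, steps = s2, steps + 1
    return steps

def random_explore(goal, seed=0, max_steps=10**6):
    """Undirected exploration: a uniform random walk over actions."""
    rng = random.Random(seed)
    s, steps = 0, 0
    while s != goal and steps < max_steps:
        s, steps = reset_chain(s, rng.choice((0, 1))), steps + 1
    return steps

print(directed_explore(12))   # 90
print(random_explore(12))     # varies with the seed; expected value is 2**13 - 2 = 8190
```

On this chain the directed explorer needs (n² + 3n) / 2 steps, 90 for n = 12, against an expected 8190 for the random walk; this polynomial-versus-exponential gap is the kind of difference the abstract describes.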
Planning by Incremental Dynamic Programming
In Proceedings of the Eighth International Workshop on Machine Learning, 1991
Cited by 53 (2 self)
This paper presents the basic results and ideas of dynamic programming as they relate most directly to the concerns of planning in AI. These form the theoretical basis for the incremental planning methods used in the integrated architecture Dyna. These incremental planning methods are based on continually updating an evaluation function and the situation-action mapping of a reactive system. Actions are generated by the reactive system and thus involve minimal delay, while the incremental planning process guarantees that the actions and evaluation function will eventually be optimal -- no matter how extensive a search is required. These methods are well suited to stochastic tasks and to tasks in which a complete and accurate model is not available. For tasks too large to implement the situation-action mapping as a table, supervised-learning methods must be used, and their capabilities remain a significant limitation of the approach.
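The incremental planning idea can be sketched as asynchronous value iteration: instead of sweeping the whole state space, the evaluation function is improved one backup at a time, and the situation-action mapping can be read off it greedily at any moment. This is a minimal sketch under assumptions: a hypothetical three-state stochastic task with a known model (in Dyna the model would itself be learned), uniform random choice of which state to back up, and hypothetical names such as `incremental_planning`.

```python
import random

# Hypothetical stochastic task: TRANSITIONS[s][a] = list of (probability, next_state, reward).
TRANSITIONS = {
    "start": {"safe":  [(1.0, "mid", 0.0)],
              "risky": [(0.5, "goal", 1.0), (0.5, "start", 0.0)]},
    "mid":   {"go":    [(0.9, "goal", 1.0), (0.1, "start", 0.0)]},
    "goal":  {},   # terminal: no actions
}

def action_value(values, s, a, gamma):
    """Expected one-step return of action a in state s under the current evaluation."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in TRANSITIONS[s][a])

def backup(values, s, gamma):
    """One full Bellman backup of state s; terminal states keep value 0."""
    if not TRANSITIONS[s]:
        return 0.0
    return max(action_value(values, s, a, gamma) for a in TRANSITIONS[s])

def incremental_planning(n_backups=2000, gamma=0.9, seed=0):
    """Asynchronous value iteration: back up one randomly chosen state at a time."""
    rng = random.Random(seed)
    values = {s: 0.0 for s in TRANSITIONS}
    states = list(TRANSITIONS)
    for _ in range(n_backups):
        s = rng.choice(states)
        values[s] = backup(values, s, gamma)
    return values

def greedy_action(values, s, gamma=0.9):
    """Situation-action mapping read off the current evaluation function."""
    return max(TRANSITIONS[s], key=lambda a: action_value(values, s, a, gamma))

values = incremental_planning()
print({s: round(v, 3) for s, v in values.items()})   # {'start': 0.909, 'mid': 0.982, 'goal': 0.0}
print(greedy_action(values, "start"))                 # risky
```

Because each backup is cheap and independent of the others, planning can be interleaved with acting and stopped at any time; a greedy action is always available, and as every state keeps being backed up it converges to the optimal one.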
Multi-Agent Simulation as a Tool for Modeling Societies: Application to Social Differentiation in Ant Colonies
4th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, Artificial Social Systems
Cited by 39 (3 self)
Abstract. This paper presents the notion of multi-agent simulation, which is based on defining computational agents that represent individual organisms (or groups of organisms) in a one-to-one correspondence. We discuss the properties of multi-agent simulation. We then present a multi-agent simulation system based on reactive agents whose behavior is governed by the selection of simple competing tasks, triggered by the perception of stimuli. An example simulation of an ant colony follows as an illustration of the many domains in which multi-agent simulation may be used.
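The stimulus-driven task competition described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's model: each task's activation is taken to be an agent-specific weight times the intensity of its triggering stimulus, the most activated task wins, and the weight reinforcement step is a hypothetical addition showing how repeated selection could lead agents to specialize. Task and stimulus names are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    stimulus: str          # kind of stimulus that triggers this task
    weight: float = 1.0    # agent-specific preference (assumed, for illustration)

@dataclass
class AntAgent:
    tasks: list = field(default_factory=list)
    learning_rate: float = 0.1   # hypothetical reinforcement step

    def select_task(self, stimuli):
        """Tasks compete: activation = weight * perceived stimulus intensity.
        Returns the winning task, or None if no task is stimulated."""
        scored = [(t.weight * stimuli.get(t.stimulus, 0.0), t) for t in self.tasks]
        activation, winner = max(scored, key=lambda pair: pair[0], default=(0.0, None))
        if activation <= 0.0:
            return None
        # Assumed reinforcement: performing a task makes it slightly more likely to win
        # again, one possible route to the differentiation the title refers to.
        winner.weight += self.learning_rate
        return winner

ant = AntAgent(tasks=[Task("feed_larva", "larva_hunger"), Task("carry_egg", "egg_scent")])
for stimuli in [{"larva_hunger": 0.6, "egg_scent": 0.5}, {"larva_hunger": 0.5, "egg_scent": 0.52}]:
    task = ant.select_task(stimuli)
    print(task.name if task else "idle")   # feed_larva, then feed_larva again
```

In the second step the egg stimulus is stronger, yet the ant keeps feeding larvae because that task's weight was reinforced by the first selection; across many agents, such history-dependent preferences are one way differentiated roles could emerge.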
Reinforcement learning is direct adaptive optimal control
In Proceedings of the American Control Conference, 1991
Cited by 39 (4 self)
... optimal controls are estimated directly more attractive. We view reinforcement learning methods as a computationally simple, direct approach to the adaptive optimal control of nonlinear systems. For concreteness, we focus on one reinforcement learning method (Q-learning) and on its analytically proven capabilities for one class of adaptive optimal control problems (Markov decision problems with unknown transition probabilities).
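For concreteness, the one-step Q-learning update referred to above can be written in standard textbook notation (not necessarily the paper's own), where α is a step size and γ a discount factor. The update uses only an observed transition (s_t, a_t, r_{t+1}, s_{t+1}) and never the unknown transition probabilities, which is the sense in which the method is direct rather than model-based:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```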
From SAB90 to SAB94 : Four Years of Animat Research
1994
Cited by 33 (8 self)
This paper builds on a previous review of significant research on adaptive behavior in animats. It summarizes the current state of the art and suggests some directions likely to provide interesting results in the near future.
1 Introduction
An animat is a simulated animal or a real robot whose rules of behavior are inspired by those of animals. It is usually equipped with sensors, with actuators, and with a behavioral control architecture that allow it to react or respond to variations in its environment (internal or external), notably to those that might impair its chances of survival. The behavior of an animat is what the animat does. It is characterized by a sequence of actions that reflects the dynamic interplay between the animat and its environment, mediated through the animat's sensors and actuators. The behavior of an animat is adaptive as long as it allows the animat to survive or to fulfill its mission. This requires that the animat's essential variables be monitored a...

