Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
Proceedings of the Seventh International Conference on Machine Learning, 1990
Cited by 473 (18 self)
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples (these are among the basic building blocks making up the architecture), yet it is not tied to any particular method. This paper ...
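The Dyna loop described above (act reactively, learn from real experience, update a learned world model, then plan by replaying simulated experience from that model) can be sketched as tabular Dyna-Q. This is a minimal illustration under stated assumptions, not the paper's code: it assumes a deterministic world, a tabular Q-function, and a hypothetical five-state chain task; `dyna_q`, `chain_step`, and all parameter values are illustrative names and settings chosen here.

```python
import random
from collections import defaultdict

def dyna_q(env_step, start_state, actions, episodes=50, n_planning=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: direct RL, model learning, and planning from the model."""
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(state, action)] -> estimated return
    model = {}               # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Reactive policy: pick a best action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def backup(s, a, r, s2, done):
        # One-step Q-learning update, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)        # real experience
            backup(s, a, r, s2, done)           # direct reinforcement learning
            model[(s, a)] = (r, s2, done)       # model learning (assumes a deterministic world)
            for _ in range(n_planning):         # planning: replay remembered transitions
                ps, pa = rng.choice(list(model))
                backup(ps, pa, *model[(ps, pa)])
            s = s2
    return q

def chain_step(s, a):
    # Hypothetical task: states 0..4, actions -1/+1, reward 1 on reaching state 4.
    s2 = min(max(s + a, 0), 4)
    return (1.0 if s2 == 4 else 0.0), s2, s2 == 4

q = dyna_q(chain_step, start_state=0, actions=[-1, 1])
print(max([-1, 1], key=lambda a: q[(0, a)]))  # greedy action in state 0
```

Acting never waits on planning here: the action is chosen from the current Q-table, and the planning updates only improve that table for later steps, which mirrors the "fully reactive" execution the abstract describes.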
Forward models: Supervised learning with a distal teacher
Cognitive Science, 1992
Cited by 295 (7 self)
Internal models of the environment have an important role to play in adaptive systems in general and are of particular importance for the supervised learning paradigm. In this paper we demonstrate that certain classical problems associated with the notion of the "teacher" in supervised learning can be solved by judicious use of learned internal models as components of the adaptive system. In particular, we show how supervised learning algorithms can be utilized in cases in which an unknown dynamical system intervenes between actions and desired outcomes. Our approach applies to any supervised learning algorithm that is capable of learning in multilayer networks.
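To make the distal-teacher idea concrete, here is a minimal sketch assuming a hypothetical scalar plant y = 3u - 2 and linear learners in place of multilayer networks. A forward model is first fit with the LMS rule; the controller is then trained on the distal error (desired minus actual outcome), which is converted into an action-space gradient through the learned model's derivative. The names `plant` and `train_distal` and all constants are illustrative, not taken from the paper.

```python
import random

def plant(u):
    # Hypothetical unknown system between action u and outcome y (the learner never sees this formula).
    return 3.0 * u - 2.0

def train_distal(steps=5000, lr=0.01, seed=0):
    rng = random.Random(seed)

    # Phase 1: learn a forward model y_hat = w*u + b from random actions (LMS rule).
    w, b = 0.0, 0.0
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        err = plant(u) - (w * u + b)
        w += lr * err * u
        b += lr * err

    # Phase 2: train the controller u = c1*y_star + c0 on the distal error (y_star - y).
    # The error is mapped back to the action through the model's derivative dy_hat/du = w,
    # so the controller never needs the plant's true derivative.
    c0, c1 = 0.0, 0.0
    for _ in range(steps):
        y_star = rng.uniform(-1.0, 1.0)
        u = c1 * y_star + c0
        distal_err = y_star - plant(u)
        c1 += lr * distal_err * w * y_star
        c0 += lr * distal_err * w
    return (w, b), (c0, c1)

(w, b), (c0, c1) = train_distal()
print(round(c0, 3), round(c1, 3))  # ideal inverse u = (y_star + 2) / 3 -> 0.667 0.333
```

The controller's target action is never given directly; only the desired outcome is, which is the situation the abstract calls a distal teacher.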
Learning and Sequential Decision Making
Learning and Computational Neuroscience, 1989
Cited by 195 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
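The synthesis described above is visible in the tabular TD(0) update: its target r + γV(s') is a sampled version of the dynamic-programming backup, and its correction step has the same error-driven form as a parameter-estimation rule. A minimal sketch, assuming a hypothetical three-state deterministic chain; `td0` and the step size are illustrative choices, not the report's notation.

```python
from collections import defaultdict

def td0(episodes, alpha=0.1, gamma=0.9):
    """Tabular TD(0) prediction from sampled transitions (s, r, s_next); s_next=None is terminal."""
    v = defaultdict(float)
    for episode in episodes:
        for s, r, s_next in episode:
            # Sampled DP backup: no model of the dynamics is needed, only experience.
            target = r + (0.0 if s_next is None else gamma * v[s_next])
            # Error-correction step, like a delta rule for associative strengths.
            v[s] += alpha * (target - v[s])
    return v

# Hypothetical deterministic chain A -> B -> C -> end, reward 1 on the last step.
chain = [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, None)]
v = td0([chain] * 500)
print({s: round(v[s], 3) for s in "ABC"})  # {'A': 0.81, 'B': 0.9, 'C': 1.0}
```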
Linear least-squares algorithms for temporal difference learning
Machine Learning, 1996
Cited by 182 (0 self)
We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σ_TD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σ_TD. In addition to converging more rapidly, LS TD and RLS TD do not have control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance by an unlucky choice of parameters.
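A minimal batch LS TD sketch, assuming linear features φ(s) and a discount factor γ. It accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)·r over observed transitions and solves Aθ = b, so no learning-rate parameter appears. The chain task, the name `lstd`, and the tiny ridge term `reg` (a numerical safeguard) are assumptions added here; the recursive RLS TD variant is not shown.

```python
import numpy as np

def lstd(transitions, phi, k, gamma=0.9, reg=1e-6):
    """Batch least-squares TD: solve A theta = b for a linear value function V(s) = phi(s) . theta."""
    a = reg * np.eye(k)   # small ridge term (an addition here) keeps A invertible when data are sparse
    b = np.zeros(k)
    for s, r, s_next in transitions:
        x = phi(s)
        x_next = np.zeros(k) if s_next is None else phi(s_next)  # terminal states have value 0
        a += np.outer(x, x - gamma * x_next)
        b += r * x
    return np.linalg.solve(a, b)

# Hypothetical example: a three-state chain with one-hot (tabular) features.
STATES = ["A", "B", "C"]

def one_hot(s):
    return np.eye(len(STATES))[STATES.index(s)]

chain = [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, None)]
theta = lstd(chain, one_hot, k=3)
print(np.round(theta, 3))  # approximately [0.81, 0.9, 1.0]
```

A single pass over the data gives the exact discounted values here, whereas an incremental TD(0) learner on the same chain needs many passes and a chosen step size.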
Task Decomposition Through Competition in a Modular Connectionist Architecture
Cognitive Science, 1990
Cited by 181 (5 self)
A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. As a result of the competition, different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it learns to partition a task into two or more functionally independent tasks and allocates distinct networks to learn each task. In addition, the architecture tends to allocate to each task the network whose topology is most appropriate to that task, and tends to allocate the same network to similar tasks and distinct networks to dissimilar tasks. Furthermore, it can be easily modified so as to...
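The competition among networks can be illustrated with the credit-assignment step of a Gaussian mixture of experts, in which each module's share of a training pattern is its posterior probability of having produced the target. This is a hedged sketch: scalar linear experts, an input-independent gate, and the names `responsibilities` and `competitive_update` are assumptions made here, and the paper's exact objective and network topologies may differ.

```python
import math

def softmax(z):
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def responsibilities(gate_probs, expert_outputs, target, sigma=1.0):
    """Posterior share of a training pattern claimed by each module (Gaussian likelihoods)."""
    scores = [g * math.exp(-(target - o) ** 2 / (2 * sigma ** 2))
              for g, o in zip(gate_probs, expert_outputs)]
    total = sum(scores)
    return [s / total for s in scores]

def competitive_update(weights, gate_logits, x, target, lr=0.1):
    """One gradient step for hypothetical scalar linear experts o_i = w_i * x and an input-independent gate."""
    gates = softmax(gate_logits)
    outputs = [w * x for w in weights]
    h = responsibilities(gates, outputs, target)
    # Each module learns from the pattern in proportion to the credit it won.
    new_weights = [w + lr * hi * (target - o) * x for w, hi, o in zip(weights, h, outputs)]
    # The gate shifts prior probability toward modules that won more than their prior share.
    new_logits = [z + lr * (hi - g) for z, hi, g in zip(gate_logits, h, gates)]
    return new_weights, new_logits

h = responsibilities([0.5, 0.5], [0.0, 1.0], target=0.9)
print([round(p, 3) for p in h])  # the second module, whose output is closer, wins most of the credit
```

Because credit is split by how well each module already fits a pattern, a module that starts to fit one region of the data receives ever more of that region, which is one way the division of labor described in the abstract can arise.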
Recent advances in hierarchical reinforcement learning
2003
Cited by 161 (23 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume ...
Curious Model-Building Control Systems
In Proc. International Joint Conference on Neural Networks, Singapore, 1991
Cited by 107 (26 self)
A controller is a device which receives inputs from a (dynamic) environment and produces outputs that manipulate the environmental state. A model-building control system is a controller with an additional module (the 'world model') which is trained to predict future inputs from previous input/action pairs. The novel curious model-building control system described in this paper is a model-building control system which actively tries to provoke situations for which it learned to expect to learn something about the environment. Such a system has been implemented as a 4-network system based on Watkins' Q-learning algorithm which can be used to maximize the expectation of the temporal derivative of the adaptive assumed reliability of future predictions. An experiment with an artificial non-deterministic environment demonstrates that the system can be superior to previous model-building control systems (the latter do not address the problem of modelling the reliability of the world model's p...
Memory Approaches To Reinforcement Learning In Non-Markovian Domains
1992
Cited by 61 (3 self)
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a mapping from states and actions to their utilities. An important assumption of Q-learning is the Markovian environment assumption, meaning that any information needed to determine the optimal actions is reflected in the agent's state representation. Consider an agent whose state representation is based solely on its immediate perceptual sensations. When its sensors are not able to make essential distinctions among world states, the Markov assumption is violated, causing a problem called perceptual aliasing. For example, when facing a closed box, an agent based on its current visual sensation cannot act optimally if the optimal action depends on the contents of the box. There are two basic approaches to addressing this problem: using more sensors or using history to figure out the curren...
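One history-based remedy for perceptual aliasing is to let the agent's state include a short window of recent observations and actions, so that situations that look identical now can be told apart by what came before. A minimal sketch, not necessarily the paper's architecture; `HistoryWindow`, the window length, and the box example's observation labels are hypothetical.

```python
from collections import deque

class HistoryWindow:
    """Agent state = the last k (observation, action) pairs plus the current observation."""

    def __init__(self, k):
        self.past = deque(maxlen=k)   # older pairs fall off automatically

    def record(self, obs, action):
        self.past.append((obs, action))

    def state(self, current_obs):
        # A hashable tuple, so it can index a Q-table like an ordinary state.
        return tuple(self.past) + (current_obs,)

# Hypothetical aliasing example: the current sensation "closed_box" is identical in both
# episodes, but what the agent saw before differs.
h1, h2 = HistoryWindow(k=2), HistoryWindow(k=2)
h1.record("saw_apple_put_in", "walk_to_box")
h2.record("saw_key_put_in", "walk_to_box")
print(h1.state("closed_box") != h2.state("closed_box"))  # True: the window disambiguates
```

The window length trades off disambiguation against table size: each extra step of history multiplies the number of distinct states a tabular Q-learner must visit.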
Planning by Incremental Dynamic Programming
In Proceedings of the Eighth International Workshop on Machine Learning, 1991
Cited by 60 (2 self)
This paper presents the basic results and ideas of dynamic programming as they relate most directly to the concerns of planning in AI. These form the theoretical basis for the incremental planning methods used in the integrated architecture Dyna. These incremental planning methods are based on continually updating an evaluation function and the situation-action mapping of a reactive system. Actions are generated by the reactive system and thus involve minimal delay, while the incremental planning process guarantees that the actions and evaluation function will eventually be optimal, no matter how extensive a search is required. These methods are well suited to stochastic tasks and to tasks in which a complete and accurate model is not available. For tasks too large to implement the situation-action mapping as a table, supervised-learning methods must be used, and their capabilities remain a significant limitation of the approach.
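The incremental planning idea can be sketched as asynchronous value iteration: states are backed up one at a time, in arbitrary order, using a possibly stochastic model, and the evaluation function and the greedy situation-action mapping converge to optimal ones. This is a minimal sketch assuming a known model supplied as `model(s, a)` returning (probability, reward, next_state, done) tuples; the slippery-chain task and all names are hypothetical.

```python
import random

def incremental_value_iteration(states, actions, model, gamma=0.95, backups=2000, seed=0):
    """Asynchronous value iteration: back up one state at a time, in arbitrary order.

    model(s, a) -> list of (probability, reward, next_state, done) tuples.
    """
    rng = random.Random(seed)
    v = {s: 0.0 for s in states}

    def q(s, a):
        # Expected one-step return under the model, bootstrapping from the current v.
        return sum(p * (r + (0.0 if done else gamma * v[s2])) for p, r, s2, done in model(s, a))

    for _ in range(backups):
        s = rng.choice(states)
        v[s] = max(q(s, a) for a in actions)
    # The reactive situation-action mapping: act greedily with respect to v.
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return v, policy

def slippery_chain(s, a):
    # Hypothetical stochastic task: the intended move succeeds with probability 0.8,
    # otherwise the agent stays put; reaching state 4 ends the task with reward 1.
    s2 = min(max(s + a, 0), 4)
    return [(0.8, 1.0 if s2 == 4 else 0.0, s2, s2 == 4), (0.2, 0.0, s, False)]

v, policy = incremental_value_iteration([0, 1, 2, 3], [-1, 1], slippery_chain)
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Because each backup touches a single state, planning can be interleaved with acting in small slices, which is the property the abstract relies on for keeping action selection delay-free.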