Results 11  20
of
1,045
Motivated Reinforcement Learning
, 2001
"... The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as prediction of sum future reward. Competition between actions is based on the motivating characteristics of their consequen ..."
Abstract

Cited by 252 (9 self)
 Add to MetaCart
The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as prediction of sum future reward. Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful, experiments reviewed in Dickinson & Balleine, into the neurobiology and psychology of motivation shows that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.
Generalization in Reinforcement Learning: Safely Approximating the Value Function
 Advances in Neural Information Processing Systems 7
, 1995
"... To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a genera ..."
Abstract

Cited by 251 (3 self)
 Add to MetaCart
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases, may produce an entirely wrong policy. We then introduce GrowSupport, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization. 1 INTRODUCTION Reinforcement learningthe problem of getting an agent to learn to act from sparse, delayed rewardshas been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function ...
Residual Algorithms: Reinforcement Learning with Function Approximation
 In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
"... A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general functionapproximation system, such ..."
Abstract

Cited by 237 (5 self)
 Add to MetaCart
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general functionapproximation system, such as a sigmoidal multilayer perceptron, a radialbasisfunction system, a memorybased learning system, or even a linear functionapproximation system. A new class of algorithms, residual gradient algorithms, is proposed, which perform gradient descent on the mean squared Bellman residual, guaranteeing convergence. It is shown, however, that they may learn very slowly in some cases. A larger class of algorithms, residual algorithms, is proposed that has the guaranteed convergence of the residual gradient algorithms, yet can retain the fast learning speed of direct algorithms. In fact, both direct and residual gradient algorithms are shown to be special cases of residual algorithms, and it is s...
Nearoptimal reinforcement learning in polynomial time
 Machine Learning
, 1998
"... We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve nearoptimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the m ..."
Abstract

Cited by 237 (3 self)
 Add to MetaCart
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve nearoptimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we then give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the ExplorationExploitation tradeoff. 1
The partigame algorithm for variable resolution reinforcement learning in multidimensional statespaces
 Machine Learning
, 1995
"... Abstract. Partigame is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous statespaces. In high dimensions it is essential that learning does not plan uniformly over a statespace. Partigame maintains a decisiontree partitioning of statespace and ap ..."
Abstract

Cited by 224 (8 self)
 Add to MetaCart
Abstract. Partigame is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous statespaces. In high dimensions it is essential that learning does not plan uniformly over a statespace. Partigame maintains a decisiontree partitioning of statespace and applies techniques from gametheory and computational geometry to e ciently and adaptively concentrate high resolution only on critical areas. The currentversion of the algorithm is designed to nd feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to nd a solution that optimizes a realvalued criterion. Many simulated problems have been tested, ranging from twodimensional to ninedimensional statespaces, including mazes, path planning, nonlinear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.
Convergence of Stochastic Iterative Dynamic Programming Algorithms
 Neural Computation
, 1994
"... Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of th ..."
Abstract

Cited by 207 (8 self)
 Add to MetaCart
Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of the behavior of these methods has been missing. In this paper we relate DPbased learning algorithms to powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD() and Qlearning belong. 1
Learning and Sequential Decision Making
 LEARNING AND COMPUTATIONAL NEUROSCIENCE
, 1989
"... In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of squential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of ..."
Abstract

Cited by 195 (10 self)
 Add to MetaCart
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of squential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of longterm payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the nonengineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
 In Proceedings of the Tenth National Conference on Artificial Intelligence
, 1992
"... It is known that Perceptual Aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [ Whitehead and Ballard, 1991 ] . Perceptual aliasing occurs when multiple situations that are indistinguishable from immediate perceptual input require different responses from the ..."
Abstract

Cited by 193 (0 self)
 Add to MetaCart
It is known that Perceptual Aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [ Whitehead and Ballard, 1991 ] . Perceptual aliasing occurs when multiple situations that are indistinguishable from immediate perceptual input require different responses from the system. For example, if a robot can only see forward, yet the presence of a battery charger behind it determines whether or not it should backup, immediate perception alone is insufficient for determining the most appropriate action. It is problematic since reinforcement algorithms typically learn a control policy from immediate perceptual input to the optimal choice of action. This paper introduces the predictive distinctions approach to compensate for perceptual aliasing caused from incomplete perception of the world. An additional component, a predictive model, is utilized to track aspects of the world that may not be visible at all times. In addition to the control policy, the model mus...
Modeling Adaptive Autonomous Agents
 Artificial Life
, 1994
"... One category of researchers in artificial life is concerned with modeling and building socalled adaptive autonomous agents. Autonomous agents are systems that inhabit a dynamic, unpredictable environment in which they try to satisfy a set of timedependent goals or motivations. Agents are said to b ..."
Abstract

Cited by 192 (1 self)
 Add to MetaCart
One category of researchers in artificial life is concerned with modeling and building socalled adaptive autonomous agents. Autonomous agents are systems that inhabit a dynamic, unpredictable environment in which they try to satisfy a set of timedependent goals or motivations. Agents are said to be adaptive if they improve their competence at dealing with these goals based on experience. Autonomous agents constitute a new approach to the study of artificial intelligence (AI) which is highly inspired by biology, in particular ethology, the study of animal behavior. Research in autonomous agents has brought about a new wave of excitement into the field of AI. This paper reflects on the state of the art of this new approach.
Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach
 Advances in Neural Information Processing Systems 6
, 1994
"... This paper describes the Qrouting algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple ..."
Abstract

Cited by 182 (2 self)
 Add to MetaCart
This paper describes the Qrouting algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36node, irregularly connected network, Qrouting proves superior to a nonadaptive algorithm based on precomputed shortest paths and is able to route efficiently even when critical aspects of the simulation, such as the network load, are allowed to vary dynamically. The paper concludes with a discussion of the tradeoff between discovering shortcuts and maintaining stable policies. 1 INTRODUCTION The field of reinforcement learning has grown dramatically over the past several years, but with the exception of backgammon [8, 2], has had few successful applications to largescale, practical tasks. This paper demonstrates that the practical task of routing packets through...