Results 1  10
of
519
Reinforcement learning: a survey
 Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1324 (23 self)
 Add to MetaCart
This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trialanderror interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
The dynamics of reinforcement learning in cooperative multiagent systems
 IN PROCEEDINGS OF NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI98
, 1998
"... Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that a ..."
Abstract

Cited by 303 (1 self)
 Add to MetaCart
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Qlearning in cooperative multiagent systems under these two perspectives, focusing on the influence of that game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
Cooperative mobile robotics: Antecedents and directions
, 1995
"... There has been increased research interest in systems composed of multiple autonomous mobile robots exhibiting collective behavior. Groups of mobile robots are constructed, with an aim to studying such issues as group architecture, resource conflict, origin of cooperation, learning, and geometric pr ..."
Abstract

Cited by 301 (3 self)
 Add to MetaCart
There has been increased research interest in systems composed of multiple autonomous mobile robots exhibiting collective behavior. Groups of mobile robots are constructed, with an aim to studying such issues as group architecture, resource conflict, origin of cooperation, learning, and geometric problems. As yet, few applications of collective robotics have been reported, and supporting theory is still in its formative stages. In this paper, we give a critical survey of existing works and discuss open problems in this field, emphasizing the various theoretical issues that arise in the study of cooperative robotics. We describe the intellectual heritages that have guided early research, as well as possible additions to the set of existing motivations. 1
Multiagent Systems: A Survey from a Machine Learning Perspective
 AUTONOMOUS ROBOTS
, 1997
"... Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is ..."
Abstract

Cited by 298 (23 self)
 Add to MetaCart
Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
, 1998
"... In this paper, we adopt generalsum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zerosum stochastic games to a broader framework. We design a multiagent Qlearning method under this framework, and prove that it converges to a Na ..."
Abstract

Cited by 287 (4 self)
 Add to MetaCart
In this paper, we adopt generalsum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zerosum stochastic games to a broader framework. We design a multiagent Qlearning method under this framework, and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when there exists a unique Nash equilibrium in the game. When there exist multiple Nash equilibria in the game, this algorithm should be combined with other learning techniques to find optimal strategies.
RMAX  A General Polynomial Time Algorithm for NearOptimal Reinforcement Learning
, 2001
"... Rmax is a very simple modelbased reinforcement learning algorithm which can attain nearoptimal average reward in polynomial time. In Rmax, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The mod ..."
Abstract

Cited by 238 (9 self)
 Add to MetaCart
Rmax is a very simple modelbased reinforcement learning algorithm which can attain nearoptimal average reward in polynomial time. In Rmax, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent's observations. Rmax improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E algorithm, covering zerosum stochastic games. (2) It has a builtin mechanism for resolving the exploration vs...
Multiagent Learning Using a Variable Learning Rate
 Artificial Intelligence
, 2002
"... Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms hav ..."
Abstract

Cited by 180 (8 self)
 Add to MetaCart
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in selfplay on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of selfplay and otherwise, demonstrating the wide applicability of this method.
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of ..."
Abstract

Cited by 179 (8 self)
 Add to MetaCart
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Recent advances in hierarchical reinforcement learning
, 2003
"... A preliminary unedited version of this paper was incorrectly published as part of Volume ..."
Abstract

Cited by 167 (23 self)
 Add to MetaCart
A preliminary unedited version of this paper was incorrectly published as part of Volume
Sequential optimality and coordination in multiagent systems
 In International Joint Conference on Artificial Intelligence
, 1999
"... Coordination of agent activities is a key problem in multiagent systems. Set in a larger decision theoretic context, the existence of coordination problems leads to difficulty in evaluating the utility of a situation. This in turn makes defining optimal policies for sequential decision processes pro ..."
Abstract

Cited by 144 (3 self)
 Add to MetaCart
Coordination of agent activities is a key problem in multiagent systems. Set in a larger decision theoretic context, the existence of coordination problems leads to difficulty in evaluating the utility of a situation. This in turn makes defining optimal policies for sequential decision processes problematic. We propose a method for solving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. We define an extension of value iteration in which the systemâ€™s state space is augmented with the state of the coordination mechanism adopted, allowing agents to reason about the short and long term prospects for coordination, the long term consequences of (mis)coordination, and make decisions to engage or avoid coordination problems based on expected value. We also illustrate the benefits of mechanism generalization. 1