## The dynamics of reinforcement learning in cooperative multiagent systems (1998)

Venue: Proceedings of the National Conference on Artificial Intelligence (AAAI-98)

Citations: 303 (1 self)

### BibTeX

@INPROCEEDINGS{Claus98thedynamics,
  author    = {Caroline Claus and Craig Boutilier},
  title     = {The dynamics of reinforcement learning in cooperative multiagent systems},
  booktitle = {Proceedings of the National Conference on Artificial Intelligence (AAAI-98)},
  year      = {1998},
  pages     = {746--752}
}

### Abstract

Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.

### Citations

1302 | Reinforcement Learning: A Survey
- Kaelbling, Littman, et al.
- 1996
Citation Context ...ous joint actions. In such a case, reinforcement learning can be used by the agents to estimate, based on past experience, the expected reward associated with individual or joint actions. We refer to [8] for a survey of RL techniques. A simple, well-understood algorithm for single agent learning is Q-learning [21]. The formulation of Q-learning for general sequential decision processes is more sophis...

727 | The Evolution of Cooperation
- Axelrod
- 1984
Citation Context ... through repeated play of the game with the same agents [5, 6, 10, 13]. (Repeated play with a random selection of similar agents from a large population has also been the object of considerable study [1, 18, 11, 24].) One especially simple, yet often effective, learning model for achieving coordination is fictitious play [4, 5]. Each agent i keeps a count C^j_{a_j}, for each j ∈ α and a_j ∈ A_j, of the number o...

601 | Game Theory: Analysis of Conflict
- Myerson
- 1991
Citation Context ...drawn from the same distribution, reflecting the utility assessment of all agents. The agents wish to choose actions that maximize (expected) reward. We adopt some standard game theoretic terminology [13]. A randomized strategy for agent i is a distribution σ ∈ Δ(A_i) (where Δ(A_i) is the set of distributions over the agent's action set A_i). Intuitively, σ(a_i) denotes the probability of a...

561 | A stochastic approximation method
- Robbins, Monro
- 1951
Citation Context ...forming action a in state s, and incorporates consideration of the values of possible states s′ to which action a leads. This learning method is, in fact, a basic stochastic approximation technique [14]. We use (perhaps, misuse) the Q notation and terminology to emphasize the connection with action selection. ...actions of other agents. The contrast between ILs and JALs can be illustrated in our exampl...
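The stochastic-approximation update referred to in this excerpt can be sketched in a few lines. This is a minimal single-state version; the action names, rewards, and learning rate are illustrative, not from the paper:

```python
import random

def q_update(q, a, reward, alpha=0.1):
    """One stochastic-approximation step: move Q(a) a fraction alpha toward
    the sampled reward (the single-state form of the Q-learning update)."""
    q[a] = (1 - alpha) * q[a] + alpha * reward

# Illustrative single-state bandit: action "b" always pays 1.0, "a" pays 0.0.
random.seed(0)
q = {"a": 0.0, "b": 0.0}
for _ in range(1000):
    a = random.choice(["a", "b"])   # pure exploration
    q_update(q, a, 1.0 if a == "b" else 0.0)

print(round(q["a"], 3), round(q["b"], 3))   # → 0.0 1.0
```

With deterministic rewards and enough samples of each action, the estimates converge to the true action values regardless of the exploration strategy, illustrating the convergence claim in the excerpt above.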

498 | Markov games as a framework for multi-agent reinforcement learning
- Littman
- 1994
Citation Context ...o the problem of coordination in multiagent systems (MASs) has become increasingly popular in AI and game theory. The use of reinforcement learning (RL), in particular, has attracted recent attention [22, 20, 16, 11, 7, 15]. As noted in [16], using RL as a means of achieving coordinated behavior is attractive because of its generality and robustness. Standard techniques for RL, for example, Q-learning [21], have been ap...

416 | Convention: A Philosophical Study
- Lewis
- 1969
Citation Context ...7] can be addressed in several ways. For instance, communication between agents might be admitted [22, 23] or one could impose conventions or rules that restrict behavior so as to ensure coordination [12, 19]. Here we entertain the suggestion that coordinated action choice might be learned through repeated play of the game with the same agents [5, 6, 10, 13]. (Repeated play with a random selection of simi...

402 | A General Theory of Equilibrium Selection in Games
- Harsanyi, Selten
- 1988
Citation Context ...its actions. If they choose them randomly, or in some way reflecting personal biases, then they risk choosing a suboptimal, or uncoordinated, joint action. The general problem of equilibrium selection [14, 7] can be addressed in several ways. For instance, communication between agents might be admitted [22, 23] or one could impose conventions or rules that restrict behavior so as to ensure coordination [1...

294 | The Evolution of Conventions
- Young
- 1993
Citation Context ...ts. Joint action learners (JALs), in contrast, learn the value of their own actions in conjunction with those of other agents via integration of RL with equilibrium (or coordination) learning methods [24, 5, 6, 9]. We then briefly consider the importance of exploitive exploration strategies and examine, through a series of examples, how game structure and exploration strategies influence the dynamics of the le...

248 | Multi-agent reinforcement learning: Independent vs. cooperative agents
- Tan
- 1993
Citation Context ...o the problem of coordination in multiagent systems (MASs) has become increasingly popular in AI and game theory. The use of reinforcement learning (RL), in particular, has attracted recent attention [22, 20, 16, 11, 7, 15]. As noted in [16], using RL as a means of achieving coordinated behavior is attractive because of its generality and robustness. Standard techniques for RL, for example, Q-learning [21], have been ap...

215 | Rational Learning Leads to Nash Equilibrium
- Kalai, Lehrer
- 1993
Citation Context ...ts. Joint action learners (JALs), in contrast, learn the value of their own actions in conjunction with those of other agents via integration of RL with equilibrium (or coordination) learning methods [24, 5, 6, 9]. We then briefly consider the importance of exploitive exploration strategies and examine, through a series of examples, how game structure and exploration strategies influence the dynamics of the le...

172 | On the synthesis of useful social laws for artificial agent societies
- Shoham, Tennenholtz
- 1992
Citation Context ...n [13] can be addressed in several ways. For instance, communication between agents might be admitted [22] or one could impose conventions or rules that restrict behavior so as to ensure coordination [18]. Here we entertain the suggestion that coordinated action choice might be learned through repeated play of the game with the same agents [5, 6, 9, 11]. (Repeated play with a random selection of simil...

153 | Asynchronous stochastic approximation and Q-learning
- Tsitsiklis
- 1994
Citation Context ...eplaces the current estimate. If α is decreased "slowly" during learning and all actions are sampled infinitely often, Q-learning will converge to true Q-values for all actions in the single agent setting [21, 20]. Convergence of Q-learning does not depend on the exploration strategy used. An agent can try its actions at any time---there is no requirement to perform actions that are currently estimated to be b...

148 | Learning to coordinate without sharing information
- Sen, Sekaran, et al.
- 1994
Citation Context ...o the problem of coordination in multiagent systems (MASs) has become increasingly popular in AI and game theory. The use of reinforcement learning (RL), in particular, has attracted recent attention [22, 20, 16, 11, 7, 15]. As noted in [16], using RL as a means of achieving coordinated behavior is attractive because of its generality and robustness. Standard techniques for RL, for example, Q-learning [21], have been ap...

146 | Iterative solution of games by fictitious play
- Brown
- 1951
Citation Context ...en the object of considerable study [17, 10, 24].) We will see that interesting issues emerge. One especially simple, yet often effective, learning model for achieving coordination is fictitious play [3, 5]. Each agent i keeps a count C^j_{a_j}, for each j ∈ α and a_j ∈ A_j, of the number of times agent j has used action a_j in the past. When the game is encountered, i treats the relative frequencies o...
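The counting scheme described in this excerpt (fictitious play: best-respond to the empirical frequencies of the other agent's past actions) can be sketched as follows. The two-action common-payoff game and the tie-breaking priors are illustrative, not from the paper:

```python
from collections import Counter

# Common-payoff coordination game: both agents get 1 if their actions match.
ACTIONS = ["a0", "a1"]

def payoff(a, b):
    return 1.0 if a == b else 0.0

def best_response(counts):
    """Best response to the empirical frequencies of the other agent's actions."""
    total = sum(counts.values())
    ev = {a: sum(payoff(a, b) * counts[b] / total for b in ACTIONS) for a in ACTIONS}
    return max(ACTIONS, key=lambda a: ev[a])

# Each agent keeps counts of the *other* agent's past actions (the seeded
# priors here double as tie-breakers).
c1, c2 = Counter({"a0": 1}), Counter({"a0": 1})
for _ in range(20):
    a1, a2 = best_response(c1), best_response(c2)
    c1[a2] += 1   # agent 1 records agent 2's action
    c2[a1] += 1   # agent 2 records agent 1's action

print(a1, a2)   # → a0 a0 (the agents settle on a coordinated joint action)
```

Once both agents best-respond to matching empirical distributions, their counts only reinforce the coordinated choice, which is the convergence behavior the excerpt attributes to fictitious play in cooperative games.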

111 | Convergence results for single-step on-policy reinforcement-learning algorithms
- Singh, Jaakkola, et al.
- 2000
Citation Context ...ability e^{Q(a)/T} / Σ_{a′} e^{Q(a′)/T} (2). The temperature parameter T can be decreased over time so that the exploitation probability increases (and can be done in such a way that convergence is assured [19]). The existence of multiple agents, each simultaneously learning, is a potential impediment to the successful employment of Q-learning (or RL generally) in multiagent settings. When agent i is learni...
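Equation (2) in this excerpt is the standard Boltzmann (softmax) action-selection rule. A minimal sketch, with illustrative Q-values:

```python
import math

def boltzmann(q, temperature):
    """Softmax selection: Pr(a) = e^{Q(a)/T} / sum_{a'} e^{Q(a')/T}  (Eq. 2)."""
    weights = {a: math.exp(v / temperature) for a, v in q.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

q = {"left": 1.0, "right": 0.0}
hot = boltzmann(q, temperature=10.0)    # high T: near-uniform (exploration)
cold = boltzmann(q, temperature=0.1)    # low T: near-greedy (exploitation)
print(round(hot["left"], 3), round(cold["left"], 3))   # → 0.525 1.0
```

Lowering T over time moves the distribution from near-uniform toward greedy, which is exactly the exploitation-probability increase the excerpt describes.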

101 | Fictitious play property for games with identical interests
- Monderer, Shapley
- 1996
Citation Context ...an also be weighted to reflect priors). This simple adaptive strategy will converge to an equilibrium in our simple cooperative games assuming that agents randomize when multiple best responses exist [12], and can be made to converge to an optimal equilibrium if appropriate mechanisms are adopted [1]; that is, the probability of coordinated equilibrium after k interactions can be made arbitrarily high...

96 | Planning, learning and coordination in multiagent decision processes
- Boutilier
- 1996
Citation Context ...of primary interest, though we will discuss this issue in Sections 5 and 6). We can view the problem at hand, then, as a distributed bandit problem. (Footnote: Many of our conclusions hold mutatis mutandis for sequential, multiagent Markov decision processes [2] with multiple states.) More formally, we assume a collection α of n (heterogeneous) agents, each agent i ∈ α having available to it a finite s...

79 | An adaptive communication protocol for cooperating mobile robots
- Yanco, Stein
- 1993
Citation Context ...o the problem of coordination in multiagent systems (MASs) has become increasingly popular in AI and game theory. The use of reinforcement learning (RL), in particular, has attracted recent attention [22, 17, 16, 13, 23, 8, 15]. As noted in [17], using RL as a means of achieving coordinated behavior is attractive because of its generality and robustness. Standard techniques for RL, for example, Q-learning [21], have been ap...

53 | Emergent conventions in multi-agent systems: initial experimental results and observations (preliminary report)
- Shoham, Tennenholtz
- 1992
Citation Context ...d through repeated play of the game with the same agents [5, 6, 9, 11]. (Repeated play with a random selection of similar agents from a large population has also been the object of considerable study [17, 10, 24].) We will see that interesting issues emerge. One especially simple, yet often effective, learning model for achieving coordination is fictitious play [3, 5]. Each agent i keeps a count C^j_{a_j}, for...

49 | Steady state learning and Nash equilibrium
- Fudenberg, Levine
- 1993
Citation Context ...ts. Joint action learners (JALs), in contrast, learn the value of their own actions in conjunction with those of other agents via integration of RL with equilibrium (or coordination) learning methods [24, 5, 6, 9]. We then briefly consider the importance of exploitive exploration strategies and examine, through a series of examples, how game structure and exploration strategies influence the dynamics of the le...

47 | Learning, mutation, and long run equilibria in games
- Kandori, Mailath, et al.
- 1993
Citation Context ...d through repeated play of the game with the same agents [5, 6, 9, 11]. (Repeated play with a random selection of similar agents from a large population has also been the object of considerable study [17, 10, 24].) We will see that interesting issues emerge. One especially simple, yet often effective, learning model for achieving coordination is fictitious play [3, 5]. Each agent i keeps a count C^j_{a_j}, for...

45 | Self-Confirming Equilibrium
- Fudenberg, Levine
- 1993
Citation Context ...ts. Joint action learners (JALs), in contrast, learn the value of their own actions in conjunction with those of other agents via integration of RL with equilibrium (or coordination) learning methods [12, 5, 4, 7]. We also examine the influence of partial observability on JALs, and how game structure and exploration strategies influence the dynamics of the learning process and the convergence to equilibrium. W...

41 | Planning, learning and coordination in multiagent decision processes
- Boutilier
- 1996
Citation Context ...Π_{−i}, a strategy σ_i is a best response for agent i if the expected value of the strategy profile Π_{−i} ∪ {σ_i} is maximal for agent i; that is, agent i could not do better using any other strategy σ′_i. (Footnote: Most of our conclusions hold mutatis mutandis for sequential, multiagent Markov decision processes [3] with multiple states.) Finally, we say that the strategy profile Π is a Nash equilibri...
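The definitions in this excerpt (best response, Nash equilibrium) can be checked mechanically in a small common-payoff matrix game. The payoff numbers below are illustrative only, chosen to echo the paper's optimal-vs-suboptimal-equilibrium theme:

```python
# Common-payoff matrix game; rows = agent 1's actions, columns = agent 2's.
R = [[10, 0],
     [0, 2]]

def is_best_response(R, i, j, for_row):
    """Is the row (resp. column) action a best response to the other's action?"""
    if for_row:
        return R[i][j] >= max(R[k][j] for k in range(len(R)))
    return R[i][j] >= max(R[i][k] for k in range(len(R[0])))

def pure_nash(R):
    """All pure-strategy profiles where both actions are mutual best responses."""
    return [(i, j)
            for i in range(len(R)) for j in range(len(R[0]))
            if is_best_response(R, i, j, True) and is_best_response(R, i, j, False)]

print(pure_nash(R))   # → [(0, 0), (1, 1)]
```

Both the optimal profile (0, 0), payoff 10, and the suboptimal profile (1, 1), payoff 2, are pure Nash equilibria: neither agent can improve by deviating alone, which is why equilibrium selection matters in the cooperative games the paper studies.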

32 | Multiagent coordination with learning classifier systems
- Sen, Sekaran
- 1995
Citation Context ...o the problem of coordination in multiagent systems (MASs) has become increasingly popular in AI and game theory. The use of reinforcement learning (RL), in particular, has attracted recent attention [22, 17, 16, 13, 23, 8, 15]. As noted in [17], using RL as a means of achieving coordinated behavior is attractive because of its generality and robustness. Standard techniques for RL, for example, Q-learning [21], have been ap...

31 | Self-fulfilling Bias in Multiagent Learning
- Wellman
- 1996

21 |
Learning to coordinate actions in multi-agent systems
- Weiß
- 1993

18 | Learning conventions in multiagent stochastic domains using likelihood estimates
- Boutilier
- 1996
Citation Context ...ium in our simple cooperative games assuming that agents randomize when multiple best responses exist [12], and can be made to converge to an optimal equilibrium if appropriate mechanisms are adopted [1]; that is, the probability of coordinated equilibrium after k interactions can be made arbitrarily high by increasing k sufficiently. It is also not hard to see that once the agents reach an equilibri...

16 | The Dynamics of Reinforcement Learning
- Claus, Boutilier
- 1998
Citation Context ...observable model mentioned above, by allowing experiences of the form ⟨a_i, o, r⟩ where a_i is the action performed by i, and o is its (joint action) observation. A preliminary version of this paper [4] studies the methods below within this model. 3 Comparing Independent and Joint-Action Learners We first compare the relative performance of independent and joint-action learners on a simple coordinat...

15 | Learning to coordinate actions in multi-agent systems
- Weiß
- 1993

6 | Learning in the iterated prisoner’s dilemma
- Sandholm, Crites
- 1995

4 | Decentralized learning in Markov chains
- Wheeler, Narendra
- 1986
Citation Context ...rate empirical convergence. These results are consistent with ours, but properties of the convergence points (whether they are optimal or even in equilibrium) are not considered. Wheeler and Narendra [23] develop a learning automata (LA) model for fully cooperative games. They show that using this model agents will converge to equilibrium if there is a unique pure strategy equilibrium; thus the coordi...

1 | Emergent conventions in multi-agent systems: Initial experimental results and observations
- Shoham, Tennenholtz
- 1992
Citation Context ... through repeated play of the game with the same agents [5, 6, 10, 13]. (Repeated play with a random selection of similar agents from a large population has also been the object of considerable study [1, 18, 11, 24].) One especially simple, yet often effective, learning model for achieving coordination is fictitious play [4, 5]. Each agent i keeps a count C^j_{a_j}, for each j ∈ α and a_j ∈ A_j, of the number o...

1 | Learning to coordinate actions in multi-agent systems
- Weiß
Citation Context ...o the problem of coordination in multiagent systems (MASs) has become increasingly popular in AI and game theory. The use of reinforcement learning (RL), in particular, has attracted recent attention [11, 9, 8, 6]. As noted in [9], using RL as a means of achieving coordinated behavior is attractive because of its generality and robustness. Standard techniques for RL, for example, Q-learning [10], have been app...