Abstract:
We survey the recent work in AI on multi-agent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is unclarity about the problem or problems being addressed. After tracing a representative sample of the recent literature, we identify four well-defined problems in multi-agent reinforcement learning, single out the problem that in our view is most suitable for AI, and make some remarks about how we believe progress is to be made on this problem.