## AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response against Stationary Opponents (2006)

Venue: Proceedings of the 20th International Conference on Machine Learning

Citations: 90 (5 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Conitzer06awesome:a,
  author    = {Vincent Conitzer and Tuomas Sandholm},
  title     = {AWESOME: A General Multiagent Learning Algorithm that Converges
               in Self-Play and Learns a Best Response against Stationary Opponents},
  booktitle = {Proceedings of the 20th International Conference on Machine Learning},
  year      = {2006},
  pages     = {83--90}
}
```


### Abstract

Two minimal requirements for a satisfactory multiagent learning algorithm are that it (1) learns to play optimally against stationary opponents and (2) converges to a Nash equilibrium in self-play. The previous algorithm that came closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games, assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while relying only on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results suggesting that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
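The adapt-or-retreat loop described in the abstract can be sketched in a few lines. The sketch below is a simplified illustration, not the paper's algorithm: the names (`awesome_sketch`, `eps_e`, `eps_s`) and the fixed epoch length and thresholds are invented for this example, whereas AWESOME itself uses carefully chosen decreasing schedules for these parameters to obtain its guarantees.

```python
import random

def sample(rng, dist):
    """Draw an index from a discrete probability distribution."""
    x = rng.random()
    for i, p in enumerate(dist):
        x -= p
        if x <= 0:
            return i
    return len(dist) - 1

def awesome_sketch(game, eq_self, eq_opp, opponent,
                   epochs=20, epoch_len=200, eps_e=0.1, eps_s=0.1, seed=0):
    """Simplified AWESOME-style loop for a repeated bimatrix game.

    game[a][b] is our payoff; eq_self/eq_opp are the precomputed
    equilibrium mixed strategies; `opponent` maps our action to theirs.
    The fixed thresholds eps_e/eps_s are hypothetical stand-ins for the
    paper's decreasing schedules.
    """
    rng = random.Random(seed)
    n_self, n_opp = len(game), len(game[0])
    mode, strategy, prev_emp = "equilibrium", eq_self, None
    for _ in range(epochs):
        counts = [0] * n_opp
        for _ in range(epoch_len):
            counts[opponent(sample(rng, strategy))] += 1
        emp = [c / epoch_len for c in counts]  # opponent's empirical play
        if (mode == "equilibrium"
                and max(abs(e - q) for e, q in zip(emp, eq_opp)) > eps_e):
            mode = "adapt"  # reject "everybody plays equilibrium"
        if mode == "adapt":
            if (prev_emp is not None
                    and max(abs(e - p) for e, p in zip(emp, prev_emp)) > eps_s):
                # opponent is not stationary either: retreat to equilibrium
                mode, strategy, prev_emp = "equilibrium", eq_self, None
                continue
            # opponent appears stationary: best-respond to empirical play
            best = max(range(n_self),
                       key=lambda a: sum(game[a][b] * emp[b]
                                         for b in range(n_opp)))
            strategy = [float(a == best) for a in range(n_self)]
            prev_emp = emp
    return strategy
```

For instance, in matching pennies against an opponent who always plays the same action, this loop abandons the (0.5, 0.5) equilibrium and settles on the pure best response.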

### Citations

875 | The theory of learning in games - Fudenberg, Levine - 1998
Citation Context: "...) We call a NE a pure-strategy NE if all the individuals’ strategies in it are pure. Otherwise, we call it a mixed-strategy NE. As in most of the game theory literature on learning (for a review, see (Fudenberg & Levine, 1998)) and in both of the theoretical results on multiagent learning in computer science that we are trying to improve upon (Bowling & Veloso, 2002; Singh et al., 2000), we assume that the agents know the..."

715 | Equilibrium Points in N-Person Games - Nash - 1950 |

698 | The weighted majority algorithm - Littlestone, Warmuth - 1994 |

525 | Markov games as a framework for multi-agent reinforcement learning - Littman - 1994
Citation Context: "...Vrieze, 1987; Claus & Boutilier, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). Some of the algorithms satisfy the second property in restricted games (e.g. (Vrieze, 1987; Littman, 1994; Hu & Wellman, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). The algorithm that has come closest to satisfying both of the properties in general repeated games is WoLF-IGA..."

495 | Theories of Bounded Rationality - Simon - 1972 |

406 | Subjectivity and Correlation in Randomized Strategies - Aumann - 1974 |

321 | The dynamics of reinforcement learning in cooperative multiagent systems - Claus, Boutilier - 1998
Citation Context: "...However, to date there has been no algorithm that achieves both of these minimal properties in general repeated games. Many of the proposed algorithms satisfy the first property (e.g. (Vrieze, 1987; Claus & Boutilier, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). Some of the algorithms satisfy the second property in restricted games (e.g. (Vrieze, 1987; Littman, 1994; Hu & Wellman, 1998; Si..."

297 | Multiagent reinforcement learning: Theoretical framework and an algorithm - Hu, Wellman - 1998
Citation Context: "...Claus & Boutilier, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). Some of the algorithms satisfy the second property in restricted games (e.g. (Vrieze, 1987; Littman, 1994; Hu & Wellman, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). The algorithm that has come closest to satisfying both of the properties in general repeated games is WoLF-IGA (Bowling & Veloso,..."

267 | Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents - Tan - 1993
Citation Context: "...making the environment nonstationary for a learner. Multiagent learning has been studied with different objectives and different restrictions on the game and on what the learner can observe (e.g., (Tan, 1993; Sen & Weiss, 1998)). Two minimal desirable properties of a good multiagent learning algorithm are • Learning to play optimally against stationary opponents (or even opponents that eventually become..."

245 | R-max - a general polynomial time algorithm for near-optimal reinforcement learning - Brafman, Tennenholtz, et al. - 2002 |

243 | Rational Learning Leads to Nash Equilibrium - Kalai, Lehrer - 1993 |

242 | A Simple Adaptive Procedure Leading to Correlated Equilibrium - Hart, Mas-Colell - 2000
Citation Context: "...goals is “regret matching”, with which the learner’s regrets converge to zero and, if all players use the learning algorithm, the empirical distributions of play converge to a correlated equilibrium (Hart & Mas-Colell, 2000). (The set of correlated equilibria is a strict superset of the set of Nash equilibria, where players are allowed to condition their action on a commonly observed signal. Thus, convergence to a Nash..."

202 | Equilibrium points of bimatrix games - Lemke, Howson - 1964 |

194 | Gambling in a rigged casino: The adversarial multi-armed bandit problem - Auer, Cesa-Bianchi, et al. - 1998 |

194 | Multiagent learning using a variable learning rate - Bowling, Veloso - 2002
Citation Context: "...at least one of these properties is, in a sense, unsatisfactory. Of course, one might also want the algorithm to have additional properties. (Footnote 2: This property has sometimes been called rationality (Bowling & Veloso, 2002), but we avoid that term because it has an established, different meaning in economics. Footnote 3: It can be argued that the two properties are not even strong enough to constitute a “minimal” set of requirem..."

165 | Multiagent Systems - Weiss - 2000
Citation Context: "...the environment nonstationary for a learner. Multiagent learning has been studied with different objectives and different restrictions on the game and on what the learner can observe (e.g., (Tan, 1993; Sen & Weiss, 1998)). Two minimal desirable properties of a good multiagent learning algorithm are • Learning to play optimally against stationary opponents (or even opponents that eventually become stationary). • Co..."

161 | An iterative method of solving a game - Robinson - 1951 |

138 | Adaptive game playing using multiplicative weights - Freund, Schapire - 1999 |

136 | Algorithms, games, and the Internet - Papadimitriou - 2001 |

132 | Complexity Results about Nash Equilibria - Conitzer, Sandholm - 2003
Citation Context: "...equilibrium. (It is still unknown whether a Nash equilibrium can be found in worst-case polynomial time (Papadimitriou, 2001), but it is known that certain related questions are hard in the worst case (Conitzer & Sandholm, 2003).) The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the other agents’ strategies when they appear stationary, but otherwise to..."

129 | Nash and correlated equilibria: Some complexity considerations - Gilboa, Zemel - 1989 |

111 | Consistency and cautious fictitious play - Fudenberg, Levine - 1995 |

105 | Multiagent reinforcement learning in the iterated prisoner’s dilemma - Sandholm, Crites - 1995 |

98 | Some topics in two-person games - Shapley - 1964 |

96 | Nash convergence of gradient dynamics in general-sum games - Singh, Kearns, et al. - 2000
Citation Context: "...has been no algorithm that achieves both of these minimal properties in general repeated games. Many of the proposed algorithms satisfy the first property (e.g. (Vrieze, 1987; Claus & Boutilier, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). Some of the algorithms satisfy the second property in restricted games (e.g. (Vrieze, 1987; Littman, 1994; Hu & Wellman, 1998; Singh et al., 2000; Bo..."

89 | Calibrated learning and correlated equilibrium - Foster, Vohra - 1997
Citation Context: "...Nash equilibrium is a strictly stronger property than convergence to a correlated equilibrium.) Convergence to correlated equilibria is achieved by a number of other learning procedures (Cahn, 2000; Foster & Vohra, 1997; Fudenberg & Levine, 1999). In this paper we present AWESOME, the first algorithm that has both of the desirable properties in general repeated games. It removes all of the assumptions (a)–(d). It..."

86 | Simple search methods for finding a Nash equilibrium - Porter, Nudelman, et al. - 2004 |

81 | Reinforcement learning to play an optimal Nash equilibrium in team Markov games - Wang, Sandholm - 2003 |

71 | “Evolutionary” selection dynamics in games: Convergence and limit properties - Nachbar - 1990 |

70 | Convergence and no-regret in multiagent learning - Bowling - 2004 |

66 | A polynomial-time Nash equilibrium algorithm for repeated games - Littman, Stone - 2003
Citation Context: "...or on the opponents’ current behavior. Interestingly, a recent paper shows that when players are interested in their average payoffs, such equilibria can be constructed in worst-case polynomial time (Littman & Stone, 2003). The rest of the paper is organized as follows. In Section 2, we define the setting. In Section 3, we motivate and define the AWESOME algorithm and show how to set its parameters soundly. In Section..."

63 | Uncoupled dynamics do not lead to Nash equilibrium - Hart, Mas-Colell - 2003 |

55 | New criteria and a new algorithm for learning in multi-agent systems - Powers, Shoham - 2005 |

52 | Mixedinteger programming methods for finding Nash equilibria - Sandholm, Gilpin, et al. - 2005 |

48 | Efficient learning equilibrium - Brafman, Tennenholtz |

43 | A generalized reinforcement-learning model: Convergence and applications - Littman, Szepesvári - 1996 |

42 | Learning against opponents with bounded memory - Powers, Shoham |

40 | Evaluating concurrent reinforcement learners - Mundhe, Sen |

39 | Deterministic calibration and Nash equilibrium - Kakade, Foster - 2004 |

36 | Conditional universal consistency - Fudenberg, Levine - 1999
Citation Context: "...strictly stronger property than convergence to a correlated equilibrium.) Convergence to correlated equilibria is achieved by a number of other learning procedures (Cahn, 2000; Foster & Vohra, 1997; Fudenberg & Levine, 1999). In this paper we present AWESOME, the first algorithm that has both of the desirable properties in general repeated games. It removes all of the assumptions (a)–(d). It has the two desirable prop..."

31 | On no-regret learning, fictitious play, and nash equilibrium - Jafari, Greenwald, et al. - 2001 |

26 | On the convergence of the learning process in a 2 x 2 non-zero-sum two-person game - Miyasawa - 1961 |

24 | Performance bounded reinforcement learning in strategic interactions - Banerjee, Peng - 2004 |

22 | Satisficing and learning cooperation in the prisoner’s dilemma - Stimpson, Goodrich, et al. - 2001 |

21 | A near optimal polynomial time algorithm for learning in certain classes of stochastic games - Brafman, Tennenholtz - 2000 |

21 | On the impossibility of predicting the behavior of rational agents - Foster, Young - 2001 |

18 | Stochastic Games with Finite State and Action Spaces. No. 33 in CWI Tracts. Centrum voor Wiskunde en Informatica - Vrieze - 1987
Citation Context: "...However, to date there has been no algorithm that achieves both of these minimal properties in general repeated games. Many of the proposed algorithms satisfy the first property (e.g. (Vrieze, 1987; Claus & Boutilier, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). Some of the algorithms satisfy the second property in restricted games (e.g. (Vrieze, 1987; Littman, 199..."

16 | A general class of noregret learning algorithms and game-theoretic equilibria - Greenwald, Jafari - 2003 |

14 | General procedures leading to correlated equilibria - Cahn - 2000
Citation Context: "...Convergence to a Nash equilibrium is a strictly stronger property than convergence to a correlated equilibrium.) Convergence to correlated equilibria is achieved by a number of other learning procedures (Cahn, 2000; Foster & Vohra, 1997; Fudenberg & Levine, 1999). In this paper we present AWESOME, the first algorithm that has both of the desirable properties in general repeated games. It removes all of the as..."

12 | Communication complexity as a lower bound for learning in games - Conitzer, Sandholm |