
A polynomial-time Nash equilibrium algorithm for repeated games. Decision Support Systems (2005)

by M Littman, P Stone

Results 1-10 of 31

Complexity Results about Nash Equilibria

by Vincent Conitzer, Tuomas Sandholm, 2002
Cited by 115 (10 self)
Noncooperative game theory provides a normative framework for analyzing strategic interactions.

AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response against Stationary Opponents

by Vincent Conitzer, Tuomas Sandholm - In Proceedings of the 20th International Conference on Machine Learning, 2006
Cited by 57 (5 self)
Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games -- assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.

Computing Equilibria in Multi-Player Games

by Christos H. Papadimitriou, Tim Roughgarden - In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004
Cited by 47 (3 self)
We initiate the systematic study of algorithmic issues involved in finding equilibria (Nash and correlated) in games with a large number of players; such games, in order to be computationally meaningful, must be presented in some succinct, game-specific way. We develop a general framework for obtaining polynomial-time algorithms for optimizing over correlated equilibria in such settings, and show how it can be applied successfully to symmetric games (for which we actually find an exact polytopal characterization), graphical games, and congestion games, among others. We also present complexity results implying that such algorithms are not possible in certain other such games. Finally, we present a polynomial-time algorithm, based on quantifier elimination, for finding a Nash equilibrium in symmetric games when the number of strategies is relatively small.

Computing the optimal strategy to commit to

by Vincent Conitzer, Tuomas Sandholm - In Proceedings of the 7th ACM Conference on Electronic Commerce (ACM-EC), 2006
Cited by 46 (11 self)
In multiagent systems, strategic settings are often analyzed under the assumption that the players choose their strategies simultaneously. However, this model is not always realistic. In many settings, one player is able to commit to a strategy before the other player makes a decision. Such models are synonymously referred to as leadership, commitment, or Stackelberg models, and optimal play in such models is often significantly different from optimal play in the model where strategies are selected simultaneously. The recent surge in interest in computing game-theoretic solutions has so far ignored leadership models (with the exception of the interest in mechanism design, where the designer is implicitly in a leadership position). In this paper, we study how to compute optimal strategies to commit to under both commitment to pure strategies and commitment to mixed strategies, in both normal-form and Bayesian games. We give both positive results (efficient algorithms) and negative results (NP-hardness results).
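The simplest case the abstract mentions, commitment to a pure strategy in a two-player normal-form game, can be illustrated by brute force: for each leader action, find the follower's best responses and keep the action that gives the leader the highest payoff. This is a minimal sketch, not the paper's algorithm; it assumes payoffs are given as two matrices and that the follower breaks ties in the leader's favor (a common convention, assumed here). The function name `best_pure_commitment` is hypothetical. Commitment to mixed strategies requires linear programming and is not shown.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def best_pure_commitment(leader: Matrix, follower: Matrix) -> Tuple[int, int, float]:
    """Return (leader_action, follower_response, leader_payoff) for the best
    pure strategy to commit to. leader[i][j] and follower[i][j] are the payoffs
    when the leader plays row i and the follower plays column j.
    Assumption: the follower breaks ties in the leader's favor."""
    best: Optional[Tuple[int, int, float]] = None
    for i, (lrow, frow) in enumerate(zip(leader, follower)):
        top = max(frow)  # follower's best payoff against the committed row i
        responses = [j for j, v in enumerate(frow) if v == top]
        j = max(responses, key=lambda c: lrow[c])  # tie-break favoring the leader
        if best is None or lrow[j] > best[2]:
            best = (i, j, lrow[j])
    assert best is not None, "the game must have at least one leader action"
    return best
```

For instance, `best_pure_commitment([[2, 4], [1, 3]], [[1, 0], [0, 1]])` returns `(1, 1, 3)`: committing to the second row earns the leader 3, whereas in simultaneous play the first row strictly dominates and the leader ends up with 2.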

Learning to compete, compromise, and cooperate in repeated general-sum games

by Jacob W. Crandall, Michael A. Goodrich - In Proc. 22nd ICML, 2005
Cited by 19 (2 self)
Learning algorithms often obtain relatively low average payoffs in repeated general-sum games against other learning agents, due to a focus on myopic best-response and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs less than the minimax value of the game. Second, an agent should learn to cooperate/compromise when beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays (in self-play) the second property in a wide range of games.
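The first property is a floor set by the minimax value, taken here to be a player's security level: the best expected payoff it can guarantee with a mixed strategy whatever the opponent does. The sketch below is not M-Qubed; it only computes that floor, assuming the player has exactly two actions (the opponent may have any finite number). Mixing with probability p gives a worst-case payoff equal to the minimum of one line per opponent column; that lower envelope is concave and piecewise linear, so its maximum lies at p = 0, p = 1, or a crossing of two lines. The name `maximin_two_actions` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def maximin_two_actions(payoff: List[List[float]]) -> Tuple[float, float]:
    """payoff[a][j]: our payoff for our action a in {0, 1} against opponent
    action j. Returns (p, value): the probability p of playing action 0 that
    maximizes the worst-case expected payoff, and that guaranteed value."""
    top, bottom = payoff

    def worst(p: float) -> float:
        # The opponent picks the column that minimizes our expected payoff.
        return min(p * a + (1 - p) * b for a, b in zip(top, bottom))

    candidates = {0.0, 1.0}
    # Column lines p*a + (1-p)*b cross where p = (b2 - b1) / ((a1 - b1) - (a2 - b2)).
    for (a1, b1), (a2, b2) in combinations(zip(top, bottom), 2):
        denom = (a1 - b1) - (a2 - b2)
        if denom != 0:
            p = (b2 - b1) / denom
            if 0.0 <= p <= 1.0:
                candidates.add(p)
    best_p = max(candidates, key=worst)
    return best_p, worst(best_p)
```

For matching pennies, `maximin_two_actions([[1, -1], [-1, 1]])` returns `(0.5, 0.0)`; for the row player of the prisoner's dilemma `[[3, 0], [5, 1]]` it returns `(0.0, 1.0)`, i.e. always defect, guaranteeing 1.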

Complexity of (Iterated) Dominance

by Vincent Conitzer, Tuomas Sandholm - EC'05, 2005
Cited by 19 (8 self)
We study various computational aspects of solving games using dominance and iterated dominance. We first study both strict and weak dominance (not iterated), and show that checking whether a given strategy is dominated by some mixed strategy can be done in polynomial time using a single linear program solve. We then move on to iterated dominance. We show that determining whether there is some path that eliminates a given strategy is NP-complete with iterated weak dominance. This allows us to also show that determining whether there is a path that leads to a unique solution is NP-complete. Both of these results hold both with and without dominance by mixed strategies. (A weaker version of the second result (only without dominance by mixed strategies) was already known [7].) Iterated strict dominance, on the other hand, is path-independent (both with and without dominance by mixed strategies) and can therefore be done in polynomial time. We then study
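Because iterated strict dominance is path-independent, strategies can be eliminated in any order until none is dominated. A minimal sketch for a two-player game, assuming only dominance by another pure strategy (dominance by mixed strategies needs a linear program per check, as the abstract notes, and is not shown); the function name is hypothetical. Each pass is polynomial, and at most one strategy per player is removed per pass, so the loop runs at most as many times as there are strategies.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def iterated_strict_dominance(row: Matrix, col: Matrix) -> Tuple[List[int], List[int]]:
    """Repeatedly eliminate strategies strictly dominated by another surviving
    *pure* strategy. row[i][j] and col[i][j] are the payoffs to the row and
    column player. Returns the surviving row and column indices."""
    rows = list(range(len(row)))
    cols = list(range(len(row[0])))
    changed = True
    while changed:
        changed = False
        # Row r is dominated if another surviving row s beats it in every surviving column.
        for r in rows:
            if any(all(row[s][j] > row[r][j] for j in cols) for s in rows if s != r):
                rows.remove(r)
                changed = True
                break  # stop iterating over the list we just modified
        # Column c is dominated if another surviving column d beats it in every surviving row.
        for c in cols:
            if any(all(col[i][d] > col[i][c] for i in rows) for d in cols if d != c):
                cols.remove(c)
                changed = True
                break
    return rows, cols
```

For example, `iterated_strict_dominance([[1, 1, 0], [0, 0, 2]], [[0, 2, 1], [3, 1, 0]])` returns `([0], [1])`: the third column is eliminated first, which makes the second row dominated, which in turn eliminates the first column.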

The complexity of game dynamics: Bgp oscillations, sink equilibria, and beyond

by Alex Fabrikant, Christos H. Papadimitriou - In SODA ’08: Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, 2008
Cited by 14 (3 self)
We settle the complexity of a well-known problem in networking by establishing that it is PSPACE-complete to tell whether a system of path preferences in the BGP protocol [25] can lead to oscillatory behavior; one key insight is that the BGP oscillation question is in fact one about Nash dynamics. We show that the concept of sink equilibria proposed recently in [11] is also PSPACE-complete to analyze and approximate for graphical games. Finally, we propose a new equilibrium concept inspired by game dynamics, unit recall equilibria, which we show to be close to universal (exists with high probability in a random game) and algorithmically promising. We also give a relaxation thereof, called componentwise unit recall equilibria, which we show to be both tractable and universal (guaranteed to exist in every game).
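In one common formalization, sink equilibria are the sink strongly connected components of the graph whose nodes are pure-strategy profiles and whose edges are single-player moves to a strictly improving best response; a pure Nash equilibrium is then a sink of size one. The brute-force sketch below assumes that formalization and a two-player game given as payoff matrices; it enumerates every profile, which is exactly what becomes infeasible for the succinct graphical games the abstract discusses. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]
Profile = Tuple[int, int]


def best_response_moves(row: Matrix, col: Matrix, p: Profile) -> List[Profile]:
    """Profiles reachable from p by one player switching to a strictly better best response."""
    i, j = p
    n, m = len(row), len(row[0])
    moves: List[Profile] = []
    best_r = max(row[k][j] for k in range(n))
    if row[i][j] < best_r:
        moves += [(k, j) for k in range(n) if row[k][j] == best_r]
    best_c = max(col[i][k] for k in range(m))
    if col[i][j] < best_c:
        moves += [(i, k) for k in range(m) if col[i][k] == best_c]
    return moves


def sink_equilibria(row: Matrix, col: Matrix) -> List[Set[Profile]]:
    """Return the sink strongly connected components of the best-response graph."""
    nodes = [(i, j) for i in range(len(row)) for j in range(len(row[0]))]
    reach: Dict[Profile, Set[Profile]] = {}
    for u in nodes:  # depth-first search from every profile (fine for small games)
        seen, stack = {u}, [u]
        while stack:
            for v in best_response_moves(row, col, stack.pop()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach[u] = seen
    sinks: List[Set[Profile]] = []
    for u in nodes:
        # u lies in a sink component iff every profile reachable from u can reach u back;
        # that component is then exactly reach[u].
        if all(u in reach[v] for v in reach[u]) and reach[u] not in sinks:
            sinks.append(set(reach[u]))
    return sinks
```

For matching pennies the four profiles form a single best-response cycle, so the function returns one sink equilibrium containing all of them; for the prisoner's dilemma it returns only `{(1, 1)}`, the pure Nash equilibrium.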

A generalized strategy eliminability criterion and computational methods for applying it

by Vincent Conitzer, Tuomas Sandholm - In Proceedings of the National Conference on Artificial Intelligence (AAAI), 2005
Cited by 12 (5 self)
We define a generalized strategy eliminability criterion for bimatrix games that considers whether a given strategy is eliminable relative to given dominator & eliminee subsets of the players’ strategies. We show that this definition spans a spectrum of eliminability criteria from strict dominance (when the sets are as small as possible) to Nash equilibrium (when the sets are as large as possible). We show that checking whether a strategy is eliminable according to this criterion is coNP-complete (both when all the sets are as large as possible and when the dominator sets each have size 1). We then give an alternative definition of the eliminability criterion and show that it is equivalent using the Minimax Theorem. We show how this alternative definition can be translated into a mixed integer program of polynomial size with a number of (binary) integer variables equal to the sum of the sizes of the eliminee sets, implying that checking whether a strategy is eliminable according to the criterion can be done in polynomial time, given that the eliminee sets are small. Finally, we study using the criterion for iterated elimination of strategies.

The myth of the folk theorem

by Christian Borgs, Adam Tauman Kalai, Jennifer Chayes, Vahab Mirrokni - In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, 2008
Cited by 9 (3 self)
A well-known result in game theory known as “the Folk Theorem” suggests that finding Nash equilibria in repeated games should be easier than in one-shot games. In contrast, we show that the problem of finding any (approximate) Nash equilibrium for a three-player infinitely-repeated game is computationally intractable (even when all payoffs are in {−1, 0, 1}), unless all of PPAD can be solved in randomized polynomial time. This is done by showing that finding Nash equilibria of (k + 1)-player infinitely-repeated games is as hard as finding Nash equilibria of k-player one-shot games, for which PPAD-hardness is known (Daskalakis, Goldberg and Papadimitriou, 2006; Chen, Deng and Teng, 2006; Chen, Teng and Valiant, 2007). This also explains why no computationally-efficient learning dynamics, such as the “no-regret” algorithms, can be rational (in general games with three or more players) in the sense that, when one’s opponents use such a strategy, it is not in general a best reply to follow suit.

Communication Complexity as a Lower Bound for Learning in Games

by Vincent Conitzer, Tuomas Sandholm - In Proceedings of the 21st International Conference on Machine Learning, 2004
Cited by 8 (3 self)
A fast-growing body of research in the AI and machine learning communities addresses learning in games, where there are multiple learners with different interests. This research adds to more established research on learning in games conducted in economics. In part because of a clash of fields, there are widely varying requirements on learning algorithms in this domain. The goal of this paper is to demonstrate how communication complexity can be used as a lower bound on the required learning time or cost. Because this lower bound does not assume any requirements on the learning algorithm, it is universal, applying under any set of requirements on the learning algorithm. We characterize

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University