Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
- In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems, 2004
- Cited by 58 (0 self)
Partially observable decentralized decision making in robot teams is fundamentally different from decision making in fully observable problems. Team members cannot simply apply single-agent solution techniques in parallel. Instead, we must turn to game-theoretic frameworks to correctly model the problem. While partially observable stochastic games (POSGs) provide a solution model for decentralized robot teams, this model quickly becomes intractable. We propose an algorithm that approximates POSGs as a series of smaller, related Bayesian games, using heuristics such as QMDP to provide the future discounted value of actions. This algorithm trades off limited look-ahead in uncertainty for computational feasibility, and results in policies that are locally optimal with respect to the selected heuristic. Empirical results are provided both for a simple problem for which the full POSG can also be constructed and for more complex, robot-inspired problems.
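The abstract above names QMDP as one heuristic for the future discounted value of actions. As a rough illustration only (not the authors' implementation), the sketch below computes the standard QMDP value: the Q-values of the underlying fully observable MDP, averaged under a belief over states. The toy transition table `T`, reward table `R`, discount `gamma`, and both function names are assumptions made for this example.

```python
def mdp_q_values(T, R, gamma, iters=200):
    """Value iteration on the fully observable MDP.

    T[s][a][s2] is a transition probability and R[s][a] an immediate reward
    (both hypothetical toy inputs). Returns Q[s][a].
    """
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        V = [max(row) for row in Q]  # greedy state values from the current Q
        Q = [[R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q


def qmdp_values(belief, Q):
    """QMDP heuristic: expected MDP Q-value of each action under the belief."""
    n_actions = len(Q[0])
    return [sum(b * Q[s][a] for s, b in enumerate(belief))
            for a in range(n_actions)]


# Toy two-state, two-action problem: action 0 moves to state 0, action 1 to state 1.
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 0.5]]

if __name__ == "__main__":
    Q = mdp_q_values(T, R, gamma=0.5)
    print(qmdp_values([0.5, 0.5], Q))
```

QMDP assumes all state uncertainty disappears after one step, which is the "limited look-ahead in uncertainty" trade-off the abstract mentions.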
Gradient-based algorithms for finding Nash equilibria in extensive form games
- In Proceedings of the Eighteenth International Conference on Game Theory, 2007
- Cited by 27 (11 self)
We present a computational approach to the saddle-point formulation for the Nash equilibria of two-person, zero-sum sequential games of imperfect information. The algorithm is a first-order gradient method based on modern smoothing techniques for non-smooth convex optimization. The algorithm requires O(1/ε) iterations to compute an ε-equilibrium, and the work per iteration is extremely low. These features enable us to find approximate Nash equilibria for sequential games with a tree representation of about 10^10 nodes. This is three orders of magnitude larger than what previous algorithms can handle. We present two heuristic improvements to the basic algorithm and demonstrate their efficacy on a range of real-world games. Furthermore, we demonstrate how the algorithm can be customized to a specific class of problems with enormous memory savings.
Finding equilibria in large sequential games of imperfect information
- In ACM Conference on Electronic Commerce, 2006
Better automated abstraction techniques for imperfect information games, with application to Texas Hold’em poker
- In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2007
- Cited by 17 (7 self)
We present new approximation methods for computing game-theoretic strategies for sequential games of imperfect information. At a high level, we contribute two new ideas. First, we introduce a new state-space abstraction algorithm. In each round of the game, there is a limit to the number of strategically different situations that an equilibrium-finding algorithm can handle. Given this constraint, we use clustering to discover similar positions, and we compute the abstraction via an integer program that minimizes the expected error at each stage of the game. Second, we present a method for computing the leaf payoffs for a truncated version of the game by simulating the actions in the remaining portion of the game. This allows the equilibrium-finding algorithm to take into account the entire game tree while having to explicitly solve only a truncated version. Experiments show that each of our two new techniques improves performance dramatically in Texas Hold’em poker. The techniques lead to a drastic improvement over prior approaches for automatically generating agents, and our agent plays competitively even against the best agents overall.
Lossless abstraction of imperfect information games
- Journal of the ACM, 2007
- Cited by 14 (7 self)
Finding an equilibrium of an extensive form game of imperfect information is a fundamental problem in computational game theory, but current techniques do not scale to large games. To address this, we introduce the ordered game isomorphism and the related ordered game isomorphic abstraction transformation. For a multi-player sequential game of imperfect information with observable actions and an ordered signal space, we prove that any Nash equilibrium in an abstracted smaller game, obtained by one or more applications of the transformation, can be easily converted into a Nash equilibrium in the original game. We present an algorithm, GameShrink, for abstracting the game using our isomorphism exhaustively. Its complexity is Õ(n^2), where n is the number of nodes in a structure we call the signal tree. The signal tree is no larger than the game tree, and on nontrivial games it is drastically smaller, so GameShrink has time and space complexity sublinear in the size of the game tree. Using GameShrink, we find an equilibrium to a poker game with 3.1 billion nodes, over four orders of magnitude more than in the largest poker game solved previously. To address even larger games, we introduce approximation methods that do not preserve equilibrium, but nevertheless yield (ex post) provably close-to-optimal strategies.
Smoothing Techniques for Computing Nash Equilibria of Sequential Games
- Cited by 12 (2 self)
We develop first-order smoothing techniques for saddle-point problems that arise in the Nash equilibria computation of sequential games. The crux of our work is a construction of suitable prox-functions for a certain class of polytopes that encode the sequential nature of the games. An implementation based on our smoothing techniques computes approximate Nash equilibria for games that are four orders of magnitude larger than what conventional computational approaches can handle.
A near-optimal strategy for a heads-up no-limit Texas Hold’em poker tournament
- In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2007
- Cited by 7 (0 self)
We analyze a heads-up no-limit Texas Hold’em poker tournament with a fixed small blind of 300 chips, a fixed big blind of 600 chips and a total amount of 8000 chips on the table (until recently, these parameters defined the heads-up endgame of sit-n-go tournaments on the popular Party-Poker.com online poker site). Due to the size of this game, a computation of an optimal (i.e., minimax) strategy for the game is completely infeasible. However, combining an algorithm due to Koller, Megiddo and von Stengel with concepts of Everett and suggestions of Sklansky, we compute an optimal jam/fold strategy, i.e., a strategy that would be optimal if any bet made by the player playing by the strategy (but not bets of his opponent) had to be his entire stack. Our computations establish that the computed strategy is near-optimal for the unrestricted tournament (i.e., with post-flop play being allowed) in the rigorous sense that a player playing by the computed strategy will win the tournament with a probability within 1.4 percentage points of the probability that an optimal strategy (allowing post-flop play) would give.
Expectation-Based Versus Potential-Aware Automated Abstraction in Imperfect Information Games: An Experimental Comparison Using Poker
- Cited by 7 (2 self)
Automated abstraction algorithms for sequential imperfect information games have recently emerged as a key component in developing competitive game theory-based agents. The existing literature has not investigated the relative performance of different abstraction algorithms. Instead, agents whose construction has used automated abstraction have only been compared under confounding effects: different granularities of abstraction and equilibrium-finding algorithms that yield different accuracies when solving the abstracted game. This paper provides the first systematic evaluation of abstraction algorithms. Two families of algorithms have been proposed. The distinguishing feature is the measure used to evaluate the strategic similarity between game states. One algorithm uses the probability of winning as the similarity measure. The other uses a potential-aware similarity measure based on probability distributions over future states. We conduct experiments on Rhode Island Hold’em poker. We compare the algorithms against each other, against optimal play, and against each agent’s nemesis. We also compare them based on the resulting game’s value. Interestingly, for very coarse abstractions the expectation-based algorithm is better, but for moderately coarse and fine abstractions the potential-aware approach is superior. Furthermore, agents constructed using the expectation-based approach are highly exploitable beyond what their performance against the game’s optimal strategy would suggest.
Approximation guarantees for fictitious play
- In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, 2009
- Cited by 3 (0 self)
Fictitious play is a simple, well-known, and often-used algorithm for playing (and, especially, learning to play) games. However, in general it does not converge to equilibrium; even when it does, we may not be able to run it to convergence. Still, we may obtain an approximate equilibrium. In this paper, we study the approximation properties that fictitious play obtains when it is run for a limited number of rounds. We show that if both players randomize uniformly over their actions in the first r rounds of fictitious play, then the result is an ε-equilibrium, where ε = (r + 1)/(2r). (Since we are examining only a constant number of pure strategies, we know that ε < 1/2 is impossible, due to a result of Feder et al.) We show that this bound is tight in the worst case; however, with an experiment on random games, we illustrate that fictitious play usually obtains a much better approximation. We then consider the possibility that the players fail to choose the same r. We show how to obtain the optimal approximation guarantee when both the opponent’s r and the game are adversarially chosen (but there is an upper bound R on the opponent’s r), using a linear program formulation. We show that if the action played in the ith round of fictitious play is chosen with probability proportional to 1 for i = 1 and 1/(i − 1) for all 2 ≤ i ≤ R + 1, this gives an approximation guarantee of 1 − 1/(2 + ln R). We also obtain a lower bound of 1 − 4/ln R. This provides an actionable prescription for how long to run fictitious play.
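To make the ε = (r + 1)/(2r) statement above concrete, here is a minimal sketch, not taken from the paper, that runs r rounds of simultaneous fictitious play on a two-player game and measures how far the uniform mixture over the played actions is from equilibrium. It assumes payoffs normalized to [0, 1], an arbitrary first-round action (index 0), and lowest-index tie-breaking. All function names are hypothetical.

```python
import random


def best_response(payoff, opp_counts):
    """Own action maximizing total payoff against the opponent's empirical
    action counts, where payoff[own][opp]. Ties go to the lowest index."""
    totals = [sum(p * c for p, c in zip(row, opp_counts)) for row in payoff]
    return max(range(len(totals)), key=totals.__getitem__)


def fictitious_play_counts(A, B, r):
    """Run r rounds; A[a][b] and B[a][b] are the row and column payoffs.
    Returns how often each action was played by each player."""
    n, m = len(A), len(A[0])
    B_t = [[B[a][b] for a in range(n)] for b in range(m)]  # column player's view
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0] += 1  # round 1: arbitrary actions (assumed: index 0)
    col_counts[0] += 1
    for _ in range(r - 1):
        # Both players respond to the history so far, then both histories update.
        a = best_response(A, col_counts)
        b = best_response(B_t, row_counts)
        row_counts[a] += 1
        col_counts[b] += 1
    return row_counts, col_counts


def epsilon_of_profile(A, B, row_counts, col_counts):
    """Largest gain either player gets by deviating from the uniform mixture
    over the actions they played."""
    r = sum(row_counts)
    n, m = len(A), len(A[0])
    x = [c / r for c in row_counts]
    y = [c / r for c in col_counts]
    u_row = sum(x[a] * y[b] * A[a][b] for a in range(n) for b in range(m))
    u_col = sum(x[a] * y[b] * B[a][b] for a in range(n) for b in range(m))
    best_row = max(sum(y[b] * A[a][b] for b in range(m)) for a in range(n))
    best_col = max(sum(x[a] * B[a][b] for a in range(n)) for b in range(m))
    return max(best_row - u_row, best_col - u_col)


def random_game(n, m, seed=0):
    """Random bimatrix game with payoffs in [0, 1] (illustrative input only)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(m)] for _ in range(n)]
    B = [[rng.random() for _ in range(m)] for _ in range(n)]
    return A, B


if __name__ == "__main__":
    A, B = random_game(5, 5)
    for r in (1, 2, 5, 20):
        rc, cc = fictitious_play_counts(A, B, r)
        print(r, epsilon_of_profile(A, B, rc, cc), (r + 1) / (2 * r))
```

On a random game the measured ε is expected to sit below the worst-case bound, in line with the abstract's remark that fictitious play usually does better than the guarantee.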

