Abstraction Pathologies in Extensive Games
Abstract

Cited by 43 (25 self)
Extensive games can be used to model many scenarios in which multiple agents interact with an environment. There has been considerable recent research on finding strong strategies in very large, zero-sum extensive games. The standard approach in such work is to employ abstraction techniques to derive a more tractably sized game. An extensive game solver is then employed to compute an equilibrium in that abstract game, and the resulting strategy is presumed to be strong in the full game. Progress in this line of research has focused on solving larger abstract games, which more closely resemble the full game. However, there is an underlying assumption that by abstracting less, and solving a larger game, an agent will have a stronger strategy in the full game. In this work we show that this assumption is not true in general. Refining an abstraction can actually lead to a weaker strategy. We show examples of these abstraction pathologies in a small game of poker that can be analyzed exactly. These examples show that pathologies arise when abstracting both chance nodes and a player’s actions. In summary, this paper shows that the standard approach to finding strong strategies for large extensive games rests on shaky ground.
SMOOTHING TECHNIQUES FOR COMPUTING NASH EQUILIBRIA OF SEQUENTIAL GAMES
Abstract

Cited by 40 (10 self)
We develop first-order smoothing techniques for saddle-point problems that arise in the computation of Nash equilibria of sequential games. The crux of our work is a construction of suitable prox-functions for a certain class of polytopes that encode the sequential nature of the games. An implementation based on our smoothing techniques computes approximate Nash equilibria for games that are four orders of magnitude larger than what conventional computational approaches can handle.
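The saddle-point problem this abstract refers to can be written down explicitly; the following is the standard sequence-form statement for two-player zero-sum games (a well-known formulation, not quoted from the paper):

```latex
% Nash equilibrium as a bilinear saddle-point problem, where X and Y are
% the sequence-form strategy polytopes of the two players and A is the
% payoff matrix:
\min_{x \in X} \max_{y \in Y} \; x^{\top} A y
% Nesterov-style smoothing replaces the nonsmooth inner max with a
% smoothed surrogate built from a strongly convex prox-function d_Y on Y,
% with smoothing parameter \mu > 0:
f_{\mu}(x) = \max_{y \in Y} \left( x^{\top} A y - \mu \, d_Y(y) \right)
```

A first-order method can then minimize f_μ, whose gradient is Lipschitz continuous, driving μ toward zero to recover an approximate equilibrium.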
Computing robust counter-strategies
 In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2007
Abstract

Cited by 38 (7 self)
Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents’ behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents’ expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold’em. We show that the computed poker strategies are substantially more robust than best-response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm, showing a dramatic improvement in performance over using simple best responses.
Monte Carlo Sampling for Regret Minimization in Extensive Games
Abstract

Cited by 34 (13 self)
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent, sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes, outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new, tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
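At the heart of both CFR and MCCFR is regret matching: at each information set, the current strategy puts probability on each action in proportion to its accumulated positive counterfactual regret. A minimal sketch of just this step (function and variable names are mine, not from the paper):

```python
def regret_matching(cumulative_regret):
    """Return a mixed strategy proportional to positive cumulative regret.

    Falls back to the uniform strategy when no action has positive regret.
    """
    positives = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positives)
    if total > 0.0:
        return [p / total for p in positives]
    n = len(cumulative_regret)
    return [1.0 / n] * n

# Action 0 has accumulated regret 3, action 1 has 1, action 2 is negative:
print(regret_matching([3.0, 1.0, -2.0]))  # → [0.75, 0.25, 0.0]
```

MCCFR's sampling schemes differ only in which portions of the game tree contribute regret on a given iteration; the matching step itself is unchanged.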
Probabilistic State Translation in Extensive Games with Large Action Sets
Abstract

Cited by 23 (6 self)
Equilibrium or near-equilibrium solutions to very large extensive-form games are often computed by using abstractions to reduce the game size. A common abstraction technique for games with a large number of available actions is to restrict the number of legal actions in every state. This method has been used to discover equilibrium solutions for the game of no-limit heads-up Texas Hold’em. When using a solution to an abstracted game to play one side in the unabstracted (real) game, the real opponent actions may not correspond to actions in the abstracted game. The most popular method for handling this situation is to translate opponent actions in the real game to the closest legal actions in the abstracted game. We show that this approach can result in a very exploitable player and propose an alternative solution. We use probabilistic mapping to translate a real action into a probability distribution over actions, whose weights are determined by a similarity metric. We show that this approach significantly reduces the exploitability when using an abstract solution in the real game.
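The probabilistic mapping described above can be sketched concretely. The snippet below uses inverse absolute distance over bet sizes as the similarity metric; the paper's actual metric may differ, so treat this as an illustrative assumption:

```python
def translate_action(real_bet, abstract_bets):
    """Map a real-game bet size to a probability distribution over the
    abstract game's legal bet sizes, weighted by a similarity metric.

    Similarity here is inverse absolute distance (an illustrative choice).
    """
    # An exact match needs no randomization.
    if real_bet in abstract_bets:
        return {b: (1.0 if b == real_bet else 0.0) for b in abstract_bets}
    weights = {b: 1.0 / abs(b - real_bet) for b in abstract_bets}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

# A real bet of 75 is equidistant from abstract bets 50 and 100,
# so the probability mass splits evenly:
print(translate_action(75, [50, 100]))  # → {50: 0.5, 100: 0.5}
```

Randomizing the translation removes the sharp thresholds that a deterministic closest-action mapping creates, which is exactly what an exploitive opponent would otherwise target.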
Accelerating best response calculation in large extensive games
 In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2011
Abstract

Cited by 20 (10 self)
One fundamental evaluation criterion of an AI technique is its performance in the worst case. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is specifically well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold’em, which, prior to this work, was not possible. We explore these results thoroughly, as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention but could not previously be examined outside of toy domains.
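For intuition, a full-tree best-response computation against a fixed strategy is a simple recursive traversal. The toy sketch below assumes a perfect-information tree for brevity (the paper's contribution is precisely avoiding this exhaustive walk in large imperfect-information games); the node encoding is my own:

```python
def best_response_value(node, opponent_strategy):
    """Worst-case value of a fixed opponent strategy via full traversal.

    node is ("leaf", payoff), ("us", children), or ("them", key, children);
    opponent_strategy maps key -> a list of action probabilities.
    """
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "us":
        # The best responder maximizes at its own decision nodes.
        return max(best_response_value(c, opponent_strategy) for c in node[1])
    # At opponent nodes, take the expectation under the fixed strategy.
    _, key, children = node
    return sum(p * best_response_value(c, opponent_strategy)
               for p, c in zip(opponent_strategy[key], children))

tree = ("them", "root", [
    ("us", [("leaf", 1.0), ("leaf", -2.0)]),
    ("us", [("leaf", 0.0), ("leaf", 3.0)]),
])
print(best_response_value(tree, {"root": [0.5, 0.5]}))  # → 2.0
```

Every node is visited once, which is exactly the full-traversal cost the paper's technique tries to avoid.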
Data Biased Robust Counter Strategies
 Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS-09)
Abstract

Cited by 19 (2 self)
The problem of exploiting information about the environment while still being robust to inaccurate or incomplete information arises in many domains. Competitive imperfect information games, where the goal is to maximally exploit an unknown opponent’s weaknesses, are an example of this problem. Agents for these games must balance two objectives. First, they should aim to exploit data from past interactions with the opponent, seeking a best-response counter-strategy. Second, they should aim to minimize losses, since the limited data may be misleading or the opponent’s strategy may have changed; this suggests an opponent-agnostic Nash equilibrium strategy. In this paper, we show how to partially satisfy both of these objectives at the same time, producing strategies with favourable trade-offs between the ability to exploit an opponent and the capacity to be exploited. Like a recently published technique, our approach involves solving a modified game; however, the result is more generally applicable and even performs well in situations with very limited data. We evaluate our technique in the game of two-player Limit Texas Hold’em.
Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents
, 2010
Abstract

Cited by 18 (5 self)
Games are used to evaluate and advance multiagent and artificial intelligence techniques. Most of these games are deterministic with perfect information (e.g., Chess and Checkers). A deterministic game has no chance element, and in a perfect information game, all information is visible to all players. However, many real-world scenarios with competing agents are stochastic (non-deterministic) with imperfect information. For two-player zero-sum perfect-recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an ε-Nash equilibrium. A Nash equilibrium strategy is useful in two-player games since it maximizes its utility against a worst-case opponent. However, for multiplayer (three or more player) games, we lose all theoretical guarantees for CFR. However, we believe that CFR-generated …
Evaluating State-Space Abstractions in Extensive-Form Games
, 2013
Abstract

Cited by 16 (4 self)
Efficient algorithms exist for finding optimal policies in extensive-form games. However, human-scale problems are typically so large that this computation remains infeasible with modern computing resources. State-space abstraction techniques allow for the derivation of a smaller and strategically similar abstract domain, in which an optimal strategy can be computed and then used as a suboptimal strategy in the real domain. In this paper, we consider the task of evaluating the quality of an abstraction, independent of a specific abstract strategy. In particular, we use a recent metric for abstraction quality and examine imperfect recall abstractions, in which agents “forget” previously observed information to focus the abstraction effort on more recent and relevant state information. We present experimental results in the domain of Texas hold’em poker that validate the use of distribution-aware abstractions over expectation-based approaches, demonstrate that the new metric better predicts tournament performance, and show that abstractions built using imperfect recall outperform those built using perfect recall in terms of both exploitability and one-on-one play.
Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
Abstract

Cited by 15 (9 self)
Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling-based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n²) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.
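The O(n²)-work-in-O(n)-time claim at terminal nodes is an instance of a standard showdown trick: once hands are ranked by strength, a running sum of the opponent's reach probabilities gives every hand's expected payoff in a single pass. A sketch under simplifying assumptions (hands pre-sorted by strength, a tie is a push worth zero); the paper's exact bookkeeping may differ:

```python
def showdown_values(opponent_reach, stake=1.0):
    """Expected showdown payoff for each of n hands, indexed by strength
    (ascending), against an opponent whose hands carry the given reach
    probabilities. Hand i beats every j < i, pushes against i (zero
    payoff), and loses to every j > i.

    The naive computation compares all pairs in O(n^2); a single running
    sum over the sorted hands does the same work in O(n).
    """
    total = sum(opponent_reach)
    values = []
    weaker = 0.0  # opponent probability mass our current hand beats
    for i in range(len(opponent_reach)):
        stronger = total - weaker - opponent_reach[i]
        values.append(stake * (weaker - stronger))
        weaker += opponent_reach[i]
    return values

# Four equally likely opponent hands: the weakest hand loses to 3/4 of
# the mass, the strongest beats 3/4 of it.
print(showdown_values([0.25, 0.25, 0.25, 0.25]))  # → [-0.75, -0.25, 0.25, 0.75]
```

This prefix-sum pattern is the widely used mechanism behind such terminal-node speedups in poker solvers.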