Results 1–10 of 23
Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization
, 2009
Abstract
Cited by 13 (1 self)
Convex optimization problems arising in applications, possibly as approximations of intractable problems, are often structured and large scale. When the data are noisy, it is of interest to bound the solution error relative to the (unknown) solution of the original noiseless problem. Related to this is an error bound for the linear convergence analysis of first-order gradient methods for solving these problems. Example applications include compressed sensing, variable selection in regression, TV-regularized image denoising, and sensor network localization.
The State of Solving Large IncompleteInformation Games, and Application to Poker
, 2010
Abstract
Cited by 10 (3 self)
Game-theoretic solution concepts prescribe how rational parties should act, but to become operational the concepts need to be accompanied by algorithms. I will review the state of solving incomplete-information games. They encompass many practical problems such as auctions, negotiations, and security applications. I will discuss them in the context of how they have transformed computer poker. In short, game-theoretic reasoning now scales to many large problems, outperforms the alternatives on those problems, and in some games beats the best humans.
Accelerating best response calculation in large extensive games
In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI)
, 2011
Abstract
Cited by 9 (6 self)
One fundamental evaluation criterion of an AI technique is its performance in the worst case. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is specifically well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold'em, which, prior to this work, was not possible. We explore these results thoroughly as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention, but could not previously be examined outside of toy domains.
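The conventional full-traversal best response mentioned above can be sketched as a recursive walk over the game tree: maximize at the responder's decision nodes, and take expectations at chance nodes and at the opponent's nodes (where the fixed strategy under evaluation is followed). The dictionary node encoding below is hypothetical, and this naive sketch handles only perfect information; imperfect information requires maximizing per information set, and avoiding the full traversal is precisely the paper's contribution.

```python
def best_response_value(node, responder):
    """Worst-case value of a fixed strategy: the responder's best-response payoff.

    Hypothetical node encoding:
      terminal: {'payoff': v}                       (payoff to the responder)
      chance:   {'chance': [(prob, child), ...]}
      decision: {'player': i, 'strategy': [...], 'children': [...]}
    """
    if 'payoff' in node:
        return node['payoff']
    if 'chance' in node:
        return sum(p * best_response_value(c, responder) for p, c in node['chance'])
    values = [best_response_value(c, responder) for c in node['children']]
    if node['player'] == responder:
        return max(values)  # responder picks its best action
    # the other player follows its fixed strategy
    return sum(p * v for p, v in zip(node['strategy'], values))

# hypothetical two-step game: responder 0 chooses between a sure 1.0
# and an opponent node mixing 50/50 over 0.0 and 4.0
tree = {'player': 0, 'strategy': None, 'children': [
    {'payoff': 1.0},
    {'player': 1, 'strategy': [0.5, 0.5],
     'children': [{'payoff': 0.0}, {'payoff': 4.0}]},
]}
print(best_response_value(tree, 0))  # 2.0
```

Every node is visited exactly once here, which is what becomes infeasible at poker scale.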
Strategy purification and thresholding: Effective non-equilibrium approaches for playing large games
INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS)
, 2012
Abstract
Cited by 8 (3 self)
There has been significant recent interest in computing effective strategies for playing large imperfect-information games. Much prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game (with the hope that it also closely approximates an equilibrium in the full game). In this paper, we present a family of modifications to this approach that work by constructing non-equilibrium strategies in the abstract game, which are then played in the full game. Our new procedures, called purification and thresholding, modify the action probabilities of an abstract equilibrium by preferring the higher-probability actions. Using a variety of domains, we show that these approaches lead to significantly stronger play than the standard equilibrium approach. As one example, our program that uses purification came in first place in the two-player no-limit Texas Hold'em total bankroll division of the 2010 Annual Computer Poker Competition. Surprisingly, we also show that purification significantly improves performance (against the full equilibrium strategy) in random 4 × 4 matrix games using random 3 × 3 abstractions. We present several additional results (both theoretical and empirical). Overall, one can view these approaches as ways of achieving robustness against overfitting one's strategy to one's lossy abstraction. Perhaps surprisingly, the performance gains do not necessarily come at the expense of worst-case exploitability.
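Both procedures described in this abstract act on a single action-probability vector and can be sketched in a few lines. Purification plays the highest-probability action deterministically; thresholding drops actions below a cutoff and renormalizes. The example probabilities are illustrative, not taken from the paper.

```python
def purify(strategy):
    """Purification: play the highest-probability abstract action deterministically."""
    best = max(range(len(strategy)), key=lambda a: strategy[a])
    return [1.0 if a == best else 0.0 for a in range(len(strategy))]

def threshold(strategy, eps):
    """Thresholding: drop actions with probability below eps, then renormalize."""
    kept = [p if p >= eps else 0.0 for p in strategy]
    total = sum(kept)
    if total == 0.0:          # every action fell below eps: fall back to purification
        return purify(strategy)
    return [p / total for p in kept]

# hypothetical abstract-equilibrium action probabilities
print(purify([0.70, 0.25, 0.05]))         # [1.0, 0.0, 0.0]
print(threshold([0.70, 0.25, 0.05], 0.1))  # low-probability third action removed
```

Purification is the limiting case of thresholding: as eps grows, only the single most probable action survives.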
Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
Abstract
Cited by 7 (5 self)
Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling-based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n²) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.
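The per-information-set update that all of these CFR variants share is regret matching: play each action in proportion to its positive cumulative regret. The sampling schemes discussed above change which regrets get updated on each traversal, not this update itself. A minimal sketch:

```python
def regret_matching(cumulative_regret):
    """Next strategy from cumulative counterfactual regrets (regret matching):
    each action's probability is proportional to its positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [r / total for r in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n  # no positive regret: play uniformly

print(regret_matching([3.0, -1.0, 1.0]))  # [0.75, 0.0, 0.25]
```

Averaging these per-iteration strategies over time is what converges to an approximate equilibrium in two-player zero-sum games.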
Finding Optimal Abstract Strategies in ExtensiveForm Games
Abstract
Cited by 4 (4 self)
Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game that can be solved, with the assumption that an equilibrium of the abstract game is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies: strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold'em.
Algorithms for abstracting and solving imperfect information games
, 2007
Abstract
Cited by 2 (1 self)
Game theory is the mathematical study of rational behavior in strategic environments. In many settings, most notably two-person zero-sum games, game theory provides particularly strong and appealing solution concepts. Furthermore, these solutions are efficiently computable in the complexity-theoretic sense. However, in most interesting potential applications in artificial intelligence, the solutions are difficult to compute using current techniques due primarily to the extremely large state spaces of the environments. In this thesis, we propose new algorithms for tackling these computational difficulties. In one stream of research, we introduce automated abstraction algorithms for sequential games of imperfect information. These algorithms take as input a description of a game and produce a description of a strategically similar, but smaller, game as output. We present algorithms that are lossless (i.e., equilibrium-preserving), as well as algorithms that are lossy, but which can yield much smaller games while still retaining the most important features of the original game. In a second stream of research, we develop specialized optimization algorithms for finding ε-equilibria in sequential games of imperfect information. The algorithms are based on recent advances in non-smooth convex optimization (namely the excessive gap technique) and provide significant improvements.
Tartanian5: A Heads-Up No-Limit Texas Hold'em Poker-Playing Program
, 2012
Abstract
Cited by 2 (2 self)
We present an overview of Tartanian5, a no-limit Texas Hold'em agent which we submitted to the 2012 Annual Computer Poker Competition. The agent plays a game-theoretic approximate Nash equilibrium strategy. First, it applies a potential-aware, perfect-recall, automated abstraction algorithm to group similar game states together and construct a smaller game that is strategically similar to the full game. In order to maintain a tractable number of possible betting sequences, it employs a discretized betting model, where only a small number of bet sizes are allowed at each game state. The strategies for both players are then computed using an improved version of Nesterov's excessive gap technique specialized for poker. To mitigate the effect of overfitting, we employ an ex-post purification procedure to remove actions that are played with small probability. One final feature of our agent is a novel algorithm for interpreting bet sizes of the opponent that fall outside our model. We describe our new approach in detail, and present theoretical and empirical advantages over prior approaches. Finally, we briefly describe ongoing research in our group involving real-time computation and opponent exploitation, which will hopefully be incorporated into our agents in future years.
Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
Abstract
Cited by 2 (0 self)
Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.
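The core idea of sampling "a subset of the player's actions according to the player's average strategy" can be sketched as an independent inclusion test per action. The inclusion-probability formula below follows the shape commonly used for AS (a floor epsilon plus a term proportional to the action's average-strategy weight), but the parameter names and default constants here are illustrative assumptions, not the paper's exact specification.

```python
import random

def as_sample(avg_strategy, epsilon=0.05, beta=1.0, tau=1.0, rng=random):
    """Average Strategy sampling sketch: at the traverser's node, include each
    action a independently with probability
        rho(a) = max(epsilon, (beta + tau * avg[a]) / (beta + sum(avg))).
    Returns the indices of the sampled actions."""
    total = sum(avg_strategy)
    sampled = []
    for a, weight in enumerate(avg_strategy):
        rho = max(epsilon, (beta + tau * weight) / (beta + total))
        if rng.random() < min(1.0, rho):
            sampled.append(a)
    return sampled
```

Actions the average strategy favors are traversed more often, while epsilon keeps every action's sampling probability bounded away from zero so regrets stay unbiased after importance correction.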
Safe Opponent Exploitation
Abstract
Cited by 2 (1 self)
We consider the problem of playing a finitely repeated two-player zero-sum game safely, that is, guaranteeing at least the value of the game per period in expectation regardless of the strategy used by the opponent. Playing a stage-game equilibrium strategy at each time step clearly guarantees safety, and prior work has conjectured that it is impossible to simultaneously deviate from a stage-game equilibrium (in hope of exploiting a suboptimal opponent) and to guarantee safety. We show that such profitable deviations are indeed possible: specifically, in games where certain types of 'gift' strategies exist, which we define formally. We show that the set of strategies constituting such gifts can be strictly larger than the set of iteratively weakly dominated strategies; this disproves another recent conjecture which states that all strategies that are not iteratively weakly dominated are best responses to each equilibrium strategy of the other player. We present a full characterization of safe strategies, and develop efficient algorithms for exploiting suboptimal opponents while guaranteeing safety. We also provide analogous results for sequential perfect- and imperfect-information games, and present safe exploitation algorithms and full characterizations of safe strategies for those settings as well. We present experimental results in Kuhn poker, a canonical test problem for game-theoretic algorithms. Our experiments show that 1) aggressive safe exploitation strategies significantly outperform adjusting the exploitation within equilibrium strategies and 2) all the safe exploitation strategies significantly outperform a (non-safe) best response strategy against strong dynamic opponents.
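The bookkeeping behind such safe exploitation can be sketched as a running "gift" budget: profit accrued above the per-period game value, which bounds how much a deviation may risk before falling back to equilibrium play. This is a deliberately simplified, hypothetical sketch over realized payoffs; the paper's full characterization works with expected payoffs against worst-case opponents.

```python
def gift_budgets(period_payoffs, game_value):
    """After each period, how much a safe exploiter may risk on deviations:
    cumulative payoff above the per-period game value, floored at zero."""
    budget, out = 0.0, []
    for p in period_payoffs:
        budget = max(0.0, budget + (p - game_value))
        out.append(budget)
    return out

# hypothetical per-period results against a weak opponent, game value 0.5
print(gift_budgets([1.0, 0.0, 2.0], 0.5))  # [0.5, 0.0, 1.5]
```

A deviation whose worst-case expected loss stays within the current budget cannot push the overall guarantee below the value of the repeated game.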