Results 1–10 of 39
Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization
, 2009
Abstract

Cited by 34 (1 self)
Convex optimization problems arising in applications, possibly as approximations of intractable problems, are often structured and large scale. When the data are noisy, it is of interest to bound the solution error relative to the (unknown) solution of the original noiseless problem. Related to this is an error bound for the linear convergence analysis of first-order gradient methods for solving these problems. Example applications include compressed sensing, variable selection in regression, TV-regularized image denoising, and sensor network localization.
The State of Solving Large Incomplete-Information Games, and Application to Poker
, 2010
Abstract

Cited by 26 (8 self)
Game-theoretic solution concepts prescribe how rational parties should act, but to become operational the concepts need to be accompanied by algorithms. I will review the state of solving incomplete-information games. They encompass many practical problems such as auctions, negotiations, and security applications. I will discuss them in the context of how they have transformed computer poker. In short, game-theoretic reasoning now scales to many large problems, outperforms the alternatives on those problems, and in some games beats the best humans.
Accelerating best response calculation in large extensive games
 In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI)
, 2011
Abstract

Cited by 19 (10 self)
One fundamental evaluation criterion of an AI technique is its performance in the worst case. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is specifically well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold’em, which, prior to this work, was not possible. We explore these results thoroughly as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention, but could not previously be examined outside of toy domains.
Evaluating State-Space Abstractions in Extensive-Form Games
, 2013
Abstract

Cited by 16 (4 self)
Efficient algorithms exist for finding optimal policies in extensive-form games. However, human-scale problems are typically so large that this computation remains infeasible with modern computing resources. State-space abstraction techniques allow for the derivation of a smaller and strategically similar abstract domain, in which an optimal strategy can be computed and then used as a suboptimal strategy in the real domain. In this paper, we consider the task of evaluating the quality of an abstraction, independent of a specific abstract strategy. In particular, we use a recent metric for abstraction quality and examine imperfect recall abstractions, in which agents “forget” previously observed information to focus the abstraction effort on more recent and relevant state information. We present experimental results in the domain of Texas hold’em poker that validate the use of distribution-aware abstractions over expectation-based approaches, demonstrate that the new metric better predicts tournament performance, and show that abstractions built using imperfect recall outperform those built using perfect recall in terms of both exploitability and one-on-one play.
Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
Abstract

Cited by 14 (9 self)
Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling-based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O(n²) work at terminal nodes in O(n) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff.
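The CFR variants compared here all share the same per-information-set update rule, regret matching: play each action in proportion to its positive cumulative counterfactual regret. A minimal sketch of that core update, illustrative only and not the paper's sampled implementation:

```python
def regret_matching(regrets):
    """Current strategy at one information set from cumulative regrets:
    probabilities proportional to positive regret, uniform if none is positive."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positives]

# e.g. regrets for [fold, call, raise]: never folds, mixes call/raise evenly
strategy = regret_matching([2.0, -1.0, 2.0])
```

Averaging these per-iteration strategies over time is what converges toward an approximate equilibrium in two-player zero-sum games.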
Strategy purification and thresholding: Effective non-equilibrium approaches for playing large games
 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS)
, 2012
Abstract

Cited by 13 (6 self)
There has been significant recent interest in computing effective strategies for playing large imperfect-information games. Much prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game (with the hope that it also well approximates an equilibrium in the full game). In this paper, we present a family of modifications to this approach that work by constructing non-equilibrium strategies in the abstract game, which are then played in the full game. Our new procedures, called purification and thresholding, modify the action probabilities of an abstract equilibrium by preferring the higher-probability actions. Using a variety of domains, we show that these approaches lead to significantly stronger play than the standard equilibrium approach. As one example, our program that uses purification came in first place in the two-player no-limit Texas Hold’em total bankroll division of the 2010 Annual Computer Poker Competition. Surprisingly, we also show that purification significantly improves performance (against the full equilibrium strategy) in random 4 × 4 matrix games using random 3 × 3 abstractions. We present several additional results (both theoretical and empirical). Overall, one can view these approaches as ways of achieving robustness against overfitting one’s strategy to one’s lossy abstraction. Perhaps surprisingly, the performance gains do not necessarily come at the expense of worst-case exploitability.
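Both procedures are simple post-processing steps on an abstract equilibrium's action probabilities at each information set. A minimal sketch, assuming a strategy is represented as a dict from actions to probabilities (representation and names are illustrative, not the paper's):

```python
def threshold(strategy, eps):
    """Thresholding: zero out actions with probability below eps, renormalize the rest."""
    kept = {a: p for a, p in strategy.items() if p >= eps}
    total = sum(kept.values())
    return {a: p / total for a, p in kept.items()}

def purify(strategy):
    """Purification: put all probability on the most likely action (ties split evenly)."""
    best = max(strategy.values())
    winners = [a for a, p in strategy.items() if p == best]
    return {a: 1.0 / len(winners) for a in winners}
```

Purification is the limiting case of thresholding: as eps grows, the output moves from the original mixed strategy toward the pure argmax strategy.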
Finding Optimal Abstract Strategies in Extensive-Form Games
Abstract

Cited by 12 (7 self)
Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game, under the assumption that an equilibrium of the abstract game is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies — strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold’em.
Computing approximate Nash equilibria and robust best-responses using sampling
 J. Artif. Intell. Res. (JAIR)
Abstract

Cited by 9 (3 self)
This article discusses two contributions to decision-making in complex partially observable stochastic games. First, we apply two state-of-the-art search techniques that use Monte Carlo sampling to the task of approximating a Nash Equilibrium (NE) in such games, namely Monte Carlo Tree Search (MCTS) and Monte Carlo Counterfactual Regret Minimization (MCCFR). MCTS has been proven to approximate a NE in perfect-information games. We show that the algorithm quickly finds a reasonably strong strategy (but not a NE) in a complex imperfect information game, i.e. Poker. MCCFR on the other hand has theoretical NE convergence guarantees in such a game. We apply MCCFR for the first time in Poker. Based on our experiments, we may conclude that MCTS is a valid approach if one wants to learn reasonably strong strategies fast, whereas MCCFR is the better choice if the quality of the strategy is most important. Our second contribution relates to the observation that a NE is not a best response against players that are not playing a NE. We present Monte Carlo Restricted Nash Response (MCRNR), a sample-based algorithm for the computation of restricted Nash strategies. These are robust best-response strategies that (1) exploit non-NE opponents more than playing a NE and (2) are not (overly) exploitable by other strategies. We combine the advantages of two state-of-the-art algorithms, i.e. MCCFR and Restricted Nash Response (RNR). MCRNR samples only relevant parts of the game tree. We show that MCRNR learns quicker than standard RNR in smaller games. Also we show in Poker that MCRNR learns robust best-response strategies fast, and that these strategies exploit opponents more than playing a NE does.
Regret transfer and parameter optimization
 In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)
Abstract

Cited by 9 (3 self)
Regret matching is a widely-used algorithm for learning how to act. We begin by proving that regrets on actions in one setting (game) can be transferred to warm start the regrets for solving a different setting with the same structure but different payoffs that can be written as a function of parameters. We prove how this can be done by carefully discounting the prior regrets. This provides, to our knowledge, the first principled warm-starting method for no-regret learning. It also extends to warm-starting the widely-adopted counterfactual regret minimization (CFR) algorithm for large incomplete-information games; we show this experimentally as well. We then study optimizing a parameter vector for a player in a two-player zero-sum game (e.g., optimizing bet sizes to use in poker). We propose a custom gradient descent algorithm that provably finds a locally optimal parameter vector while leveraging our warm-start theory to significantly save regret-matching iterations at each step. It optimizes the parameter vector while simultaneously finding an equilibrium. We present experiments in no-limit Leduc Hold’em and no-limit Texas Hold’em to optimize bet sizing. This amounts to the first action abstraction algorithm (algorithm for selecting a small number of discrete actions to use from a continuum of actions — a key preprocessing step for solving large games using current equilibrium-finding algorithms) with convergence guarantees for extensive-form games.
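The warm start described in the abstract amounts to scaling accumulated regrets before resuming regret matching in the new game. A minimal sketch, with the caveat that the scalar `discount` here is a hypothetical stand-in for the carefully derived discounting in the paper:

```python
def regret_matching(regrets):
    """Strategy proportional to positive cumulative regret; uniform if none."""
    pos = {a: max(r, 0.0) for a, r in regrets.items()}
    total = sum(pos.values())
    n = len(regrets)
    return {a: (p / total if total > 0.0 else 1.0 / n) for a, p in pos.items()}

def warm_start(old_regrets, discount):
    """Transfer regrets from a solved game to one with the same structure but
    different payoffs by discounting them (a flat discount is an assumption)."""
    return {a: discount * r for a, r in old_regrets.items()}

# Seed the new game's regrets from the old game's, then resume learning:
seeded = warm_start({"small_bet": 10.0, "big_bet": -4.0}, 0.5)
strategy = regret_matching(seeded)
```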
Action Translation in Extensive-Form Games with Large Action Spaces: Axioms, Paradoxes, and the Pseudo-Harmonic Mapping
Abstract

Cited by 8 (7 self)
When solving extensive-form games with large action spaces, typically significant abstraction is needed to make the problem manageable from a modeling or computational perspective. When this occurs, a procedure is needed to interpret actions of the opponent that fall outside of our abstraction (by mapping them to actions in our abstraction). This is called an action translation mapping. Prior action translation mappings have been based on heuristics without theoretical justification. We show that the prior mappings are highly exploitable and that most of them violate certain natural desiderata. We present a new mapping that satisfies these desiderata and has significantly lower exploitability than the prior mappings. Furthermore, we observe that the cost of this worst-case performance benefit (low exploitability) is not high in practice; our mapping performs competitively with the prior mappings against no-limit Texas Hold’em agents submitted to the 2012 Annual Computer Poker Competition. We also observe several paradoxes that can arise when performing action abstraction and translation; for example, we show that it is possible to improve performance by including suboptimal actions in our abstraction and excluding optimal actions.
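The pseudo-harmonic mapping named in the title interprets an observed bet size x by randomizing between the two abstract sizes A and B that bracket it. A minimal sketch using the closed form commonly attributed to this work, with bet sizes expressed as fractions of the pot; treat the exact formula here as an assumption:

```python
def pseudo_harmonic(x, A, B):
    """Probability of translating an observed bet x (A <= x <= B) down to the
    smaller abstract bet A; x is translated up to B with probability 1 - f."""
    return ((B - x) * (1.0 + A)) / ((B - A) * (1.0 + x))

# A bet of 0.75 pot, bracketed by half-pot and pot abstraction sizes,
# is mapped randomly between the two:
p_down = pseudo_harmonic(0.75, 0.5, 1.0)
```

The boundary behavior is the sanity check: an observed bet equal to A maps to A with probability 1, and one equal to B maps to B with probability 1.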