Better automated abstraction techniques for imperfect information games, with application to Texas Hold’em poker
- In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2007
Cited by 17 (7 self)
We present new approximation methods for computing game-theoretic strategies for sequential games of imperfect information. At a high level, we contribute two new ideas. First, we introduce a new state-space abstraction algorithm. In each round of the game, there is a limit to the number of strategically different situations that an equilibrium-finding algorithm can handle. Given this constraint, we use clustering to discover similar positions, and we compute the abstraction via an integer program that minimizes the expected error at each stage of the game. Second, we present a method for computing the leaf payoffs for a truncated version of the game by simulating the actions in the remaining portion of the game. This allows the equilibrium-finding algorithm to take into account the entire game tree while having to explicitly solve only a truncated version. Experiments show that each of our two new techniques improves performance dramatically in Texas Hold’em poker. The techniques lead to a drastic improvement over prior approaches for automatically generating agents, and our agent plays competitively even against the best agents overall.
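The clustering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each hand is summarized by a single hypothetical strength score in [0, 1] and groups the scores with one-dimensional k-means. The function names are illustrative, and the integer program that the abstract describes for computing the abstraction is not reproduced here.

```python
def bucket_of(value, centres):
    """Index of the centre closest to a hand-strength value."""
    return min(range(len(centres)), key=lambda c: abs(value - centres[c]))


def kmeans_1d(values, k, iters=50):
    """Cluster scalar hand-strength values into k buckets (Lloyd's algorithm).

    Assumes `values` is non-empty. Returns the list of k bucket centres.
    """
    pts = sorted(values)
    # Initialise centres at evenly spaced order statistics (deterministic).
    centres = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in pts:
            groups[bucket_of(v, centres)].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return centres


if __name__ == "__main__":
    strengths = [0.10, 0.12, 0.15, 0.50, 0.52, 0.90, 0.95]  # made-up scores
    centres = kmeans_1d(strengths, k=3)
    print([bucket_of(s, centres) for s in strengths])
```

Hands that land in the same bucket would then be treated as strategically identical by the equilibrium solver, which is what keeps the abstracted game small enough to solve.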
Particle Filtering for Dynamic Agent Modelling in Simplified Poker
2007
Cited by 6 (0 self)
Agent modelling is a challenging problem in many modern artificial intelligence applications. The agent modelling task is especially difficult when handling stochastic choices, deliberately hidden information, dynamic agents, and the need for fast learning. State estimation techniques, such as Kalman filtering and particle filtering, have addressed many of these challenges, but have received little attention in the agent modelling literature. This paper looks at the use of particle filtering for modelling a dynamic opponent in Kuhn poker, a simplified version of Texas Hold’em poker. We demonstrate effective modelling both against static opponents as well as dynamic opponents, when the dynamics are known. We then examine an application of Rao-Blackwellized particle filtering for doing dual estimation, inferring both the opponent’s state as well as a model of its dynamics. Finally, we examine the robustness of the approach to incorrect beliefs about the opponent and compare it to previous work on opponent modelling in Kuhn poker.
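As a rough illustration of the particle-filtering idea, the sketch below tracks a single hypothetical opponent parameter: the probability that the opponent bluffs. The random-walk dynamics, the `drift` value, and the function names are assumptions made for this example. It is a plain bootstrap filter, not the Rao-Blackwellized variant the abstract mentions.

```python
import random


def particle_filter_step(particles, observed_bluff, rng, drift=0.02):
    """One bootstrap-filter update; each particle is a candidate bluff probability."""
    # 1. Propagate: assumed random-walk dynamics, clipped to [0, 1].
    moved = [min(1.0, max(0.0, p + rng.gauss(0.0, drift))) for p in particles]
    # 2. Weight: likelihood of the observed action under each particle.
    weights = [p if observed_bluff else 1.0 - p for p in moved]
    if sum(weights) == 0.0:
        return moved  # degenerate case: keep the propagated particles
    # 3. Resample in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Posterior-mean estimate of the bluff probability."""
    return sum(particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.random() for _ in range(500)]  # uniform prior
    for _ in range(200):
        bluffed = rng.random() < 0.7  # simulated opponent, made-up rate
        particles = particle_filter_step(particles, bluffed, rng)
    print(round(estimate(particles), 2))
```

A larger `drift` lets the filter follow a fast-changing opponent at the cost of a noisier estimate; a smaller one does the opposite.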
Reduced-Variance Payoff Estimation in Adversarial Bandit Problems
- In: Proceedings of the ECML’05 workshop on Reinforcement Learning in Non-Stationary Environments (in print), 2005
Cited by 5 (1 self)
A natural way to compare learning methods in nonstationary environments is to compare their regret. In this paper we consider the regret of algorithms in adversarial multi-armed bandit problems. We propose several methods to improve the performance of the baseline exponentially weighted average forecaster by changing the payoff-estimation methods. We argue that improved performance can be achieved by constructing payoff estimation methods that produce estimates with low variance. Our arguments are backed up by both theoretical and empirical results. In fact, our empirical results show that significant performance gains are possible over the baseline algorithm.
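For orientation, here is a minimal sketch of an exponentially weighted average forecaster in the bandit setting. It assumes the common importance-weighted payoff estimate (as in Exp3-style algorithms) with uniform exploration. The parameter names `eta` and `gamma` and the input layout are illustrative, and the reduced-variance estimators proposed in the paper are not shown.

```python
import math
import random


def exp_weights_probs(cum_est, eta):
    """Exponentially weighted distribution over arms from cumulative payoff estimates."""
    m = max(cum_est)  # subtract the maximum for numerical stability
    w = [math.exp(eta * (g - m)) for g in cum_est]
    total = sum(w)
    return [x / total for x in w]


def run_bandit(payoffs, eta=0.1, gamma=0.1, seed=0):
    """Play the forecaster on payoffs[t][i] in [0, 1] (payoff of arm i at round t).

    Returns the total payoff collected and the final cumulative estimates.
    """
    rng = random.Random(seed)
    k = len(payoffs[0])
    cum_est = [0.0] * k
    total = 0.0
    for row in payoffs:
        p = exp_weights_probs(cum_est, eta)
        # Mix in uniform exploration so every arm has probability >= gamma / k.
        p = [(1 - gamma) * pi + gamma / k for pi in p]
        arm = rng.choices(range(k), weights=p)[0]
        reward = row[arm]
        total += reward
        # Importance-weighted estimate: unbiased, but its variance grows
        # as the probability of the chosen arm shrinks.
        cum_est[arm] += reward / p[arm]
    return total, cum_est


if __name__ == "__main__":
    rounds = [[0.0, 1.0] for _ in range(500)]  # arm 1 always pays, arm 0 never
    print(run_bandit(rounds)[0])
```

The division by `p[arm]` is where the variance comes from, which is the quantity the abstract argues should be kept low.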
Reinforcement Learning via AIXI Approximation
Cited by 5 (3 self)
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.
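The abstract does not say which Monte Carlo Tree Search variant is used. As a generic illustration only, the sketch below shows the UCB1 rule that the widely used UCT variant applies when choosing which action to explore at a tree node. The function name and the exploration constant are assumptions for this example.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick an action index at a search node using the UCB1 rule.

    counts[a] is how often action a was tried; values[a] is its summed return.
    """
    # Try every action once before comparing confidence bounds.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    # Mean return plus an exploration bonus that shrinks as an action is tried more.
    return max(
        range(len(counts)),
        key=lambda a: values[a] / counts[a]
        + c * math.sqrt(math.log(total) / counts[a]),
    )


if __name__ == "__main__":
    print(ucb1_select([10, 10], [9.0, 1.0]))
```

The bonus term is what balances exploiting actions that have looked good so far against revisiting rarely tried ones.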
Evolving Opponent Models for Texas Hold ’Em
Cited by 1 (0 self)
Opponent models allow software agents to assess a multi-agent environment more accurately and therefore improve the agent’s performance. This paper makes use of coarse approximations to game-theoretic player representations to improve the performance of software players in Limit Texas Hold ’Em poker. A 10-parameter model, intended to model a combination, or mixture, of various strategies is developed to represent the opponent. A ‘mixture identifier’ is then evolved using the NEAT neuroevolution method to estimate values of these parameters for arbitrary opponents. To evaluate this approach, two poker players, represented as neural networks, were evolved under the same conditions, one with the mixture identifier, and one without. The player trained with access to the identifier achieved consistently higher and more stable fitness during evolution compared with the player without the identifier. Further, the player with the identifier outplays the other in a heads-up match after training, winning on average 60% of the money at the table. These results demonstrate that opponent modeling is effective even with low-dimensional models and conveys an advantage to players trained to use these models.
Safe Opponent Exploitation
Cited by 1 (1 self)
We consider the problem of playing a finitely-repeated two-player zero-sum game safely, that is, guaranteeing at least the value of the game per period in expectation regardless of the strategy used by the opponent. Playing a stage-game equilibrium strategy at each time step clearly guarantees safety, and prior work has conjectured that it is impossible to simultaneously deviate from a stage-game equilibrium (in hope of exploiting a suboptimal opponent) and to guarantee safety. We show that such profitable deviations are indeed possible, specifically in games where certain types of ‘gift’ strategies exist, which we define formally. We show that the set of strategies constituting such gifts can be strictly larger than the set of iteratively weakly-dominated strategies; this disproves another recent conjecture which states that all non-iteratively-weakly-dominated strategies are best responses to each equilibrium strategy of the other player. We present a full characterization of safe strategies, and develop efficient algorithms for exploiting suboptimal opponents while guaranteeing safety. We also provide analogous results for sequential perfect and imperfect-information games, and present safe exploitation algorithms and full characterizations of safe strategies for those settings as well. We present experimental results in Kuhn poker, a canonical test problem for game-theoretic algorithms. Our experiments show that 1) aggressive safe exploitation strategies significantly outperform adjusting the exploitation within equilibrium strategies and 2) all the safe exploitation strategies significantly outperform a (non-safe) best response strategy against strong dynamic opponents.
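One way to read the safety condition is sketched below for a single stage of a matrix game. It assumes the row player tracks a hypothetical `surplus`: the expected amount won so far above the game value. A deviation is then accepted only if its worst-case payoff cannot fall short of the game value by more than that surplus. The function names and this bookkeeping are illustrative assumptions, not the paper's algorithms.

```python
def worst_case_payoff(strategy, payoff):
    """Row player's guaranteed expected payoff: the minimum over opponent columns."""
    rows = range(len(payoff))
    cols = range(len(payoff[0]))
    return min(sum(strategy[r] * payoff[r][c] for r in rows) for c in cols)


def keeps_safety(strategy, payoff, game_value, surplus):
    """True if playing `strategy` cannot push the expected total below the game value.

    `surplus` is the expected profit already accumulated above the game value
    (a hypothetical quantity tracked by the caller).
    """
    return worst_case_payoff(strategy, payoff) >= game_value - surplus - 1e-12


if __name__ == "__main__":
    # Rock-paper-scissors payoffs for the row player; the game value is 0.
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    always_rock = [1.0, 0.0, 0.0]
    print(keeps_safety(always_rock, rps, game_value=0.0, surplus=0.0))
    print(keeps_safety(always_rock, rps, game_value=0.0, surplus=1.0))
```

Rock-paper-scissors is used only because its payoffs are easy to check by hand; nothing here implies it contains the gift strategies that the abstract refers to.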
A competitive Texas Hold'em . . .
2006
We present our game theory-based heads-up Texas Hold’em poker player. To overcome the computational obstacles stemming from Texas Hold’em’s gigantic game tree, our player employs automated abstraction techniques to reduce the complexity of the strategy computations. In addition to this state-space abstraction, our player uses round-based abstraction in conjunction with both offline and real-time equilibrium approximation. Texas Hold’em consists of four betting rounds. Our player solves a large linear program (offline) to compute strategies for the abstracted first and second rounds. After the second betting round, our player updates the probability of each possible hand based on the observed betting actions in the first two rounds as well as the revealed cards. Using these updated probabilities, our player computes in real time an equilibrium approximation for the last two abstracted rounds. We demonstrate that our player, which does not directly incorporate any poker-specific expert knowledge, is competitive with leading poker-playing programs which do incorporate such domain-specific knowledge, as well as with advanced human players.
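The hand-probability update described in this abstract is an application of Bayes' rule. The sketch below assumes a hypothetical coarse set of hand categories and made-up likelihoods for the observed betting actions. In the actual player those likelihoods would come from the precomputed strategies, which are not reproduced here.

```python
def update_hand_beliefs(prior, action_likelihood):
    """Bayes' rule: P(hand | actions) is proportional to P(hand) * P(actions | hand).

    prior: dict mapping hand category -> prior probability.
    action_likelihood: dict mapping hand category -> probability of the
        observed betting actions given that category (missing keys count as 0).
    """
    unnorm = {h: prior[h] * action_likelihood.get(h, 0.0) for h in prior}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("observed actions have zero probability under the model")
    return {h: v / z for h, v in unnorm.items()}


if __name__ == "__main__":
    # Illustrative numbers only.
    prior = {"strong": 0.2, "medium": 0.3, "weak": 0.5}
    raised_twice = {"strong": 0.9, "medium": 0.4, "weak": 0.1}
    print(update_hand_beliefs(prior, raised_twice))
```

Hands ruled out by the revealed cards can be handled the same way by giving them a likelihood of zero before normalizing.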

