The Folk Theorem in Repeated Games with Discounting or with Incomplete Information
Econometrica, 1986
A Simple Adaptive Procedure Leading to Correlated Equilibrium
 Econometrica, September
We propose a new and simple adaptive procedure for playing a game: “regret-matching.” In this procedure, players may depart from their current play with probabilities that are proportional to measures of regret for not having used other strategies in the past. It is shown that our adaptive procedure guarantees that, with probability one, the empirical distributions of play converge to the set of correlated equilibria of the game.
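The procedure described in this abstract can be sketched for a two-player game as follows. This is a minimal illustration, not the authors' code: the function names, the Chicken payoff table, and the normalising constant `mu` are assumptions chosen for the example. `mu` only needs to be large enough that the switching probabilities sum to less than one.

```python
import random

# Hypothetical example game (Chicken): PAYOFFS[i][a0][a1] is player i's payoff
# when player 0 plays a0 and player 1 plays a1.
CHICKEN = [
    [[6, 2], [7, 0]],  # player 0
    [[6, 7], [2, 0]],  # player 1
]

def switch_probabilities(last_action, cum_regret, t, mu):
    """Probability of each action next period, given the action just played.

    cum_regret[j][k] is the summed gain the player would have obtained by
    playing k in every past period in which j was actually played.
    """
    n = len(cum_regret)
    probs = [0.0] * n
    for k in range(n):
        if k != last_action:
            avg_regret = cum_regret[last_action][k] / t
            probs[k] = max(avg_regret, 0.0) / mu  # proportional to positive regret
    probs[last_action] = 1.0 - sum(probs)  # otherwise repeat the last action
    return probs

def simulate(payoffs, rounds, mu, seed=0):
    """Run regret matching for both players; return the empirical joint distribution."""
    rng = random.Random(seed)
    n = [len(payoffs[0]), len(payoffs[0][0])]
    actions = [rng.randrange(n[0]), rng.randrange(n[1])]
    regret = [[[0.0] * n[i] for _ in range(n[i])] for i in range(2)]
    counts = {}
    for t in range(1, rounds + 1):
        a0, a1 = actions
        counts[(a0, a1)] = counts.get((a0, a1), 0) + 1
        # Update each player's regret for the action they just played.
        for k in range(n[0]):
            regret[0][a0][k] += payoffs[0][k][a1] - payoffs[0][a0][a1]
        for k in range(n[1]):
            regret[1][a1][k] += payoffs[1][a0][k] - payoffs[1][a0][a1]
        actions = [
            rng.choices(
                range(n[i]),
                weights=switch_probabilities(actions[i], regret[i], t, mu),
            )[0]
            for i in range(2)
        ]
    return {joint: c / rounds for joint, c in counts.items()}

if __name__ == "__main__":
    print(simulate(CHICKEN, rounds=20000, mu=15.0))
```

The returned dictionary is the empirical distribution over joint actions, which is the object the convergence result in the abstract refers to.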
Calibrated Learning and Correlated Equilibrium
Games and Economic Behavior, 1996
Suppose two players meet each other in a repeated game where (1) each uses a learning rule with the property that it is a calibrated forecast of the other's plays, and (2) each plays a best response to this forecast distribution.
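To make "calibrated forecast" concrete: a forecast is calibrated when, among the periods in which probability p was announced, the event occurred about a fraction p of the time. The sketch below measures how far a forecast sequence is from that ideal. It is a simplified illustration that assumes a binary event and exact grouping by forecast value; the function name and the scoring formula are choices made for this example, not taken from the paper.

```python
from collections import defaultdict

def calibration_score(forecasts, outcomes):
    """Weighted average of |empirical frequency - forecast| over forecast values.

    forecasts: probabilities announced for a binary event, one per period.
    outcomes:  0/1 realisations of the event, one per period.
    A score of 0 means the forecasts are perfectly calibrated on this sample.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        buckets[p].append(y)  # group periods by the forecast that was announced
    total = len(forecasts)
    score = 0.0
    for p, ys in buckets.items():
        freq = sum(ys) / len(ys)
        score += (len(ys) / total) * abs(freq - p)
    return score

if __name__ == "__main__":
    # Forecast 0.5 came true half the time, forecast 1.0 every time.
    print(calibration_score([0.5, 0.5, 1.0, 1.0], [1, 0, 1, 1]))
```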
Shopbots and Pricebots
1999
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economically motivated software agents. In addition, this paper reports on simulations of pricebots: adaptive, price-setting agents that firms may well implement to combat, or even take advantage of, the growing community of shopbots. This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
2003
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent’s (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players’ actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may also help in analyzing other multiagent learning algorithms.
A cryptographic solution to a game theoretic problem
In CRYPTO 2000: 20th International Cryptology Conference, 2000
In this work we use cryptography to solve a game-theoretic problem which arises naturally in the area of two-party strategic games. The standard game-theoretic solution concept for such games is that of an equilibrium, which is a pair of “self-enforcing” strategies making each player’s strategy an optimal response to the other player’s strategy. It is known that for many games the expected equilibrium payoffs can be much higher when a trusted third party (a “mediator”) assists the players in choosing their moves (correlated equilibria) than when each player has to choose its move on its own (Nash equilibria). It is natural to ask whether there exists a mechanism that eliminates the need for the mediator yet allows the players to maintain the high payoffs offered by mediator-assisted strategies. We answer this question affirmatively, provided the players are computationally bounded and can have free communication (so-called “cheap talk”) prior to playing the game. The main building block of our solution is an efficient cryptographic protocol for the following Correlated Element Selection problem, which is of independent interest. Both Alice and Bob know a list of pairs (a1, b1), ..., (an, bn) (possibly with repetitions), and they want to pick a random index i such that Alice learns only ai and Bob learns only bi. Our solution to this problem has a constant number of rounds, negligible error probability, and uses only very simple zero-knowledge proofs. We then show how to incorporate our cryptographic protocol back into a game-theoretic setting, which highlights some interesting parallels between cryptographic protocols and extensive form games.
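The input/output behaviour required by Correlated Element Selection can be written down as a trusted-dealer reference function. Note the limits of this sketch: it is exactly the mediator-based version that the paper sets out to replace, it contains none of the cryptographic protocol, and the function name is hypothetical. It only pins down what each party is supposed to learn.

```python
import secrets

def correlated_element_selection_ideal(pairs):
    """Trusted-dealer reference for Correlated Element Selection.

    pairs: the public list [(a1, b1), ..., (an, bn)], repetitions allowed.
    Returns (a_i, b_i) for a uniformly random index i. In the intended use the
    dealer sends a_i privately to Alice and b_i privately to Bob, and reveals
    neither the index i nor the other party's element.
    """
    if not pairs:
        raise ValueError("the list of pairs must be non-empty")
    i = secrets.randbelow(len(pairs))  # uniform index in [0, n)
    a_i, b_i = pairs[i]
    return a_i, b_i

if __name__ == "__main__":
    # Hypothetical list of recommended joint moves for a 2x2 game.
    moves = [("C", "C"), ("C", "D"), ("D", "C")]
    for_alice, for_bob = correlated_element_selection_ideal(moves)
    print(for_alice, for_bob)
```

Because repetitions are allowed in the list, a non-uniform distribution over joint moves can be encoded by listing some pairs more than once.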
Bounded Policy Iteration for Decentralized POMDPs
In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, 2005
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier’s bounded policy iteration for POMDPs.
Intrinsic Robustness of the Price of Anarchy
The price of anarchy (POA) is a worst-case measure of the inefficiency of selfish behavior, defined as the ratio of the objective function value of a worst Nash equilibrium of a game and that of an optimal outcome. This measure implicitly assumes that players successfully reach some Nash equilibrium. This drawback motivates the search for inefficiency bounds that apply more generally to weaker notions of equilibria, such as mixed Nash and correlated equilibria; or to sequences of outcomes generated by natural experimentation strategies, such as successive best responses or simultaneous regret minimization. We prove a general and fundamental connection between the price of anarchy and its seemingly stronger relatives in classes of games with a sum objective. First, we identify a “canonical sufficient condition” for an upper bound of the POA for pure Nash equilibria, which we call a smoothness argument. Second, we show that every bound derived via a smoothness argument extends automatically, with no quantitative degradation in the bound, to mixed Nash equilibria, correlated equilibria, and the average objective function value of regret-minimizing players (or “price of total anarchy”). Smoothness arguments also have automatic implications for the inefficiency of approximate and Bayesian-Nash equilibria and, under mild additional assumptions, for bicriteria bounds and for polynomial-length best-response sequences. We also identify classes of games (most notably, congestion games with cost functions restricted to an arbitrary fixed set) that are tight, in the sense that smoothness arguments are guaranteed to produce an optimal worst-case upper bound on the POA, even for the smallest set of interest (pure Nash equilibria). Byproducts of our proof of this result include the first tight bounds on the POA in congestion games with non-polynomial cost functions, and the first