Results 1–10 of 115
Predicting How People Play Games: Reinforcement Learning . . .
American Economic Review, 1998
"... ..."
Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term
Games and Economic Behavior 8, 164–212 (1995)
"... We use simple learning models to track the behavior observed in experiments concerning three extensive form games with similar perfect equilibria. In only two of the games does observed behavior approach the perfect equilibrium as players gain experience. We examine a family of learning models which ..."
Abstract

Cited by 223 (12 self)
We use simple learning models to track the behavior observed in experiments concerning three extensive form games with similar perfect equilibria. In only two of the games does observed behavior approach the perfect equilibrium as players gain experience. We examine a family of learning models which possess some of the robust properties of learning noted in the psychology literature. The intermediate term predictions of these models track well the observed behavior in all three games, even though the models considered differ in their very long term predictions. We argue that for predicting observed behavior the intermediate term predictions of dynamic learning models may be even more important than their asymptotic properties.
A Simple Adaptive Procedure Leading to Correlated Equilibrium
 Econometrica, September
"... We propose a new and simple adaptive procedure for playing a game: ‘‘regretmatching.’’ In this procedure, players may depart from their current play with probabilities that are proportional to measures of regret for not having used other strategies in the past. It is shown that our adaptive procedu ..."
Abstract

Cited by 218 (14 self)
We propose a new and simple adaptive procedure for playing a game: “regret-matching.” In this procedure, players may depart from their current play with probabilities that are proportional to measures of regret for not having used other strategies in the past. It is shown that our adaptive procedure guarantees that, with probability one, the empirical distributions of play converge to the set of correlated equilibria of the game.
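A minimal sketch of the regret-matching rule the abstract describes: play each action with probability proportional to its positive accumulated regret. The regret bookkeeping and `payoff_of` interface here are illustrative assumptions, not the paper's notation.

```python
import random

def regret_matching_step(regrets):
    """Pick an action with probability proportional to its positive regret.

    `regrets[a]` is assumed to accumulate how much better action `a` would
    have done than the actions actually played so far.
    """
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # No positive regret yet: any fixed rule works; play uniformly.
        return random.randrange(len(regrets))
    x = random.uniform(0.0, total)
    cum = 0.0
    for action, p in enumerate(positive):
        cum += p
        if p > 0.0 and x <= cum:
            return action
    return len(positive) - 1

def update_regrets(regrets, payoff_of, played):
    """After a round, add u(a) - u(played) to each action's regret."""
    base = payoff_of(played)
    for a in range(len(regrets)):
        regrets[a] += payoff_of(a) - base
```

With only one action holding positive regret, that action is played with probability one, matching the procedure's tendency to switch toward regretted alternatives.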
Rational Learning Leads to Nash Equilibrium
Econometrica, 1993
"... Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at ..."
Abstract

Cited by 214 (13 self)
Rationalizability, Learning, and Equilibrium in Games with Strategic Complementarities
Econometrica: Journal of the Econometric Society, 1990
"... Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at ..."
Abstract

Cited by 161 (0 self)
Calibrated Learning and Correlated Equilibrium
Games and Economic Behavior, 1996
"... Suppose two players meet each other in a repeated game where: 1. each uses a learning rule with the property that it is a calibrated forecast of the others plays, and 2. each plays a best response to this forecast distribution. ..."
Abstract

Cited by 86 (5 self)
Suppose two players meet each other in a repeated game where: 1. each uses a learning rule with the property that it is a calibrated forecast of the other's plays, and 2. each plays a best response to this forecast distribution.
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
2003
"... A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in selfplay. The algorithm that has come closest, WoLFIGA, has been proven to have these two properties in 2player 2action repeated games— as ..."
Abstract

Cited by 80 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player, 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms as well.
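The core AWESOME decision, drastically simplified, can be sketched as follows. The real algorithm uses carefully tuned epoch schedules, shrinking tolerances, and restarts; the frequency-comparison test and the function names below are assumptions for illustration only.

```python
def empirical_freq(counts):
    """Empirical action frequencies from a count vector."""
    total = sum(counts)
    n = len(counts)
    return [c / total for c in counts] if total else [1.0 / n] * n

def appears_stationary(prev_counts, cur_counts, tol):
    """Crude stationarity check: did the opponents' empirical action
    frequencies move by more than tol between two epochs?"""
    prev, cur = empirical_freq(prev_counts), empirical_freq(cur_counts)
    return max(abs(a - b) for a, b in zip(prev, cur)) <= tol

def awesome_choice(stationary, best_response_action, equilibrium_action):
    """Adapt When Everybody is Stationary, Otherwise Move to Equilibrium."""
    return best_response_action if stationary else equilibrium_action
```

The point of the rule is visible even in this toy form: best-responding to stationary opponents yields optimal play against them, while retreating to a precomputed equilibrium anchors self-play convergence.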
Correlated Q-learning
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
"... There have been several attempts to design multiagent Qlearning algorithms capable of learning equilibrium policies in generalsum Markov games, just as Qlearning learns optimal policies in Markov decision processes. We introduce correlated Qlearning, one such algorithm based on the correlated eq ..."
Abstract

Cited by 56 (2 self)
There have been several attempts to design multiagent Q-learning algorithms capable of learning equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. We introduce correlated Q-learning, one such algorithm based on the correlated equilibrium solution concept. Motivated by a fixed point proof of the existence of stationary correlated equilibrium policies in Markov games, we present a generic multiagent Q-learning algorithm of which many popular algorithms are immediate special cases. We also prove that certain variants of correlated (and Nash) Q-learning are guaranteed to converge to stationary correlated (and Nash) equilibrium policies in two special classes of Markov games, namely zero-sum and common-interest. Finally, we show empirically that correlated Q-learning outperforms Nash Q-learning, further justifying the former beyond noting that it is less computationally expensive than the latter.
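The "generic multiagent Q-learning algorithm" idea admits a short sketch: the update is ordinary temporal-difference learning, with the next-state value supplied by a pluggable equilibrium-value operator. The dictionary representation and the simple max operator below (a common-interest stand-in) are assumptions; correlated Q-learning proper would compute the value of a correlated equilibrium of the next-state stage game, e.g. by linear programming.

```python
def multiagent_q_update(Q, state, joint_action, reward, next_state,
                        joint_actions, value_of, alpha=0.1, gamma=0.9):
    """One step of a generic multiagent Q-learning update.

    `value_of(Q, s, joint_actions)` is the pluggable equilibrium-value
    operator; different choices recover different named algorithms.
    """
    old = Q.get((state, joint_action), 0.0)
    target = reward + gamma * value_of(Q, next_state, joint_actions)
    Q[(state, joint_action)] = old + alpha * (target - old)

def max_joint_value(Q, state, joint_actions):
    """'Friend'-style operator: best joint payoff (common-interest games)."""
    return max(Q.get((state, a), 0.0) for a in joint_actions)
```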
The multiplicative weights update method: a meta algorithm and applications
2005
"... Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analysis are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies ..."
Abstract

Cited by 53 (10 self)
Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analyses are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies these disparate algorithms and derives them as simple instantiations of the meta algorithm.
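The multiplicative update rule the abstract refers to is short enough to state directly. This is a sketch of the standard loss-based variant, with losses assumed to lie in [0, 1] and the learning rate `eta` chosen arbitrarily for illustration.

```python
def mw_update(weights, losses, eta=0.5):
    """Multiplicative weights: shrink each expert's weight by its loss,
    w_i <- w_i * (1 - eta * loss_i)."""
    return [w * (1.0 - eta * loss) for w, loss in zip(weights, losses)]

def mw_distribution(weights):
    """Normalize weights into the sampling distribution for the next round."""
    total = sum(weights)
    return [w / total for w in weights]
```

Experts that incur loss lose weight geometrically, so the distribution concentrates on the experts with low cumulative loss, which is what the exponential potential-function analyses exploit.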
On the Global Convergence of Stochastic Fictitious Play
Econometrica, 2002
"... We establish global convergence results for stochastic fictitious play for four classes of games: games with an interior ESS, zero sum games, potential games, and supermodular games. We do so by appealing to techniques from stochastic approximation theory, which relate the limit behavior of a stocha ..."
Abstract

Cited by 51 (10 self)
We establish global convergence results for stochastic fictitious play for four classes of games: games with an interior ESS, zero sum games, potential games, and supermodular games. We do so by appealing to techniques from stochastic approximation theory, which relate the limit behavior of a stochastic process to the limit behavior of a differential equation defined by the expected motion of the process. The key result in our analysis of supermodular games is that the relevant differential equation defines a strongly monotone dynamical system. Our analyses of the other cases combine Lyapunov function arguments with a discrete choice theory result: that the choice probabilities generated by any additive random utility model can be derived from a deterministic model based on payoff perturbations that depend nonlinearly on the vector of choice probabilities.
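In stochastic fictitious play, each player best-responds to the empirical frequency of the opponent's past play through a perturbed (e.g. logit) choice rule, which connects to the additive-random-utility result mentioned above. A minimal sketch of the logit response, with an assumed payoff-matrix representation and an arbitrary choice-intensity parameter `beta`:

```python
import math

def logit_best_response(payoffs, opponent_freq, beta=5.0):
    """Perturbed best response: softmax (logit) choice over expected payoffs.

    `payoffs[i][j]` is this player's payoff for action i against opponent
    action j; `opponent_freq` is the empirical frequency of opponent play.
    """
    expected = [sum(row[j] * opponent_freq[j] for j in range(len(opponent_freq)))
                for row in payoffs]
    top = max(expected)  # subtract the max for numerical stability
    exps = [math.exp(beta * (u - top)) for u in expected]
    z = sum(exps)
    return [e / z for e in exps]
```

As `beta` grows the rule approaches an exact best response; iterating it against updated empirical frequencies gives the stochastic process whose expected motion the paper's differential-equation analysis studies.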