## Efficient learning equilibrium (2002)

### Download Links

- [www.cs.bgu.ac.il]
- [iew3.technion.ac.il]
- [www.eecs.harvard.edu]
- [ie.technion.ac.il]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of NIPS

Citations: 49 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Brafman02efficientlearning,
  author    = {Ronen I. Brafman and Moshe Tennenholtz},
  title     = {Efficient learning equilibrium},
  booktitle = {Proceedings of NIPS},
  year      = {2002}
}
```

### Abstract

We introduce efficient learning equilibrium (ELE), a normative approach to learning in non-cooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms must arrive at a desired value after polynomial time, and a deviation from the prescribed ELE becomes irrational after polynomial time. We prove the existence of an ELE (where the desired value is the expected payoff in a Nash equilibrium) and of a Pareto-ELE (where the objective is the maximization of social surplus) in repeated games with perfect monitoring. We also show that an ELE does not always exist in the imperfect-monitoring case. Finally, we discuss the extension of these results to general-sum stochastic games.
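
The two target values named in the abstract can be illustrated on a toy game. The sketch below is our own illustration (a Prisoner's Dilemma with made-up payoffs, not an example from the paper): it contrasts the payoff in the unique pure Nash equilibrium with the joint action that maximizes social surplus, the objective of a Pareto-ELE.

```python
# Toy illustration of the abstract's two target values, using a
# Prisoner's Dilemma (our own example payoffs, not from the paper).
# Entries are (row player's payoff, column player's payoff).
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(a_row, a_col):
    """A pure profile is a Nash equilibrium if neither player can gain
    by deviating unilaterally."""
    u_row, u_col = payoffs[(a_row, a_col)]
    row_ok = all(payoffs[(d, a_col)][0] <= u_row for d in actions)
    col_ok = all(payoffs[(a_row, d)][1] <= u_col for d in actions)
    return row_ok and col_ok

# Desired value of an ELE: the expected payoff in a Nash equilibrium.
nash_profiles = [(r, c) for r in actions for c in actions if is_nash(r, c)]
# Objective of a Pareto-ELE: the joint action maximizing social surplus.
surplus_max = max(payoffs, key=lambda p: sum(payoffs[p]))

print(nash_profiles)  # [('D', 'D')] -> Nash payoff (1, 1)
print(surplus_max)    # ('C', 'C')  -> social surplus 3 + 3 = 6
```

Note the gap between the two values: the Nash payoff profile (1, 1) is strictly worse for both players than the surplus-maximizing outcome (3, 3), which is why the Pareto-ELE objective differs from the Nash-value objective.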

### Citations

2969 | Game Theory
- Fudenberg, Tirole
- 1991

Citation Context: ...the so-called folk theorems in economics and a novel, efficient punishment mechanism which ensures the efficiency of our approach. In the folk theorems (e.g., see [12] and the extended discussion in [13]) the basic idea is that any strategy profile that leads to payoffs that are greater than or equal to the security level (probabilistic maximin) values that the agents can guarantee themselves can be ...
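
The security level mentioned in this context can be sketched concretely. The folk theorems use the probabilistic maximin over mixed strategies; for simplicity the snippet below computes only the pure-strategy maximin, which is a lower bound on it, and the payoff matrix is our own example rather than one from the paper.

```python
# Pure-strategy security level (maximin) for the row player of a matrix
# game. The folk theorems use the probabilistic maximin over mixed
# strategies; the pure-strategy value below is a lower bound on it.
# The payoff matrix is our own example, not taken from the paper.
matrix = [
    [5, -3],  # row action 0 against column actions 0 and 1
    [3, -2],  # row action 1
]

def pure_maximin(m):
    """Payoff the row player can guarantee with a pure action, assuming
    the column player responds by minimizing the row player's payoff."""
    return max(min(row) for row in m)

print(pure_maximin(matrix))  # -2: row action 1 guarantees at least -2
```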

1403 | Reinforcement learning: a survey
- Kaelbling, Littman, et al.
- 1996
Citation Context: ...of multi-agent interaction has attracted the attention of researchers in cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with th...

899 | The Theory of Learning in Games
- Fudenberg, Levine
- 1998

Citation Context: ...attention of researchers in cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with the understanding of learning procedures that if ...

732 | Equilibrium points in n-person games
- Nash
- 1950

Citation Context: ...game in strategic form, in which the agents’ set of actions is finite, possesses a Nash equilibrium in mixed strategies (where each agent can select a probability distribution over its available actions) [24]. Unfortunately, in general, there can be many Nash equilibria. Thus, while Nash equilibria are stable in some respect, this does not imply that ... [Figure 1: example payoff matrices M1–M3; garbled in extraction]

533 | Markov games as a framework for multi-agent reinforcement learning
- Littman
- 1994
Citation Context: ...cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with the understanding of learning procedures that if adopted by the different agents will ...

467 | Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria. The American Economic Review 88
- Erev, Roth
- 1998
Citation Context: ...of multi-agent interaction has attracted the attention of researchers in cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with th...

429 | The folk theorem in repeated games with discounting or incomplete information
- Fudenberg, Maskin
- 1986
Citation Context: ...algorithm is based on a combination of the so-called folk theorems in economics and a novel, efficient punishment mechanism which ensures the efficiency of our approach. In the folk theorems (e.g., see [12] and the extended discussion in [13]) the basic idea is that any strategy profile that leads to payoffs that are greater than or equal to the security level (probabilistic maximin) values that the age...

428 | Auction theory: a guide to the literature
- Klemperer
- 1999
Citation Context: ...d only by it, and the setting of common values, where the worth of the good is common but is revealed to the agents only after they take their actions. Many interesting intermediate cases exist (see [20] for a discussion and survey in the context of auction theory). These settings fall into the category of games in which private payoffs are revealed after the joint action of all agents and are instan...

411 | Subjectivity and Correlation in Randomized Strategies
- Aumann
- 1974

Citation Context: ...r payments, and it does not possess any private knowledge or aim to optimize private payoffs. Therefore, the right way to view this party is as a mediator/interested party or correlation device (as in [1, 9], etc.). We find this interpretation to be convenient, although not essential. Notice that the suggested payments are just part of the algorithms, and it is up to the agents to decide whether to make t...

323 | The dynamics of reinforcement learning in cooperative multi-agent systems
- Claus, Boutilier
- 1998
Citation Context: ...attention of researchers in cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with the understanding of learning procedures that if ...

299 | Multi-agent reinforcement learning: Theoretical framework and an algorithm
- Hu, Wellman
- 1998

Citation Context: ...cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with the understanding of learning procedures that if adopted by the different agents will ...

255 | Stochastic games
- Shapley
- 1953

Citation Context: ...cognitive psychology, experimental economics, machine learning, artificial intelligence, and related fields for quite some time [18, 7]. Much of this work uses repeated games [6, 11] and stochastic games [26, 22, 17, 3] as models of such interactions. The literature on learning in games in game theory [11] is mainly concerned with the understanding of learning procedures that if adopted by the different agents will ...

249 | Near-optimal reinforcement learning in polynomial time
- Kearns, Singh
Citation Context: ...δ, the situation in stochastic games is more complicated. A parameter that is typically used to assess the speed of convergence of a learning algorithm in stochastic games is the ɛ-return mixing time [19, 4]. Intuitively, the ɛ-return mixing time of a policy is the expected time it would take an agent that uses this policy to converge to a value that is ɛ-close to the value of the policy. Ideally, we wou...
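
The ɛ-return mixing time defined in this context can be illustrated on a toy deterministic reward chain (our own example, not one from the paper): we look for the first horizon after which the empirical average return stays ɛ-close to the policy's long-run value.

```python
# Sketch of the eps-return mixing time on a toy deterministic reward
# chain (our own example): rewards alternate 0, 2, 0, 2, ..., so the
# policy's long-run average return is 1.0.
def avg_return(t):
    """Empirical average reward of the alternating chain over t steps."""
    rewards = [0 if i % 2 == 0 else 2 for i in range(t)]
    return sum(rewards) / t

def mixing_time(value, eps, horizon=1000):
    """Smallest T such that the average return stays within eps of
    `value` for every t from T up to `horizon`."""
    T = horizon
    for t in range(horizon, 0, -1):
        if abs(avg_return(t) - value) <= eps:
            T = t
        else:
            break
    return T

print(mixing_time(value=1.0, eps=0.1))  # 10: from step 10 onward, the
# average return stays within 0.1 of the long-run value 1.0
```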

248 | R-max – a general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, Tennenholtz

138 | Adaptive game playing using multiplicative weights
- Freund, Schapire
- 1999
Citation Context: ...agent using them a value which is approximately equal to the value he would have attained had he known in advance how his adversary would play. Algorithms along this line appear in, e.g., [15] and in [10] (where special attention is given to the issue of efficiency). This latter result is truly in the spirit of on-line algorithms, where our goal is to do as much as we can online as we would have been ...

60 | Correlated Q-learning
- Greenwald, Hall, et al.
- 2002

Citation Context: ...this limits the applicability of the ELE criterion, as it cannot be used as a guideline in such cases. Yet, most previous work in AI on learning algorithms for games assumes perfect monitoring, including [17, 21, 14]. Work on common-interest or zero-sum games makes this assumption implicitly too, since there the perfect and imperfect monitoring settings coincide. Perfect monitoring is not assumed by authors that ...

48 | Bundling equilibrium in combinatorial auctions. Games and Economic Behavior 47:104–123
- Holzman, Kfir-Dahav, et al.
- 2004
Citation Context: ...of the state of the system. This system state may not be initially observable and might consist of various private inputs of the agents. Ex-post equilibria play a major role in central mechanisms (see [16] for general results in the context of the famous VCG mechanisms). It turns out that the idea of equilibrium of learning algorithms can be viewed similarly. We search for strategies (termed learning a...

25 | A reinforcement procedure leading to correlated equilibrium
- Hart, Mas-Colell
- 2002
Citation Context: ...guarantee for the agent using them a value which is approximately equal to the value he would have attained had he known in advance how his adversary would play. Algorithms along this line appear in, e.g., [15] and in [10] (where special attention is given to the issue of efficiency). This latter result is truly in the spirit of on-line algorithms, where our goal is to do as much as we can online as we woul...

18 | Using redundancy to improve robustness of distributed mechanism implementations
- Shneidman, Parkes
Citation Context: ...e in this setting with respect to whether or not the agents adopt strategic considerations when taking actions and passing messages (e.g., can agents strategically modify messages sent by others?); see [23, 8, 27] for various ways of approaching these issues. Much of the literature deals with the search for mechanisms where the agents will have dominant strategies that lead to desired behavior (e.g., maximizin...

17 | Game Theory, 2nd ed.
- Owen
- 1982

Citation Context: ...Game theory provides a mathematical formulation of multi-agent interactions and multi-agent decision making. Here we review some of the basic concepts. For a good introduction to the area, see, e.g., [25]. A game is a formal description of an interaction between a set of agents. The rules of the game describe the order of moves by the agents, the available choices at each move, the information availab...

14 | Distributed games
- Monderer, Tennenholtz
- 1999
Citation Context: ...e in this setting with respect to whether or not the agents adopt strategic considerations when taking actions and passing messages (e.g., can agents strategically modify messages sent by others?); see [23, 8, 27] for various ways of approaching these issues. Much of the literature deals with the search for mechanisms where the agents will have dominant strategies that lead to desired behavior (e.g., maximizin...

10 | An Approach to Communication Equilibrium
- Forges
- 1986

Citation Context: ...r payments, and it does not possess any private knowledge or aim to optimize private payoffs. Therefore, the right way to view this party is as a mediator/interested party or correlation device (as in [1, 9], etc.). We find this interpretation to be convenient, although not essential. Notice that the suggested payments are just part of the algorithms, and it is up to the agents to decide whether to make t...

5 | Learning to coordinate efficiently: A model based approach - Brafman, Tennenholtz - 2003

4 | Rational and convergent learning in stochastic games
- Bowling, Veloso
- 2001

Citation Context: ...and theoretical importance, these results cannot replace concepts that are based on the notion of an equilibrium. Another related work on normative guidelines for the design of learning algorithms is [2]. There, Bowling and Veloso suggest two criteria for learning algorithms. The first, which they call rationality, stipulates that if the other player’s policies converge to a stationary policy then the...

2 | Distributed Algorithmic Mechanism Design: Recent Results and Future Directions
- Feigenbaum, S
- 2002

Citation Context: ...e in this setting with respect to whether or not the agents adopt strategic considerations when taking actions and passing messages (e.g., can agents strategically modify messages sent by others?); see [23, 8, 27] for various ways of approaching these issues. Much of the literature deals with the search for mechanisms where the agents will have dominant strategies that lead to desired behavior (e.g., maximizin...

2 | Efficient learning equilibrium - Tennenholtz - 2002

1 | Friend-or-foe Q-learning in general-sum Markov games
- Littman
- 2001

Citation Context: ...this limits the applicability of the ELE criterion, as it cannot be used as a guideline in such cases. Yet, most previous work in AI on learning algorithms for games assumes perfect monitoring, including [17, 21, 14]. Work on common-interest or zero-sum games makes this assumption implicitly too, since there the perfect and imperfect monitoring settings coincide. Perfect monitoring is not assumed by authors that ...
