AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents

by Sandholm
Venue: Mach. Learn.
Results 1 - 10 of 39

Learning against opponents with bounded memory

by Rob Powers - In IJCAI, 2005
Cited by 32 (2 self)
Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory which we describe. We then show an algorithm that provably achieves an ɛ-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. This new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments.

Best-Response Multiagent Learning in Non-Stationary Environments

by Michael Weinberg, Jeffrey S. Rosenschein, 2004
Cited by 13 (1 self)
This paper investigates a relatively new direction in Multiagent Reinforcement Learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy, rather than in reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent's non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of n-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.

A general criterion and an algorithmic framework for learning in multi-agent systems

by Rob Powers, Yoav Shoham, Thuc Vu - Machine Learning, 2007
Cited by 9 (0 self)

Communication Complexity as a Lower Bound for Learning in Games

by Vincent Conitzer, Tuomas Sandholm - In Proceedings of the 21st International Conference on Machine Learning, 2004
Cited by 8 (3 self)
A fast-growing body of research in the AI and machine learning communities addresses learning in games, where there are multiple learners with different interests. This research adds to more established research on learning in games conducted in economics. In part because of a clash of fields, there are widely varying requirements on learning algorithms in this domain. The goal of this paper is to demonstrate how communication complexity can be used as a lower bound on the required learning time or cost. Because this lower bound does not assume any requirements on the learning algorithm, it is universal, applying under any set of requirements on the learning algorithm. We characterize

Optimal efficient learning equilibrium: Imperfect monitoring in symmetric games

by Ronen I. Brafman, Moshe Tennenholtz - In Proceedings of the National Conference on Artificial Intelligence (AAAI), 2005
Cited by 7 (4 self)
Efficient Learning Equilibrium (ELE) is a natural solution concept for multi-agent encounters with incomplete information. It requires the learning algorithms themselves to be in equilibrium for any game selected from a set of (initially unknown) games. In an optimal ELE, the learning algorithms would efficiently obtain the surplus the agents would obtain in an optimal Nash equilibrium of the initially unknown game which is played. The crucial part is that in an ELE deviations from the learning algorithms would become non-beneficial after polynomial time, although the game played is initially unknown. While appealing conceptually, the main challenge for establishing learning algorithms based on this concept is to isolate general classes of games where an ELE exists. Unfortunately, it has been shown that while an ELE exists for the setting in which each agent can observe all other agents’ actions and payoffs, an ELE does not exist in general when the other agents’ payoffs cannot be observed. In this paper we provide the first positive results on this problem, constructively proving the existence of an optimal ELE for the class of symmetric games where an agent cannot observe other agents’ payoffs.

Online Multiagent Learning against Memory Bounded Adversaries

by Doran Chakraborty, Peter Stone
Cited by 7 (6 self)
The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory bounded adversary. LoE-AIM makes no prior assumptions about the opponent and is tailored to optimally exploit any adversary which induces a Markov decision process in the state space of joint histories. LoE-AIM either explores and gathers new information about the opponent or converges to the best response to the partially learned opponent strategy in repeated play. We further extend LoE-AIM to account for online repeated interactions against the same adversary with plays against other adversaries interleaved in between. LoE-AIM-repeated stores learned knowledge about an adversary, identifies the adversary in case of repeated interaction, and reuses the stored knowledge about the behavior of the adversary to enhance learning in the current epoch of play. LoE-AIM and LoE-AIM-repeated are fully implemented, with results demonstrating their superiority over other existing MAL algorithms.

Robust learning equilibrium

by Itai Ashlagi, Dov Monderer - In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), 34–41. Corvallis, Oregon: AUAI, 2006
Cited by 6 (5 self)
We introduce robust learning equilibrium and apply it to the context of auctions.

Efficient learning of multi-step best response

by Bikramjit Banerjee - In AAMAS ’05: Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, 2005
Cited by 4 (0 self)
We provide a uniform framework for learning against a recent history adversary in arbitrary repeated bimatrix games, by modeling such an agent as a Markov Decision Process. We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon and adapt an existing efficient Monte Carlo based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner’s Dilemma game shows that even when no extra domain knowledge (besides that the opponent’s memory size is known) is assumed, the error can still be small.
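The modeling idea in the abstract above (an opponent whose move depends only on recent history induces an MDP for the learner) can be illustrated with a small sketch. This is not the paper's Monte Carlo algorithm: it assumes the opponent's rule is already known and uses plain backward induction over a finite horizon. The payoff values, the `tit_for_tat` opponent, and the `plan` helper are all hypothetical choices made for illustration.

```python
from itertools import product

ACTIONS = ("C", "D")
# Row player's payoffs in a standard Prisoner's Dilemma (illustrative values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(last_joint):
    """Hypothetical memory-1 opponent: repeats our previous action."""
    my_last, _ = last_joint
    return my_last


def plan(opponent, horizon):
    """Backward induction over the MDP whose state is the last joint action.

    Returns policy[t][state] -> action and value[t][state] -> total payoff
    obtainable from step t onward. The policy depends on t, so it is
    non-stationary.
    """
    states = list(product(ACTIONS, ACTIONS))
    value = [dict.fromkeys(states, 0) for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for t in reversed(range(horizon)):
        for s in states:
            opp = opponent(s)  # opponent's move is determined by the state
            best_a, best_v = None, float("-inf")
            for a in ACTIONS:
                v = PAYOFF[(a, opp)] + value[t + 1][(a, opp)]
                if v > best_v:
                    best_a, best_v = a, v
            policy[t][s] = best_a
            value[t][s] = best_v
    return policy, value


policy, value = plan(tit_for_tat, horizon=5)
start = ("C", "C")  # assumed initial "previous" joint action
print(value[0][start], [policy[t][start] for t in range(5)])
```

With these assumed payoffs, the plan cooperates while future rounds remain and defects only in the final round, which shows why the optimal finite-horizon policy in such an MDP depends on the time step.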

Leading a Best-Response Teammate in an Ad Hoc Team

by Peter Stone, Gal A. Kaminka, Jeffrey S. Rosenschein, 2009
Cited by 4 (1 self)
Teams of agents may not always be developed in a planned, coordinated fashion. Rather, as deployed agents become more common in e-commerce and other settings, there are increasing opportunities for previously unacquainted agents to cooperate in ad hoc team settings. In such scenarios, it is useful for individual agents to be able to collaborate with a wide variety of possible teammates under the philosophy that not all agents are fully rational. This paper considers an agent that is to interact repeatedly with a teammate that will adapt to this interaction in a particular suboptimal, but natural way. We formalize this setting in game-theoretic terms, provide and analyze a fully-implemented algorithm for finding optimal action sequences, prove some theoretical results pertaining to the lengths of these action sequences, and provide empirical results pertaining to the prevalence of our problem of interest in random interaction settings.

Learning Equilibrium in Resource Selection Games

by Itai Ashlagi, Dov Monderer, Moshe Tennenholtz
Cited by 3 (2 self)
We consider a resource selection game with incomplete information about the resource-cost functions. All the players know is the set of players, an upper bound on the possible costs, and that the cost functions are positive and nondecreasing. The game is played repeatedly and after every stage each player observes her cost, and the actions of all players. For every ɛ > 0 we prove the existence of a learning ɛ-equilibrium, which is a profile of algorithms, one for each player, such that a unilateral deviation of a player is, up to ɛ, not beneficial for her regardless of the actual cost functions. Furthermore, the learning equilibrium yields an optimal social cost.