Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm (1998)

by J Hu, M Wellman
Venue: Proceedings of ICML
Results 11 - 20 of 331 citing documents

AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents

by Vincent Conitzer, Tuomas Sandholm, 2003
"... A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games— as ..."
Abstract - Cited by 97 (5 self) - Add to MetaCart
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms also.
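As an illustration of the control flow the abstract describes, the sketch below (Python, not from the paper) shows the three cases an AWESOME-style agent distinguishes each time it revises its strategy. The helpers best_response and observed_freqs, and the boolean tests appears_eq and appears_stationary, are hypothetical placeholders; the actual algorithm implements these tests with epoch-based statistical checks and restart logic that are omitted here.

```python
def awesome_step(precomputed_eq, best_response, observed_freqs,
                 appears_eq, appears_stationary):
    """One decision point of an AWESOME-style loop (simplified sketch)."""
    if appears_eq:
        # No detectable deviation from the precomputed equilibrium:
        # keep playing our own equilibrium strategy.
        return precomputed_eq
    if appears_stationary:
        # The others look stationary but not at equilibrium:
        # adapt by best-responding to their empirical strategies.
        return best_response(observed_freqs)
    # Neither hypothesis survives: retreat to the precomputed
    # equilibrium (the real algorithm also restarts its tests here).
    return precomputed_eq
```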

Citation Context

...laus & Boutilier, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). Some of the algorithms satisfy the second property in restricted games (e.g. (Vrieze, 1987; Littman, 1994; Hu & Wellman, 1998; Singh et al., 2000; Bowling & Veloso, 2002; Wang & Sandholm, 2002)). The algorithm that has come closest to satisfying both of the properties in general repeated games is WoLF-IGA (Bowling & Veloso, ...

Rational and Convergent Learning in Stochastic Games

by Michael Bowling, Manuela Veloso, 2001
"... This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We e ..."
Abstract - Cited by 91 (5 self) - Add to MetaCart
This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hill-climbing, that is based on a simple principle: "learn quickly while losing, slowly while winning." The algorithm is proven to be rational, and we present empirical results for a number of stochastic games showing that the algorithm converges.
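A rough sketch of the WoLF policy hill-climbing idea, assuming a tabular setting with numpy arrays: a standard Q-learning update, a running average policy, and a hill-climbing step whose size depends on whether the current policy is "winning" (doing at least as well as the average policy). The constants and the clip-and-renormalise projection are illustrative simplifications, not the paper's exact update.

```python
import numpy as np

def wolf_phc_update(Q, pi, pi_avg, counts, s, a, r, s_next,
                    alpha=0.1, gamma=0.9, delta_win=0.01, delta_lose=0.04):
    """One WoLF policy-hill-climbing update for state s (sketch).

    Q, pi, pi_avg : per-state action values, current policy, average policy
    counts        : per-state visit counts
    """
    # Standard Q-learning update on the observed transition.
    Q[s][a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s][a])

    # Update the running average policy for state s.
    counts[s] += 1
    pi_avg[s] += (pi[s] - pi_avg[s]) / counts[s]

    # "Learn quickly while losing, slowly while winning": compare the
    # current policy's expected value with the average policy's value.
    winning = pi[s] @ Q[s] > pi_avg[s] @ Q[s]
    delta = delta_win if winning else delta_lose

    # Hill-climb: move probability mass toward the greedy action.
    greedy = int(np.argmax(Q[s]))
    n_actions = len(Q[s])
    for b in range(n_actions):
        pi[s][b] += delta if b == greedy else -delta / (n_actions - 1)

    # Crude projection back onto the simplex (clip and renormalise).
    pi[s] = np.clip(pi[s], 0.0, None)
    pi[s] /= pi[s].sum()
    return Q, pi, pi_avg, counts
```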

Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

by Xiaofeng Wang, Tuomas Sandholm - In Advances in Neural Information Processing Systems, 2002
"... Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, on ..."
Abstract - Cited by 88 (3 self) - Add to MetaCart
Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, only some of which are optimal. We present optimal adaptive learning (OAL), the first algorithm that converges to an optimal Nash equilibrium for any team Markov game. We provide a convergence proof, and show that the algorithm's parameters are easy to set so that the convergence conditions are met. Our experiments show that existing algorithms do not converge in many of these problems while OAL does. We also demonstrate the importance of the fundamental ideas behind OAL: incomplete history sampling and biased action selection.
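To make the coordination problem concrete, the snippet below uses a hypothetical two-agent common-payoff game: the payoff-maximising joint actions are the optimal Nash equilibria, but when several exist, agents that each independently pick an "optimal" action can still miscoordinate. The payoff matrix is an invented example, not taken from the paper.

```python
import numpy as np

# Hypothetical common payoff matrix for a 2-agent, 3-action team game.
# Rows are agent 1's actions, columns are agent 2's actions.
payoff = np.array([
    [10,  0,  0],
    [ 0, 10,  0],
    [ 0,  0,  5],
])

# In a team game, every joint action that maximises the common payoff
# is an optimal Nash equilibrium.
best = payoff.max()
optimal = [(int(i), int(j)) for i, j in zip(*np.where(payoff == best))]
print("optimal joint actions:", optimal)       # (0, 0) and (1, 1)

# The coordination problem: both (0, 0) and (1, 1) are optimal, so if
# agent 1 plays 0 while agent 2 plays 1, the team earns 0 even though
# each agent chose an action that is part of some optimal equilibrium.
print("miscoordinated payoff:", payoff[0, 1])  # 0
```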

Citation Context

...a in Markov games (a.k.a. stochastic games) [16] when the game structure is unknown. This has been studied under various types of Markov games such as zero-sum Markov games [9], general-sum Markov games [6, 8] and team Markov games [2]. Multiagent RL in Markov games involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team Markov games w...

Convergence and no-regret in multiagent learning

by Michael Bowling - In Advances in Neural Information Processing Systems 17, 2005
"... Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be ..."
Abstract - Cited by 85 (0 self) - Add to MetaCart
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
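A sketch of a GIGA-WoLF-style update for one step of a normal-form game, assuming the payoff each action would have received is observable: the played strategy takes a projected gradient step, a second baseline strategy takes a slower step in the same direction, and the played strategy is then pulled toward the baseline by a capped amount. The specific constants and the capping rule below are written from the abstract's high-level description and should be treated as assumptions, not the paper's exact update.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def giga_wolf_update(x, z, reward_vec, eta=0.01):
    """One GIGA-WoLF-style update (sketch; constants are assumptions).

    x          : strategy actually played (distribution over actions)
    z          : slower "baseline" strategy limiting x's movement
    reward_vec : payoff each action would have received this step
    """
    # Projected gradient step for the played strategy (GIGA step).
    x_hat = project_simplex(x + eta * reward_vec)
    # The baseline strategy moves in the same direction, more slowly.
    z_new = project_simplex(z + eta * reward_vec / 3.0)
    # Pull x toward the baseline, capped so x never moves further
    # toward it than the baseline itself just moved.
    denom = np.linalg.norm(z_new - x_hat)
    pull = 1.0 if denom == 0 else min(1.0, np.linalg.norm(z_new - z) / denom)
    x_new = x_hat + pull * (z_new - x_hat)
    return x_new, z_new
```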

Citation Context

...settings. The desirability of convergence has been recently contested. We offer some brief insight into this debate in the introduction of the extended version of this paper [1]. Equilibrium learners [2, 3, 4] are one method of handling the loss of stationarity. These algorithms learn joint-action values, which are stationary, and in certain circumstances guarantee these values converge to Nash (or correla...

Accelerating Reinforcement Learning through Implicit Imitation

by Bob Price, Craig Boutilier - Journal of Artificial Intelligence Research, 2003
"... Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments ..."
Abstract - Cited by 79 (0 self) - Add to MetaCart
Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments ...

Citation Context

...on offers the potential to accelerate learning. A general solution requires the integration of imitation into more general models for multiagent RL based on stochastic or Markov games (Littman, 1994; Hu & Wellman, 1998; Bowling & Veloso, 2001). This would no doubt be a rather challenging, yet rewarding endeavor. To take a simple example, in simple coordination problems (e.g., two mobile agents trying to avoid each ...

Analyzing complex strategic interactions in multi-agent systems.

by William E. Walsh, Rajarshi Das, Gerald Tesauro, Jeffrey O. Kephart - Proceedings of 2002 Workshop on Game-Theoretic and Decision-Theoretic Agents (GTDT-02), 2002
"... Abstract We develop a model for analyzing complex games with repeated interactions, for which a full game-theoretic analysis is intractable. Our approach treats exogenously specified, heuristic strategies, rather than the atomic actions, as primitive, and computes a heuristic-payoff table specifyin ..."
Abstract - Cited by 76 (3 self) - Add to MetaCart
We develop a model for analyzing complex games with repeated interactions, for which a full game-theoretic analysis is intractable. Our approach treats exogenously specified, heuristic strategies, rather than the atomic actions, as primitive, and computes a heuristic-payoff table specifying the expected payoffs of the joint heuristic strategy space. We analyze two games based on (i) automated dynamic pricing and (ii) continuous double auction. For each game we compute Nash equilibria of previously published heuristic strategies. To determine the most plausible equilibria, we study the replicator dynamics of a large population playing the strategies. In order to account for errors in estimation of payoffs or improvements in strategies, we also analyze the dynamics and equilibria based on perturbed payoffs.
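The replicator-dynamics analysis can be sketched directly once the heuristic-payoff table is reduced to a payoff matrix over heuristic strategies: a strategy's population share grows in proportion to how far its expected payoff exceeds the population average. The matrix, initial mix, and step size below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One Euler step of single-population replicator dynamics.

    x : current population mix over heuristic strategies (sums to 1)
    A : payoff table, A[i, j] = payoff to strategy i against strategy j
    """
    fitness = A @ x          # expected payoff of each strategy
    avg = x @ fitness        # population-average payoff
    x = x + dt * x * (fitness - avg)
    return x / x.sum()       # renormalise against numerical drift

# Illustrative 2-strategy coordination payoffs (not from the paper).
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
x = np.array([0.4, 0.6])     # initial population mix
for _ in range(5_000):
    x = replicator_step(x, A)
# Starting above the mixed rest point (1/3 on strategy 0), the
# population converges to the payoff-dominant strategy: x -> [1, 0].
print(x)
```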

Citation Context

... Equilibrium convergence can be guaranteed for a single Q-Learner, and for two-player games in the zero-sum case (Littman 1994), or in the general-sum case with the use of a Nash equilibrium solver (Hu & Wellman 1998). For this paper, we borrow a well-developed model from evolutionary game theory (Weibull 1995) to analyze strategy choice dynamics. In contrast to the aforementioned approaches, which model repeated...

Multi-agent reinforcement learning: a critical survey

by Yoav Shoham, Rob Powers, Trond Grenager, 2003
"... We survey the recent work in AI on multi-agent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is unclarity about the problem or problems being addressed. After tracing a representative sample of the recent ..."
Abstract - Cited by 68 (1 self) - Add to MetaCart
We survey the recent work in AI on multi-agent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is unclarity about the problem or problems being addressed. After tracing a representative sample of the recent literature, we identify four well-defined problems in multi-agent reinforcement learning, single out the problem that in our view is most suitable for AI, and make some remarks about how we believe progress is to be made on this problem.

Coordination in Multiagent Reinforcement Learning: A Bayesian Approach

by Georgios Chalkiadakis, Craig Boutilier - In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, 2003
"... Much emphasis in multiagent reinforcement learning (MARL) research is placed on ensuring that MARL algorithms (eventually) converge to desirable equilibria. As in standard reinforcement learning, convergence generally requires sufficient exploration of strategy space. However, exploration often com ..."
Abstract - Cited by 66 (6 self) - Add to MetaCart
Much emphasis in multiagent reinforcement learning (MARL) research is placed on ensuring that MARL algorithms (eventually) converge to desirable equilibria. As in standard reinforcement learning, convergence generally requires sufficient exploration of strategy space. However, exploration often comes at a price in the form of penalties or foregone opportunities. In multiagent settings, the problem is exacerbated by the need for agents to "coordinate" their policies on equilibria. We propose a Bayesian model for optimal exploration in MARL problems that allows these exploration costs to be weighed against their expected benefits using the notion of value of information. Unlike standard RL models, this model requires reasoning about how one's actions will influence the behavior of other agents. We develop tractable approximations to optimal Bayesian exploration, and report on experiments illustrating the benefits of this approach in identical interest games.

Citation Context

... The application of reinforcement learning (RL) to multiagent systems has received considerable attention [12, 3, 7, 2]. However, in multiagent settings, the effect (or benefit) of one agent's actions is often directly influenced by those of other agents. This adds complexity to the learning problem, requiring that a...

Correlated-Q learning

by Amy Greenwald, Keith Hall - In ICML ’03: Proceedings of the Twentieth International Conference on Machine Learning, 2003
"... Abstract This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in ..."
Abstract - Cited by 65 (3 self) - Add to MetaCart
This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in constant-sum games, the set of correlated equilibria contains the set of minimax equilibria. This paper describes experiments with four variants of CE-Q, demonstrating empirical convergence to equilibrium policies on a testbed of general-sum Markov games.
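The per-state subproblem behind CE-Q, computing a correlated equilibrium of a stage game, can be posed as a linear program over a joint distribution on action profiles. The sketch below (using scipy) maximises the sum of the players' payoffs subject to the incentive constraints, which corresponds to a utilitarian selection among correlated equilibria; the 2x2 payoff matrices are an invented chicken-style example, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2x2 bimatrix game (chicken-style payoffs, chosen so a
# correlated equilibrium can differ from the pure Nash equilibria).
U1 = np.array([[6.0, 2.0],
               [7.0, 0.0]])
U2 = np.array([[6.0, 7.0],
               [2.0, 0.0]])
n1, n2 = U1.shape
num_vars = n1 * n2   # p[a1 * n2 + a2] = Pr(joint action (a1, a2))

# Incentive constraints: no player gains by deviating from a
# recommended action.  Written as A_ub @ p <= 0.
rows = []
for a in range(n1):                 # player 1 told a, deviates to b
    for b in range(n1):
        if a == b:
            continue
        row = np.zeros(num_vars)
        for a2 in range(n2):
            row[a * n2 + a2] = U1[b, a2] - U1[a, a2]
        rows.append(row)
for a in range(n2):                 # player 2 told a, deviates to b
    for b in range(n2):
        if a == b:
            continue
        row = np.zeros(num_vars)
        for a1 in range(n1):
            row[a1 * n2 + a] = U2[a1, b] - U2[a1, a]
        rows.append(row)
A_ub, b_ub = np.array(rows), np.zeros(len(rows))

# Probabilities sum to one; maximise total payoff (utilitarian variant).
A_eq, b_eq = np.ones((1, num_vars)), np.array([1.0])
c = -(U1 + U2).flatten()            # linprog minimises, so negate

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, None)] * num_vars)
print(res.x.reshape(n1, n2))        # a correlated-equilibrium distribution
```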

Citation Context

... Recently, there have been several attempts to design a multiagent learning algorithm that learns equilibrium policies in general-sum Markov games, just as Q-learning converges to optimal policies in Markov decision processes. Hu and Wellman [8] propose an algorithm called Nash-Q that converges to Nash equilibrium policies under certain (restrictive) conditions. Littman’s [11] friend-or-foe-Q (FF-Q) algorithm always converges, but it only learns equilibrium policies in restricted classes of games: e.g., two-player, constant-sum Markov games, which exhibit minimax equilibria (foe-Q); e.g., coordination games with uniquely-valued equilibria (friend-Q). This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium solution concept [1]. CE-Q generalizes both Nash-Q and FF-Q: in g...

Economics and Electronic Commerce: Survey and Directions for Research

by Robert J. Kauffman, Eric A. Walden - International Journal of Electronic Commerce, 2001
"... This article reviews the growing body of research on electronic commerce from the perspective of economic analysis. It begins by constructing a new framework for understanding electronic commerce research, then identifies the range of applicable theory and current research in the context of the new ..."
Abstract - Cited by 60 (11 self) - Add to MetaCart
This article reviews the growing body of research on electronic commerce from the perspective of economic analysis. It begins by constructing a new framework for understanding electronic commerce research, then identifies the range of applicable theory and current research in the context of the new conceptual model. It goes on to assess the state-of-the-art of knowledge about electronic commerce phenomena in terms of the levels of analysis here proposed. And finally, it charts the directions along which useful work in this area might be developed. This survey and framework are intended to induce researchers in the field of information systems, the authors’ reference discipline, and other areas in schools of business and management to recognize that research on electronic commerce is business-school research, broadly defined. As such, developments in this research area in the next several years will occur across multiple business-school disciplines, and there will be a growing impetus for greater interdisciplinary communication and interaction.