Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games
In Advances in Neural Information Processing Systems, 2002
"... Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, only so ..."
Abstract

Cited by 78 (3 self)
 Add to MetaCart
Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, only some of which are optimal. We present optimal adaptive learning (OAL), the first algorithm that converges to an optimal Nash equilibrium for any team Markov game. We provide a convergence proof, and show that the algorithm's parameters are easy to set so that the convergence conditions are met. Our experiments show that existing algorithms do not converge in many of these problems while OAL does. We also demonstrate the importance of the fundamental ideas behind OAL: incomplete history sampling and biased action selection.
An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems
In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
"... The article focuses on distributed reinforcement learning in cooperative multiagent decisionprocesses, where an ensemble of simultaneously and independently acting agents tries to maximize a discounted sum of rewards. We assume that each agent has no information about its teammates' behaviou ..."
Abstract

Cited by 76 (10 self)
 Add to MetaCart
The article focuses on distributed reinforcement learning in cooperative multi-agent decision processes, where an ensemble of simultaneously and independently acting agents tries to maximize a discounted sum of rewards. We assume that each agent has no information about its teammates' behaviour. Thus, in contrast to single-agent reinforcement learning, each agent has to consider its teammates' behaviour and to find a cooperative policy. We propose a model-free distributed Q-learning algorithm for cooperative multi-agent decision processes. It can be proved to find optimal policies in deterministic environments. No additional expense is needed in comparison to the non-distributed case. Further, there is no need for additional communication between the agents. 1. Introduction Reinforcement learning has originally been discussed for Markov Decision Processes (MDPs): a single agent has to learn a policy that maximizes the discounted sum of rewards in a stochastic environment...
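The heart of the distributed Q-learning rule described above can be sketched as an optimistic max-update: each agent keeps a Q-table over its own actions only and never lets a value decrease, which in deterministic environments tracks the best achievable joint outcome. The toy game and variable names below are illustrative, not from the paper, and the discount is set to zero because the example is a one-shot team game.

```python
import itertools

GAMMA = 0.0  # zero discount: the toy game below is a one-shot team game

def optimistic_update(q, state, action, reward, next_state):
    """Distributed Q-learning step for one agent (optimistic max-update):
    q_i(s, a_i) <- max(q_i(s, a_i), r + gamma * max_a' q_i(s', a'))."""
    target = reward + GAMMA * max(q[next_state].values())
    q[state][action] = max(q[state][action], target)

# Deterministic one-state team game: both agents share the joint reward.
rewards = {("a", "a"): 10, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 5}
q1 = {0: {"a": 0.0, "b": 0.0}}
q2 = {0: {"a": 0.0, "b": 0.0}}

for joint in itertools.product("ab", repeat=2):
    r = rewards[joint]
    optimistic_update(q1, 0, joint[0], r, 0)
    optimistic_update(q2, 0, joint[1], r, 0)

# Each agent independently ends up preferring "a", the optimal joint action.
```

Because the update only ever takes a maximum, each agent's value for an action reflects the best joint reward seen with that action, so no inter-agent communication is needed in the deterministic case.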
Learning About Other Agents in a Dynamic Multiagent System
2001
"... 21 We analyze the problem of learning about other agents in a class of dynamic multiagent systems, where performance of 22 the primary agent depends on behavior of the others. We consider an online version of the problem, where agents must learn 23 models of the others in the course of continual i ..."
Abstract

Cited by 71 (6 self)
 Add to MetaCart
We analyze the problem of learning about other agents in a class of dynamic multiagent systems, where performance of the primary agent depends on behavior of the others. We consider an online version of the problem, where agents must learn models of the others in the course of continual interactions. Various levels of recursive models are implemented in a simulated double auction market. Our experiments show learning agents on average outperform non-learning agents who do not use information about others. Among learning agents, those with the minimum recursion assumption generally perform better than agents with more complicated, though often wrong, assumptions. © 2001 Published by Elsevier Science B.V. Keywords: Multiagent learning; Multiagent systems; Computational market
Decentralized control of cooperative systems: Categorization and complexity analysis
Journal of Artificial Intelligence Research, 2004
"... Decentralized control of cooperative systems captures the operation of a group of decisionmakers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general pr ..."
Abstract

Cited by 66 (8 self)
 Add to MetaCart
Decentralized control of cooperative systems captures the operation of a group of decision makers that share a single global objective. The difficulty in solving such problems optimally arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve useful classes of goal-oriented decentralized processes optimally in polynomial time. This paper also studies information sharing among the decision makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication, and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.
Convergence and no-regret in multiagent learning
In Advances in Neural Information Processing Systems 17, 2005
"... Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be ..."
Abstract

Cited by 65 (0 self)
 Add to MetaCart
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning, then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove that the algorithm guarantees at most zero average regret, while demonstrating that it converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
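The GIGA-WoLF update described in this abstract can be sketched for a single player facing a fixed expected-reward vector. The sketch below is illustrative and simplified: variable names and step sizes are our own, and the paper's decaying step-size schedule is omitted. Two strategies are kept, x (the one played) and a slower baseline z; x is pulled toward z by an adaptive amount, which yields the WoLF-style variable learning rate.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cssv, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        cssv += ui
        t = (cssv - 1.0) / (i + 1)
        if ui > t:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def giga_wolf_step(x, z, r, eta=0.1):
    """One GIGA-WoLF-style update given reward vector r (a sketch)."""
    x_hat = project_simplex([xi + eta * ri for xi, ri in zip(x, r)])
    z_new = project_simplex([zi + eta * ri / 3.0 for zi, ri in zip(z, r)])
    num = sum((a - b) ** 2 for a, b in zip(z_new, z)) ** 0.5
    den = sum((a - b) ** 2 for a, b in zip(z_new, x_hat)) ** 0.5
    delta = min(1.0, num / den) if den > 0 else 0.0
    # Interpolate the played strategy toward the slower baseline.
    x_new = [xh + delta * (zn - xh) for xh, zn in zip(x_hat, z_new)]
    return x_new, z_new

# Matching pennies against an opponent who plays Heads 80% of the time:
# expected payoffs are +0.6 for Heads and -0.6 for Tails, so the learner
# should shift nearly all probability mass onto Heads.
r = [0.6, -0.6]
x = [0.5, 0.5]
z = [0.5, 0.5]
for _ in range(200):
    x, z = giga_wolf_step(x, z, r)
```

Because x is always a convex combination of two points on the simplex, it remains a valid mixed strategy at every step.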
Correlated Q-learning
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
"... There have been several attempts to design multiagent Qlearning algorithms capable of learning equilibrium policies in generalsum Markov games, just as Qlearning learns optimal policies in Markov decision processes. We introduce correlated Qlearning, one such algorithm based on the correlated eq ..."
Abstract

Cited by 56 (2 self)
 Add to MetaCart
There have been several attempts to design multiagent Q-learning algorithms capable of learning equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. We introduce correlated Q-learning, one such algorithm based on the correlated equilibrium solution concept. Motivated by a fixed point proof of the existence of stationary correlated equilibrium policies in Markov games, we present a generic multiagent Q-learning algorithm of which many popular algorithms are immediate special cases. We also prove that certain variants of correlated (and Nash) Q-learning are guaranteed to converge to stationary correlated (and Nash) equilibrium policies in two special classes of Markov games, namely zero-sum and common-interest. Finally, we show empirically that correlated Q-learning outperforms Nash Q-learning, further justifying the former beyond noting that it is less computationally expensive than the latter.
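The generic multiagent Q-learning template mentioned in the abstract can be sketched as follows: agents keep Q-values over joint actions, and the backup bootstraps on the value of an equilibrium selected in the next state via a pluggable rule. The sketch below is ours, not the authors' code; for the common-interest special case we plug in the utilitarian rule (maximize the sum of Q-values), which there coincides with a correlated equilibrium. The toy game and constants are illustrative.

```python
import itertools

ALPHA, GAMMA = 0.5, 0.9

def utilitarian_value(q, state, joint_actions):
    """Per-agent value of `state` under the joint action maximizing the sum
    of all agents' Q-values (a correlated equilibrium when interests are
    common)."""
    best = max(joint_actions, key=lambda ja: sum(q[i][state][ja] for i in q))
    return {i: q[i][state][best] for i in q}

def backup(q, state, joint, rewards, next_state, joint_actions):
    """Generic multiagent Q-backup: bootstrap on the selected equilibrium
    value of the next state."""
    v = utilitarian_value(q, next_state, joint_actions)
    for i in q:
        q[i][state][joint] = ((1 - ALPHA) * q[i][state][joint]
                              + ALPHA * (rewards[i] + GAMMA * v[i]))

# Common-interest one-state game: both agents receive the same reward.
team_reward = {("a", "a"): 10, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 5}
joint_actions = list(itertools.product("ab", repeat=2))
q = {i: {0: {ja: 0.0 for ja in joint_actions}} for i in range(2)}

for _ in range(300):
    for ja in joint_actions:
        backup(q, 0, ja, {i: team_reward[ja] for i in range(2)}, 0, joint_actions)

# Q(("a","a")) approaches the fixed point 10 / (1 - 0.9) = 100.
```

Swapping `utilitarian_value` for a Nash or correlated-equilibrium solver over the joint-action Q-tables recovers the other special cases of the template.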
Reinforcement Learning of Coordination in Cooperative Multi-Agent Systems
2002
"... We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multiagent systems. Specifically, we focus on a novel action selection strategy for Qlearning (Watkins 1989). The new technique is applicable to scenarios where mutual observation ..."
Abstract

Cited by 53 (4 self)
 Add to MetaCart
We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible.
Multiagent reinforcement learning: a critical survey
2003
"... We survey the recent work in AI on multiagent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is unclarity about the problem or problems being addressed. After tracing a representative sample of the recent ..."
Abstract

Cited by 52 (0 self)
 Add to MetaCart
We survey the recent work in AI on multiagent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is unclarity about the problem or problems being addressed. After tracing a representative sample of the recent literature, we identify four well-defined problems in multiagent reinforcement learning, single out the problem that in our view is most suitable for AI, and make some remarks about how we believe progress is to be made on this problem.