Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
"... Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Abstract

Cited by 182 (8 self)
 Add to MetaCart
(Show Context)
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.
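The team-learning versus concurrent-learning division is easiest to see side by side. Below is a minimal sketch on a hypothetical two-agent cooperative matrix game, contrasting a single learner over the joint action space with one independent learner per agent; the game, payoffs, and hyperparameters are all illustrative, not from the survey.

import random

# Hypothetical cooperative matrix game: both agents receive the same payoff.
ACTIONS = ["a", "b"]
PAYOFF = {("a", "a"): 5.0, ("a", "b"): 0.0, ("b", "a"): 0.0, ("b", "b"): 1.0}

def team_learning(episodes=5000, eps=0.1, alpha=0.1):
    # One learner searches the joint action space directly.
    q = {ja: 0.0 for ja in PAYOFF}
    for _ in range(episodes):
        ja = random.choice(list(q)) if random.random() < eps else max(q, key=q.get)
        q[ja] += alpha * (PAYOFF[ja] - q[ja])
    return max(q, key=q.get)

def concurrent_learning(episodes=5000, eps=0.1, alpha=0.1):
    # One learner per agent; each sees only its own action and the shared reward.
    qs = [dict.fromkeys(ACTIONS, 0.0) for _ in range(2)]
    for _ in range(episodes):
        ja = tuple(random.choice(ACTIONS) if random.random() < eps
                   else max(q, key=q.get) for q in qs)
        for q, a in zip(qs, ja):
            q[a] += alpha * (PAYOFF[ja] - q[a])
    return tuple(max(q, key=q.get) for q in qs)

print(team_learning(), concurrent_learning())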
Dynamic Programming for Partially Observable Stochastic Games
In Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004
"... We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. ..."
Abstract

Cited by 159 (25 self)
 Add to MetaCart
(Show Context)
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
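One of the two ingredients, iterated elimination of dominated strategies, is compact enough to sketch. Below is a minimal illustration for pure strategies in a two-player normal-form game with row-player payoffs A and column-player payoffs B; the matrices are hypothetical, and the paper's algorithm applies elimination to policy-tree representations rather than raw matrices.

import numpy as np

def iterated_elimination(A, B):
    """Iteratively remove strictly dominated pure strategies.
    A[i, j]: row player's payoff; B[i, j]: column player's payoff."""
    rows, cols = list(range(A.shape[0])), list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:   # row r is dominated if some r2 beats it everywhere
            if any(all(A[r2, c] > A[r, c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:   # column c is dominated if some c2 beats it everywhere
            if any(all(B[r, c2] > B[r, c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols       # indices of the surviving strategies

# Prisoner's dilemma: defection (index 1) strictly dominates cooperation.
A = np.array([[3, 0], [5, 1]])
print(iterated_elimination(A, A.T))   # -> ([1], [1])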
Correlated Q-learning
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
"... There have been several attempts to design multiagent Qlearning algorithms capable of learning equilibrium policies in generalsum Markov games, just as Qlearning learns optimal policies in Markov decision processes. We introduce correlated Qlearning, one such algorithm based on the correlated eq ..."
Abstract

Cited by 58 (2 self)
 Add to MetaCart
There have been several attempts to design multi-agent Q-learning algorithms capable of learning equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. We introduce correlated Q-learning, one such algorithm based on the correlated equilibrium solution concept. Motivated by a fixed-point proof of the existence of stationary correlated equilibrium policies in Markov games, we present a generic multi-agent Q-learning algorithm of which many popular algorithms are immediate special cases. We also prove that certain variants of correlated (and Nash) Q-learning are guaranteed to converge to stationary correlated (and Nash) equilibrium policies in two special classes of Markov games, namely zero-sum and common-interest. Finally, we show empirically that correlated Q-learning outperforms Nash Q-learning, further justifying the former beyond noting that it is less computationally expensive than the latter.
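The generic scheme is a standard Q-learning update in which the value of the next state is supplied by an equilibrium-selection routine. Here is a minimal sketch; the state and action sets are hypothetical, and equilibrium_values is a placeholder that is exact only for common-interest games, whereas full correlated Q-learning instead solves a linear program over joint-action distributions subject to the correlated-equilibrium constraints.

import itertools

N_AGENTS, ACTIONS, STATES = 2, ["a", "b"], ["s0", "s1"]   # hypothetical game
JOINT = list(itertools.product(ACTIONS, repeat=N_AGENTS))
# One Q-table per agent, indexed by (state, joint action).
Q = [{(s, ja): 0.0 for s in STATES for ja in JOINT} for _ in range(N_AGENTS)]

def equilibrium_values(state):
    # Placeholder equilibrium selection: the joint action maximizing agent 0's
    # Q-value, which coincides with the correlated equilibrium only when all
    # agents share one payoff (common-interest). Swap in a CE/Nash/minimax
    # solver here to recover the other special cases of the generic algorithm.
    ja = max(JOINT, key=lambda a: Q[0][(state, a)])
    return [Q[i][(state, ja)] for i in range(N_AGENTS)]

def ceq_update(s, joint_a, rewards, s_next, alpha=0.1, gamma=0.9):
    # Standard temporal-difference update, with the next-state value taken
    # from the equilibrium rather than a per-agent max.
    v = equilibrium_values(s_next)
    for i in range(N_AGENTS):
        td = rewards[i] + gamma * v[i] - Q[i][(s, joint_a)]
        Q[i][(s, joint_a)] += alpha * td

ceq_update("s0", ("a", "b"), [1.0, 1.0], "s1")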
Multi-Agent Reinforcement Learning for Multi-Robot Systems: A Survey
2004
"... Multiagent reinforcement learning for multirobot systems is a challenging issue in both robotics and artificial intelligence. With the ever increasing interests in theoretical researches and practical applications, currently there have been a lot of efforts towards providing some solutions to this c ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
Multi-agent reinforcement learning for multi-robot systems is a challenging problem in both robotics and artificial intelligence. With ever-increasing interest in its theory and practical applications, many efforts have been made toward meeting this challenge. However, many difficulties remain in scaling multi-agent reinforcement learning up to multi-robot systems. The main objective of this paper is to provide a survey, though not an exhaustive one, of multi-agent reinforcement learning in multi-robot systems. After reviewing important advances in this field, we analyze some challenging problems and promising research directions, and close with concluding remarks from the authors' perspective.
An intrusion detection game with limited observations
In Proceedings of the 12th International Symposium on Dynamic Games and Applications, Sophia Antipolis, 2006
"... We present a 2player zerosum stochastic (Markov) security game which models the interaction between malicious attackers to a system and the IDS who allocates system resources for detection and response. We capture the operation of a sensor network observing and reporting the attack information to ..."
Abstract

Cited by 27 (8 self)
 Add to MetaCart
(Show Context)
We present a 2-player zero-sum stochastic (Markov) security game which models the interaction between malicious attackers on a system and the intrusion detection system (IDS), which allocates system resources for detection and response. We capture the operation of a sensor network observing and reporting attack information to the IDS as a finite Markov chain, thus extending the game-theoretic framework in [1] to a stochastic and dynamic one. We analyze the outcomes and evolution of an example game numerically for various game parameters. Furthermore, we study limited-information cases in which the players optimize their strategies offline or online, depending on the type of information available, using methods based on Markov decision processes and Q-learning.
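The stage-game computation inside such a zero-sum Markov game reduces, state by state, to a minimax linear program. Below is a minimal sketch for a single stage with a hypothetical payoff matrix; a full solver (value iteration or minimax-Q) would apply this at every state, with payoffs augmented by discounted continuation values.

import numpy as np
from scipy.optimize import linprog

def minimax_value(A):
    """Game value and maximin mixed strategy for the row player of a
    zero-sum matrix game with row-player payoffs A."""
    m, n = A.shape
    # Variables [x_1..x_m, v]: maximize v subject to (A^T x)_j >= v for
    # every column j, sum(x) = 1, x >= 0. linprog minimizes, so c = -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

# Hypothetical detection stage game: rows = defender sensor allocations,
# columns = attacker targets, entries = defender payoff.
value, strategy = minimax_value(np.array([[1.0, -1.0], [-2.0, 3.0]]))
print(value, strategy)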
Theoretical considerations of potential-based reward shaping for multi-agent systems
In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2011
"... Potentialbased reward shaping has previously been proven to both be equivalent to Qtable initialisation and guarantee policy invariance in singleagent reinforcement learning. The method has since been used in multiagent reinforcement learning without consideration of whether the theoretical equi ..."
Abstract

Cited by 21 (12 self)
 Add to MetaCart
(Show Context)
Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
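The shaping construction itself is a one-line modification of the reward: the agent learns from r + F(s, s'), where F(s, s') = gamma * Phi(s') - Phi(s) for some potential function Phi over states. A minimal tabular sketch, with a hypothetical placeholder potential:

from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))    # Q[state][action]

def phi(state):
    # Hypothetical potential: domain knowledge scoring how promising a state is.
    return 0.0

def shaped_q_update(s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    F = gamma * phi(s_next) - phi(s)           # potential-based shaping term
    target = r + F + gamma * max(Q[s_next][a2] for a2 in next_actions)
    Q[s][a] += alpha * (target - Q[s][a])

shaped_q_update("s0", "left", 1.0, "s1", ["left", "right"])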
Best-response multi-agent learning in non-stationary environments
In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004
"... This paper investigates a relatively new direction in Multiagent Reinforcement Learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in th ..."
Abstract

Cited by 21 (1 self)
 Add to MetaCart
(Show Context)
This paper investigates a relatively new direction in multi-agent reinforcement learning. Most multi-agent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multi-agent learning algorithm that is optimal in the sense of finding a best-response policy, rather than in reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent's non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of n-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.
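The model-then-respond loop can be sketched in a few lines. Below, the opponent's drifting mixed strategy is estimated from a sliding window of observed actions and the learner best-responds in a known payoff matrix; the window length and matrix are hypothetical, and the paper's algorithm additionally learns the game structure and covers full stochastic games.

from collections import Counter, deque
import numpy as np

PAYOFF = np.array([[3.0, 0.0], [5.0, 1.0]])   # my payoff; columns = opponent action
history = deque(maxlen=50)                     # sliding window tracks strategy drift

def observe(opponent_action):
    history.append(opponent_action)

def best_response():
    # Estimate the opponent's current mixed strategy from recent observations,
    # then pick the action maximizing expected payoff against that estimate.
    counts = Counter(history)
    n_opp = PAYOFF.shape[1]
    p = np.array([counts.get(j, 0) for j in range(n_opp)], dtype=float)
    p = p / p.sum() if p.sum() > 0 else np.full(n_opp, 1.0 / n_opp)
    return int(np.argmax(PAYOFF @ p))

observe(0); observe(1); observe(1)
print(best_response())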
An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems
Advances in Complex Systems, 2011
"... This paper investigates the impact of reward shaping in multiagent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potentialbased reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We d ..."
Abstract

Cited by 18 (9 self)
 Add to MetaCart
This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup Keepaway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
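The minimum-distance scheme is a good illustration of how such domain knowledge becomes a potential function. Here is a minimal sketch of a hypothetical spacing potential for a keeper, plugged into the usual shaping term F(s, s') = gamma * Phi(s') - Phi(s); the threshold, state encoding, and function names are illustrative, not the paper's actual implementation.

import math

MIN_DIST = 5.0   # hypothetical spacing threshold, in field units

def spacing_potential(my_pos, teammate_positions):
    # Higher when the keeper maintains good spacing from every teammate;
    # saturates once the nearest teammate is at least MIN_DIST away.
    nearest = min(math.dist(my_pos, t) for t in teammate_positions)
    return min(nearest, MIN_DIST) / MIN_DIST

def shaping_reward(state, next_state, gamma=0.9):
    # F(s, s') = gamma * Phi(s') - Phi(s); added to the environment reward.
    return gamma * spacing_potential(*next_state) - spacing_potential(*state)

s = ((0.0, 0.0), [(1.0, 0.0), (8.0, 2.0)])
s_next = ((0.0, 0.0), [(6.0, 0.0), (8.0, 2.0)])
print(shaping_reward(s, s_next))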