Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
"... Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Abstract
-
Cited by 182 (8 self)
- Add to MetaCart
(Show Context)
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.
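The team/concurrent distinction the survey draws is structural and easy to see in code. A minimal sketch, assuming hypothetical learner and environment interfaces (`act`, `update`, `env.step`) that are not from the paper:

```python
def team_learning_step(learner, env, state):
    # Team learning: a single learner searches the joint action space.
    joint_action = learner.act(state)
    next_state, reward = env.step(joint_action)
    learner.update(state, joint_action, reward, next_state)
    return next_state

def concurrent_learning_step(learners, env, state):
    # Concurrent learning: one learner per agent, each adapting while
    # the others do too (the source of the co-adaptation issues the
    # survey discusses).
    joint_action = [lr.act(state) for lr in learners]
    next_state, rewards = env.step(joint_action)
    for lr, action, reward in zip(learners, joint_action, rewards):
        lr.update(state, action, reward, next_state)
    return next_state
```

The trade-off is visible in the signatures: the team learner faces a joint action space that grows exponentially with the number of agents, while each concurrent learner faces a small action space but a non-stationary environment.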
Dynamic Programming for Partially Observable Stochastic Games
In Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004
"... We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. ..."
Abstract
-
Cited by 159 (25 self)
- Add to MetaCart
(Show Context)
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
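The game-theoretic half of that synthesis, iterated elimination of dominated strategies, is compact enough to sketch. A minimal version for a bimatrix game, considering pure-strategy domination only (the POMDP dynamic-programming half, and domination by mixed strategies, are omitted here):

```python
import numpy as np

def iterated_elimination(A, B):
    """Iteratively remove strictly dominated pure strategies from a
    two-player normal-form game, where A[i, j] and B[i, j] are
    player 1's and player 2's payoffs for the strategy pair (i, j)."""
    rows = list(range(A.shape[0]))   # player 1's surviving strategies
    cols = list(range(A.shape[1]))   # player 2's surviving strategies
    changed = True
    while changed:
        changed = False
        for i in list(rows):
            # Row i goes if some other surviving row beats it on every column.
            if any(all(A[k, j] > A[i, j] for j in cols) for k in rows if k != i):
                rows.remove(i)
                changed = True
        for j in list(cols):
            # Column j goes if some other surviving column beats it on every row.
            if any(all(B[i, k] > B[i, j] for i in rows) for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols
```

For the prisoner's dilemma, `iterated_elimination(np.array([[-1., -3.], [0., -2.]]), np.array([[-1., 0.], [-3., -2.]]))` leaves only mutual defection, `([1], [1])`.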
Correlated Q-learning
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
"... There have been several attempts to design multiagent Q-learning algorithms capable of learning equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. We introduce correlated Q-learning, one such algorithm based on the correlated eq ..."
Abstract
-
Cited by 58 (2 self)
- Add to MetaCart
There have been several attempts to design multiagent Q-learning algorithms capable of learning equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. We introduce correlated Q-learning, one such algorithm based on the correlated equilibrium solution concept. Motivated by a fixed point proof of the existence of stationary correlated equilibrium policies in Markov games, we present a generic multiagent Q-learning algorithm of which many popular algorithms are immediate special cases. We also prove that certain variants of correlated (and Nash) Q-learning are guaranteed to converge to stationary correlated (and Nash) equilibrium policies in two special classes of Markov games, namely zero-sum and common-interest. Finally, we show empirically that correlated Q-learning outperforms Nash Q-learning, further justifying the former beyond noting that it is less computationally expensive than the latter.
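The generic algorithm the abstract mentions differs from ordinary Q-learning only in how the next state is valued. A minimal sketch of one agent's update, where `value_of` is a hypothetical callable standing in for the chosen equilibrium concept:

```python
def multiagent_q_update(Q, s, joint_a, r, s_next, value_of,
                        alpha=0.1, gamma=0.95):
    """One backup of the generic scheme: Q is a table indexed by state
    and joint action, and value_of(Q, state) scores a state under some
    solution concept (a correlated-equilibrium value for correlated-Q,
    a Nash value for Nash-Q, a plain max for ordinary Q-learning)."""
    target = r + gamma * value_of(Q, s_next)
    Q[s][joint_a] = (1 - alpha) * Q[s][joint_a] + alpha * target
    return Q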
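```

The computational-cost comparison in the abstract lives entirely inside `value_of`: a correlated equilibrium is a linear program over joint-action distributions, whereas a Nash equilibrium generally requires a harder fixed-point computation.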
Multiagent Reinforcement Learning for Multi-Robot Systems: A Survey, 2004
"... Multiagent reinforcement learning for multirobot systems is a challenging issue in both robotics and artificial intelligence. With the ever increasing interests in theoretical researches and practical applications, currently there have been a lot of efforts towards providing some solutions to this c ..."
Abstract
-
Cited by 28 (0 self)
- Add to MetaCart
Multiagent reinforcement learning for multi-robot systems is a challenging issue in both robotics and artificial intelligence. With ever-increasing interest in both theoretical research and practical applications, there have been many efforts towards addressing this challenge. However, there are still many difficulties in scaling up multiagent reinforcement learning to multi-robot systems. The main objective of this paper is to provide a survey, though not an exhaustive one, of multiagent reinforcement learning in multi-robot systems. After reviewing important advances in this field, we analyze some challenging problems and promising research directions, and close with concluding remarks from the authors' perspective.
An Intrusion Detection Game with Limited Observations
Proceedings of the 12th Int. Symp. on Dynamic Games and Applications, Sophia Antipolis, 2006
"... We present a 2-player zero-sum stochastic (Markov) security game which models the interaction between malicious attackers to a system and the IDS who allocates system resources for detection and response. We capture the operation of a sensor network observing and reporting the attack information to ..."
Abstract
-
Cited by 27 (8 self)
- Add to MetaCart
(Show Context)
We present a two-player zero-sum stochastic (Markov) security game that models the interaction between malicious attackers on a system and an intrusion detection system (IDS) that allocates system resources for detection and response. We capture the operation of a sensor network observing and reporting attack information to the IDS as a finite Markov chain, thus extending the game-theoretic framework in [1] to a stochastic and dynamic one. We analyze the outcomes and evolution of an example game numerically for various game parameters. Furthermore, we study limited-information cases where players optimize their strategies offline or online, depending on the type of information available, using methods based on Markov decision processes and Q-learning.
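Solving a zero-sum stochastic game reduces, state by state, to computing the value of a zero-sum matrix game, which is a linear program. A minimal sketch of that generic subroutine (not the paper's specific game), assuming NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Value and maximin mixed strategy for the row player of payoff
    matrix A (rows: our actions, columns: opponent actions)."""
    m, n = A.shape
    # Variables: x_1..x_m (row strategy) and v (game value); maximize v
    # by minimizing -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every opponent column j: v - sum_i x_i * A[i, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one; v is unconstrained in the equality.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]
```

For matching pennies, `zero_sum_value(np.array([[1., -1.], [-1., 1.]]))` returns value 0 and the uniform strategy, as expected.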
Theoretical Considerations of Potential-Based Reward Shaping for Multi-Agent Systems
In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2011
"... Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equi ..."
Abstract
-
Cited by 21 (12 self)
- Add to MetaCart
(Show Context)
Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
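The shaping function at issue has a standard closed form, due to Ng, Harada and Russell (1999). A minimal sketch, where `potential` is a user-supplied (hypothetical) function over states:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    # Potential-based shaping: F(s, s') = gamma * Phi(s') - Phi(s).
    # Per this paper, adding F leaves the Nash equilibria of the
    # underlying stochastic game unchanged, though exploration (and so
    # the joint policy actually converged upon) may differ.
    return r + gamma * potential(s_next) - potential(s)
```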
Best-Response Multiagent Learning in Non-Stationary Environments
In Proc. 3rd AAMAS, 2004
"... This paper investigates a relatively new direction in Mul-tiagent Reinforcement Learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in th ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
(Show Context)
This paper investigates a relatively new direction in multiagent reinforcement learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy, rather than in reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent's non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of n-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.
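The model-then-respond idea is easy to illustrate in a repeated matrix game. A toy sketch, not the paper's algorithm: a sliding window tracks a drifting opponent, and we best-respond to the windowed estimate (`payoff`, `window`, and the stream interface are illustrative assumptions):

```python
import numpy as np

def best_response_learner(payoff, opponent_action_stream, window=50):
    """payoff[i, j] is our reward for playing i against opponent action j.
    Yields a best response to a windowed empirical model of the opponent,
    so the model can follow a non-stationary strategy."""
    recent = []
    for j in opponent_action_stream:
        recent.append(j)
        recent = recent[-window:]                 # forget stale observations
        counts = np.bincount(recent, minlength=payoff.shape[1])
        model = counts / counts.sum()             # estimated opponent strategy
        yield int(np.argmax(payoff @ model))      # best response to the estimate
```

The window length trades off estimation accuracy against responsiveness to opponent drift; the paper's contribution is a provably optimal treatment of this for restricted opponent classes, which this sketch does not attempt.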
An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems
Advances in Complex Systems, 2011
"... This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We d ..."
Abstract
-
Cited by 18 (9 self)
- Add to MetaCart
This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate the performance of reward shaping empirically in two problem domains within RoboCup KeepAway by designing three reward-shaping schemes that encourage specific behaviour, such as keeping a minimum distance from other players on the same team or taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
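One of the schemes described rewards keeping a minimum distance from teammates. A minimal sketch of what such a potential might look like, with hypothetical names and thresholds (the paper's exact potentials are not reproduced here); it plugs into a shaping rule like the `shaped_reward` sketch above:

```python
import numpy as np

def min_distance_potential(agent_pos, teammate_positions, d_min=5.0, scale=1.0):
    """Zero when the agent is at least d_min from every teammate;
    increasingly negative as it crowds its nearest teammate."""
    nearest = min(np.linalg.norm(np.asarray(agent_pos) - np.asarray(p))
                  for p in teammate_positions)
    return -scale * max(0.0, d_min - nearest)
```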