Results 1-10 of 39
Cooperative Multi-Agent Learning: The State of the Art
 Autonomous Agents and Multi-Agent Systems
, 2005
Cited by 113 (6 self)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources.
Nash Q-Learning for General-Sum Stochastic Games
 Journal of Machine Learning Research
, 2003
Cited by 108 (0 self)
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
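The update this abstract describes (Q-functions over joint actions, backed up with the equilibrium value of the next stage game) can be sketched in Python as follows. This is an illustrative sketch under strong assumptions, not the authors' implementation: it covers two agents only, it updates both tables in one place, and it searches for a pure-strategy equilibrium by enumeration, whereas general-sum stage games may require mixed-strategy equilibria. The names pure_nash and nash_q_update, the table layout Q[s][a1][a2], and the alpha and gamma defaults are hypothetical.

```python
def pure_nash(q1, q2):
    """Return the first pure-strategy Nash equilibrium (a1, a2) of the
    bimatrix stage game (q1, q2), or None if no pure equilibrium exists."""
    n1, n2 = len(q1), len(q1[0])
    for a1 in range(n1):
        for a2 in range(n2):
            best1 = max(q1[b][a2] for b in range(n1))  # best deviation of agent 1
            best2 = max(q2[a1][b] for b in range(n2))  # best deviation of agent 2
            if q1[a1][a2] >= best1 and q2[a1][a2] >= best2:
                return a1, a2
    return None


def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """One Nash-Q style update of both joint-action tables, in place.

    Q1 and Q2 are nested lists indexed as Q[state][a1][a2].
    Assumption of this sketch: the stage game at s_next has a pure equilibrium.
    """
    eq = pure_nash(Q1[s_next], Q2[s_next])
    if eq is None:
        raise ValueError("no pure-strategy equilibrium; a mixed solver is needed")
    e1, e2 = eq
    # Equilibrium payoffs of the next stage game replace the usual max over actions.
    v1, v2 = Q1[s_next][e1][e2], Q2[s_next][e1][e2]
    Q1[s][a1][a2] = (1 - alpha) * Q1[s][a1][a2] + alpha * (r1 + gamma * v1)
    Q2[s][a1][a2] = (1 - alpha) * Q2[s][a1][a2] + alpha * (r2 + gamma * v2)
```

The only difference from single-agent Q-learning in this sketch is the backup target: the maximum over the agent's own actions is replaced by the agent's payoff at an equilibrium of the stage game defined by both tables.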
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
Cited by 62 (6 self)
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
 Journal of Machine Learning Research
, 2006
Cited by 32 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
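The single-state case in this abstract (choosing a joint action that maximizes a sum of edge-local payoffs by passing messages along a coordination graph) can be sketched as a max-plus style routine. This is a sketch under assumptions, not the authors' code: every agent has the same number of actions, payoffs live only on edges, messages are updated synchronously for a fixed number of iterations, and each message is shifted by its mean to keep values bounded. The name max_plus and its parameters are hypothetical.

```python
def max_plus(n_agents, n_actions, edge_payoffs, n_iters=10):
    """Approximate the joint action maximizing the sum of edge payoffs.

    edge_payoffs maps an edge (i, j) to a table f[a_i][a_j].
    Returns a list holding one chosen action per agent.
    """
    neighbors = {i: set() for i in range(n_agents)}
    for i, j in edge_payoffs:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def f(i, j, ai, aj):
        # Look up the payoff of edge {i, j} regardless of orientation.
        if (i, j) in edge_payoffs:
            return edge_payoffs[(i, j)][ai][aj]
        return edge_payoffs[(j, i)][aj][ai]

    # msgs[(i, j)][a_j] is the message agent i sends to neighbor j.
    msgs = {(i, j): [0.0] * n_actions for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = [
                max(
                    f(i, j, ai, aj)
                    + sum(msgs[(k, i)][ai] for k in neighbors[i] if k != j)
                    for ai in range(n_actions)
                )
                for aj in range(n_actions)
            ]
            mean = sum(out) / n_actions
            new_msgs[(i, j)] = [v - mean for v in out]  # normalization step
        msgs = new_msgs

    actions = []
    for i in range(n_agents):
        # Each agent maximizes the sum of its incoming messages.
        scores = [sum(msgs[(k, i)][ai] for k in neighbors[i])
                  for ai in range(n_actions)]
        actions.append(scores.index(max(scores)))
    return actions
```

On a tree-structured graph such as the chain 0-1-2, message passing of this kind recovers the exact maximizer; on graphs with cycles it is only an approximation, which matches the "approximately maximize" wording of the abstract.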
Bayesian Reinforcement Learning for Coalition Formation under Uncertainty
 In Proc. of AAMAS’04
, 2004
Cited by 29 (10 self)
Research on coalition formation usually assumes the values of potential coalitions to be known with certainty. Furthermore, settings in which agents lack sufficient knowledge of the capabilities of potential partners are rarely, if ever, touched upon. We remove these often unrealistic assumptions and propose a model that utilizes Bayesian (multiagent) reinforcement learning in a way that enables coalition participants to reduce their uncertainty regarding coalitional values and the capabilities of others. In addition, we introduce the Bayesian Core, a new stability concept for coalition formation under uncertainty. Preliminary experimental evidence demonstrates the effectiveness of our approach.
Utile coordination: Learning interdependencies among cooperative agents
 In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG’05)
, 2005
Cited by 19 (0 self)
... a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordination is modeled. We apply our method within the framework of coordination graphs in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied on a small illustrative problem, and next on a large predator-prey problem in which two predators have to capture a single prey.
Learning against multiple opponents
 in Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent
, 2006
Cited by 13 (2 self)
We address the problem of learning in repeated n-player (as opposed to 2-player) general-sum games, paying particular attention to the rarely addressed situation in which there is a mixture of agents of different types. We propose new criteria requiring that the agents employing a particular learning algorithm work together to achieve a joint best-response against a target class of opponents, while guaranteeing they each achieve at least their individual security-level payoff against any possible set of opponents. We then provide algorithms that provably meet these criteria for two target classes: stationary strategies and adaptive strategies with a bounded memory. We also demonstrate that the algorithm for stationary strategies outperforms existing algorithms in tests spanning a wide variety of repeated games with more than two players.
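The security-level guarantee mentioned in this abstract can be illustrated with a small maximin computation. The sketch below makes a loud simplifying assumption: it considers pure strategies only, so the value it returns is a lower bound on the security level an agent could obtain with mixed strategies. The name pure_security_level and the payoff callback signature are hypothetical and not taken from the paper.

```python
from itertools import product


def pure_security_level(payoff, my_actions, opp_action_sets):
    """Return (action, value) maximizing the agent's worst-case payoff.

    payoff(a, opp) gives the agent's payoff when it plays a and the
    opponents play the tuple opp (one action per opponent).
    Pure strategies only; this is an assumption of the sketch.
    """
    best = None
    for a in my_actions:
        # Worst case over every joint action of the opponents.
        worst = min(payoff(a, opp) for opp in product(*opp_action_sets))
        if best is None or worst > best[1]:
            best = (a, worst)
    return best
```

With two opponents, opp_action_sets would be a list such as [[0, 1], [0, 1]], and the inner minimum ranges over all four joint opponent actions. This is why the n-player case is more demanding than the 2-player case: the set of opponent behaviors to guard against grows with every added player.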
Analyzing and Visualizing Multiagent Rewards in Dynamic and Stochastic Environments
 Journal of Autonomous Agents and Multiagent Systems
, 2008
Cited by 11 (7 self)
The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains ill-suited to simple table backup schemes commonly used in TD(λ)/Q-learning where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-order-of-magnitude speedup in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.
Learning to coordinate using commitment sequences in cooperative multiagent systems
 in Proceedings of the Third Symposium on Adaptive Agents and Multiagent Systems (AAMAS03)
, 2003
Cited by 10 (2 self)
We report on an investigation of the learning of coordination in cooperative multiagent systems. Specifically, we study solutions that are applicable to independent agents, i.e., agents that do not observe one another’s actions and do not explicitly communicate with each other. In previously published work (Kapetanakis and Kudenko, 2002) we have presented a reinforcement learning approach that converges to the optimal joint action even in scenarios with high miscoordination costs. However, this approach failed in fully stochastic environments. In this paper, we present a novel approach based on reward estimation with a shared action-selection protocol. The new technique is applicable in fully stochastic environments where mutual observation of actions is not possible. We demonstrate empirically that our approach causes the agents to converge almost always to the optimal joint action even in difficult stochastic scenarios with high miscoordination penalties.