Results 1-10 of 113
Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
Abstract
Cited by 113 (6 self)
Cooperative multi-agent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multi-agent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.
Nash Q-Learning for General-Sum Stochastic Games
Journal of Machine Learning Research, 2003
Abstract
Cited by 108 (0 self)
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
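The update rule described in the abstract can be sketched in a few lines: each agent keeps a Q-table over joint actions and backs up the equilibrium value of the next state's stage game. The sketch below is a simplification under stated assumptions: it searches only for a pure-strategy equilibrium (the paper solves for mixed-strategy equilibria of the stage game), and the function names and dictionary-of-arrays layout are illustrative, not from the paper.

```python
import numpy as np

def pure_nash(QA, QB):
    """Return a pure-strategy Nash equilibrium (i, j) of the stage game
    defined by payoff matrices QA, QB, or None if none exists."""
    n, m = QA.shape
    for i in range(n):
        for j in range(m):
            # (i, j) is an equilibrium if neither player can gain by deviating
            if QA[i, j] >= QA[:, j].max() and QB[i, j] >= QB[i, :].max():
                return i, j
    return None

def nash_q_update(QA, QB, s, a, b, rA, rB, s_next, alpha=0.1, gamma=0.9):
    """One Nash-Q update of both agents' joint-action Q-tables.
    QA, QB: dicts mapping state -> (nA x nB) array of joint-action values."""
    eq = pure_nash(QA[s_next], QB[s_next])
    if eq is None:
        # fallback for this sketch; the paper computes a mixed equilibrium here
        eq = np.unravel_index(QA[s_next].argmax(), QA[s_next].shape)
    nashA, nashB = QA[s_next][eq], QB[s_next][eq]
    QA[s][a, b] += alpha * (rA + gamma * nashA - QA[s][a, b])
    QB[s][a, b] += alpha * (rB + gamma * nashB - QB[s][a, b])
```

Note that each agent backs up its own equilibrium payoff of the next stage game, not a simple max, which is what distinguishes this from independent single-agent Q-learning.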
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
2003
Abstract
Cited by 81 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms also.
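The "adapt when stationary, otherwise retreat" idea can be sketched as a single decision step. This is a loose sketch, not the paper's algorithm: AWESOME's guarantees rest on carefully scheduled epochs, shrinking thresholds, and restarts, all collapsed here into one fixed tolerance, and the function name and `tol` parameter are invented for illustration.

```python
import numpy as np

def awesome_decide(opp_freq, opp_eq, my_eq, payoff, tol=0.15):
    """Simplified AWESOME-style decision for one epoch.
    opp_freq: opponent's empirical action distribution observed this epoch.
    opp_eq:   opponent's part of the precomputed Nash equilibrium.
    my_eq:    my own precomputed equilibrium mixed strategy.
    payoff:   my payoff matrix (rows = my actions, cols = opponent's)."""
    if np.abs(opp_freq - opp_eq).max() <= tol:
        # everybody still looks like equilibrium play: keep playing it
        return my_eq
    # opponent appears stationary but off-equilibrium: best-respond to it
    br = np.zeros_like(my_eq)
    br[(payoff @ opp_freq).argmax()] = 1.0
    return br
```

Observing actions rather than strategies is exactly why the comparison runs through empirical frequencies instead of the opponent's true mixed strategy.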
Convergence and no-regret in multiagent learning
In Advances in Neural Information Processing Systems 17, 2005
Abstract
Cited by 66 (0 self)
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
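The GIGA-WoLF update itself is compact: the played strategy x takes a projected gradient step, a second strategy z takes the same step at a third of the rate, and x is then pulled toward z by an amount that shrinks when the two agree. The sketch below follows that published update rule, but the reward-gradient argument, step size, and simplex-projection helper are illustrative details of this sketch rather than the paper's code.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def giga_wolf_update(x, z, reward_grad, eta):
    """One GIGA-WoLF step. x: strategy actually played; z: slower baseline."""
    x_hat = project_simplex(x + eta * reward_grad)          # GIGA step
    z_new = project_simplex(z + eta * reward_grad / 3.0)    # slower learner
    denom = np.linalg.norm(z_new - x_hat)
    # pull x toward z; the pull vanishes as the two strategies agree
    delta = 1.0 if denom == 0 else min(1.0, np.linalg.norm(z_new - z) / denom)
    x_new = x_hat + delta * (z_new - x_hat)
    return x_new, z_new
```

The "win or learn fast" effect comes from the delta term: when x is doing well (tracking z), the correction is small; when it is being exploited, x is dragged back toward the more cautious z.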
Learning against opponents with bounded memory
In IJCAI, 2005
Abstract
Cited by 41 (3 self)
Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multiagent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory which we describe. We then show an algorithm that provably achieves an ɛ-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. This new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments.
Playing is believing: The role of beliefs in multiagent learning
In Advances in Neural Information Processing Systems 14, 2001
Abstract
Cited by 30 (1 self)
We propose a new classification for multiagent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms, including the case of inter-league play. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long run against fair opponents.
Dynamic Cesaro-Wardrop Equilibration in Networks
IEEE Transactions on Automatic Control, 2002
Abstract
Cited by 27 (2 self)
We analyze a routing scheme for a broad class of networks which converges (in the Cesaro sense) with probability one to the set of approximate Cesaro-Wardrop equilibria, an extension of the notion of a Wardrop equilibrium. The network model allows for wireline networks where delays are caused by flows on links, as well as wireless networks, a primary motivation for us, where delays are caused by other flows in the vicinity of nodes.
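For readers unfamiliar with the underlying notion: a Wardrop equilibrium is a flow pattern in which no routed traffic can lower its delay by switching paths. A standard statement of the condition, in generic notation rather than the paper's, is:

```latex
% Wardrop equilibrium: for every origin--destination pair $w$ with path set
% $\mathcal{P}_w$, any path actually carrying flow has minimal delay:
x_p > 0 \;\Longrightarrow\; D_p(x) \le D_q(x) \qquad \forall\, p, q \in \mathcal{P}_w .
% The paper's Cesaro variant asks this of the time-averaged (Cesaro-mean)
% flows $\bar{x}_T = \tfrac{1}{T}\sum_{t=1}^{T} x_t$, up to an $\epsilon$ tolerance.
```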
Multi-Robot Team Response to a Multi-Robot Opponent Team
In Proceedings of ICRA'03, the 2003 IEEE International Conference on Robotics and Automation, 2002
Abstract
Cited by 26 (16 self)
Adversarial multi-robot problems, where teams of robots compete with one another, require the development of approaches that span all levels of control and integrate algorithms ranging from low-level robot motion control, through to planning, opponent modeling, and multiagent learning. Small-size robot soccer, a league within the RoboCup initiative, is a prime example of this multi-robot team adversarial environment. In this paper, we describe some of the algorithms and approaches of our robot soccer team, CMDragons'02, developed for RoboCup 2002. Our team represents an integration of many components, several of which are in themselves state-of-the-art, into a framework designed for fast adaptation and response to the changing environment.
Existence of multiagent equilibria with limited agents
Journal of Artificial Intelligence Research, 2002
Abstract
Cited by 25 (2 self)
Multiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on notable results from game theory, in particular, the concept of Nash equilibria. Learners that directly learn an equilibrium obviously rely on their existence. Learners that instead seek to play optimally with respect to the other players also depend upon equilibria, since equilibria are fixed points for learning. From another perspective, agents with limitations are real and common. These may be undesired physical limitations as well as self-imposed rational limitations, such as abstraction and approximation techniques, used to make learning tractable. This article explores the interactions of these two important concepts: equilibria and limitations in learning. We introduce the question of whether equilibria continue to exist when agents have limitations. We look at the general effects limitations can have on agent behavior, and define a natural extension of equilibria that accounts for these limitations. Using this formalization, we make three major contributions: (i) a counterexample for the general existence of equilibria with limitations, (ii) sufficient conditions on limitations that preserve their existence, and (iii) three general classes of games and limitations that satisfy these conditions. We then present empirical results from a specific multiagent learning algorithm applied to a specific instance of limited agents. These results demonstrate that learning with limitations is feasible when the conditions outlined by our theoretical analysis hold.
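The "natural extension of equilibria" mentioned above can be stated compactly. The formulation below is a generic reconstruction, not the article's exact definition: each limitation restricts an agent to a subset of its strategies, and equilibrium is required only within those subsets.

```latex
% Each agent $i$ is limited to $\Pi_i \subseteq \Sigma_i$, a subset of its full
% strategy set. A restricted equilibrium is a joint strategy
% $\pi = (\pi_1, \dots, \pi_n)$ with $\pi_i \in \Pi_i$ such that no agent can
% gain by deviating within its limited set:
V_i(\pi_i, \pi_{-i}) \;\ge\; V_i(\pi'_i, \pi_{-i}) \qquad \forall\, \pi'_i \in \Pi_i,\; \forall\, i .
% The article's counterexample shows that such a $\pi$ need not exist in general,
% which is why sufficient conditions on the $\Pi_i$ are needed.
```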
Learning to compete, compromise, and cooperate in repeated general-sum games
In Proc. 22nd ICML, 2005
Abstract
Cited by 24 (3 self)
Learning algorithms often obtain relatively low average payoffs in repeated general-sum games between other learning agents due to a focus on myopic best-response and one-shot Nash equilibrium (NE) strategies. A less myopic approach places focus on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs less than the minimax value of the game. Second, an agent should learn to cooperate/compromise when beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays (in self-play) the second property in a wide range of games.
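The first property (never learn a strategy whose average payoff falls below the game's minimax value) can be illustrated as a fallback trigger. This is a sketch of the idea only, not M-Qubed itself: the security level here is computed over pure strategies, whereas the true minimax value allows mixed strategies and requires a linear program, and all names and the `slack` parameter are invented for illustration.

```python
import numpy as np

def maximin_pure(payoff):
    """Security level restricted to pure strategies: the action whose worst-case
    payoff is largest (a simplification of the mixed-strategy minimax value)."""
    idx = payoff.min(axis=1).argmax()
    return idx, payoff[idx].min()

def safe_action(payoff, learned_action, avg_payoff, slack=0.05):
    """Safety trigger in the spirit of the first property: if the running
    average payoff drops below the security level, abandon the learned policy
    and fall back to the maximin action."""
    guard, value = maximin_pure(payoff)
    return guard if avg_payoff < value - slack else learned_action
```

The interesting design question, which the paper addresses, is how to keep this guarantee while still leaving the learner free to explore the cooperative strategies that realize the second property.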