Results 1–10 of 225
Cooperative Multi-Agent Learning: The State of the Art
 Autonomous Agents and Multi-Agent Systems
, 2005
"... Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Abstract

Cited by 182 (8 self)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources.
Nash Q-Learning for General-Sum Stochastic Games
 Journal of Machine Learning Research
, 2003
"... We extend Qlearning to a noncooperative multiagent context, using the framework of generalsum stochastic games. A learning agent maintains Qfunctions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Qvalues. This learning protocol provably conv ..."
Abstract

Cited by 138 (0 self)
We extend Q-learning to a non-cooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
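To make the joint-action update concrete, here is a minimal sketch. It is not the paper's implementation: it bootstraps on a pure-strategy Nash equilibrium of the next stage game (the paper handles mixed-strategy equilibria), and the function names and dict-of-payoff-matrices state layout are illustrative choices.

```python
import numpy as np

def pure_nash(Q1, Q2):
    """Joint action (a1, a2) forming a pure-strategy Nash equilibrium of the
    stage game with payoff matrices Q1 (row player) and Q2 (column player);
    None if no pure equilibrium exists."""
    for a1 in range(Q1.shape[0]):
        for a2 in range(Q1.shape[1]):
            if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
                return (a1, a2)
    return None

def nash_q_update(Q1, Q2, s, a1, a2, r1, s_next, alpha=0.1, gamma=0.9):
    """One Nash-Q update for agent 1: instead of bootstrapping on
    max_a Q(s', a) as in single-agent Q-learning, bootstrap on the value of
    a Nash equilibrium of the next state's stage game."""
    eq = pure_nash(Q1[s_next], Q2[s_next])
    nash_value = Q1[s_next][eq] if eq is not None else Q1[s_next].max()
    Q1[s][a1, a2] += alpha * (r1 + gamma * nash_value - Q1[s][a1, a2])
```

The key difference from single-agent Q-learning is the bootstrap target: the equilibrium value of the next stage game, which depends on both agents' Q-tables.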
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
, 2003
"... A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in selfplay. The algorithm that has come closest, WoLFIGA, has been proven to have these two properties in 2player 2action repeated games— as ..."
Abstract

Cited by 97 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may also help in analyzing other multiagent learning algorithms.
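The stationary-or-retreat idea can be sketched roughly as a per-epoch decision. This is a simplification under stated assumptions: the real algorithm uses carefully scheduled epoch lengths and shrinking thresholds, and plays the mixed equilibrium rather than its modal action; `awesome_decide` and its parameters are hypothetical names.

```python
import numpy as np

def awesome_decide(payoff, eq_strategy, emp_prev, emp_curr, eps=0.05):
    """One epoch decision of an AWESOME-style controller (simplified).
    payoff[i, j]: our payoff for our action i against opponent action j.
    emp_prev / emp_curr: opponent's empirical action frequencies in the two
    most recent epochs. If they are close, the opponent looks stationary and
    we best-respond to the estimate; otherwise we retreat to the precomputed
    equilibrium (here reduced to its modal action)."""
    if np.abs(emp_prev - emp_curr).max() < eps:
        return int(np.argmax(payoff @ emp_curr)), "adapt"
    return int(np.argmax(eq_strategy)), "equilibrium"
```

The point of the retreat rule is that a learner chasing a moving opponent never abandons the equilibrium safety net for long, which is what makes convergence in self-play provable.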
Convergence and no-regret in multiagent learning
 In Advances in Neural Information Processing Systems 17
, 2005
"... Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be ..."
Abstract

Cited by 85 (0 self)
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove that the algorithm guarantees at most zero average regret, while demonstrating that it converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
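A sketch of the two-strategy update behind GIGA-WoLF, as we read the paper's description (function names are ours, and details such as step-size schedules are omitted): the played strategy x follows the payoff gradient but is dragged toward a more slowly moving baseline strategy z, with the drag capped so x never outruns z.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def giga_wolf_step(x, z, grad, eta):
    """One GIGA-WoLF update (sketch). x: played strategy, z: slower
    baseline strategy, grad: payoff gradient for the observed rewards."""
    x_hat = project_simplex(x + eta * grad)          # GIGA step for x
    z_new = project_simplex(z + eta * grad / 3.0)    # slower step for z
    denom = np.linalg.norm(z_new - x_hat)
    delta = 1.0 if denom == 0 else min(1.0, np.linalg.norm(z_new - z) / denom)
    return x_hat + delta * (z_new - x_hat), z_new    # drag x toward z
```

The varying drag term plays the role of the win-or-learn-fast (WoLF) variable step size: when x is winning it moves slowly, when losing it is pulled quickly back toward the baseline.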
Emergence of norms through social learning
 Proceedings of IJCAI-07
, 2007
"... Behavioral norms are key ingredients that allow agent coordination where societal laws do not sufficiently constrain agent behaviors. Whereas social laws need to be enforced in a topdown manner, norms evolve in a bottomup manner and are typically more selfenforcing. While effective norms can sig ..."
Abstract

Cited by 64 (6 self)
Behavioral norms are key ingredients that allow agent coordination where societal laws do not sufficiently constrain agent behaviors. Whereas social laws need to be enforced in a top-down manner, norms evolve in a bottom-up manner and are typically more self-enforcing. While effective norms can significantly enhance performance of individual agents and agent societies, there has been little work in multiagent systems on the formation of social norms. We propose a model that supports the emergence of social norms via learning from interaction experiences. In our model, individual agents repeatedly interact with other agents in the society over instances of a given scenario. Each interaction is framed as a stage game. An agent learns its policy to play the game over repeated interactions with multiple agents. We term this mode of learning social learning, which is distinct from an agent learning from repeated interactions against the same player. We are particularly interested in situations where multiple action combinations yield the same optimal payoff. The key research question is to find out if the entire population learns to converge to a consistent norm. In addition to studying such emergence of social norms among homogeneous learners via social learning, we study the effects of heterogeneous learners, population size, multiple social groups, etc.
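The setup can be illustrated with a toy simulation (our own construction, not the paper's exact experimental design): a population of stateless Q-learners repeatedly paired at random on a two-action coordination game in which both conventions pay equally well.

```python
import random

def social_learning(n_agents=21, rounds=4000, alpha=0.1, eps=0.1, seed=0):
    """Norm emergence via social learning (illustrative). Matching actions
    pay 1, mismatching pays -1, and both conventions (0,0) and (1,1) are
    equally good. Each agent keeps Q-values over its two actions and plays
    epsilon-greedy against a randomly drawn partner each round."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_agents)]

    def act(i):
        if rng.random() < eps:
            return rng.randrange(2)
        return 0 if Q[i][0] >= Q[i][1] else 1

    for _ in range(rounds):
        i, j = rng.sample(range(n_agents), 2)   # random pairing
        ai, aj = act(i), act(j)
        r = 1.0 if ai == aj else -1.0
        Q[i][ai] += alpha * (r - Q[i][ai])
        Q[j][aj] += alpha * (r - Q[j][aj])

    # each agent's greedy action = the convention it has settled on
    return [0 if q[0] >= q[1] else 1 for q in Q]
```

With enough rounds the population typically locks into one of the two equally good conventions; which one emerges depends on early random interactions, which is exactly the symmetry-breaking question the paper studies.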
New criteria and a new algorithm for learning in multiagent systems
 In Advances in Neural Information Processing Systems 17
, 2005
"... We propose a new set of criteria for learning algorithms in multiagent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) a ..."
Abstract

Cited by 55 (6 self)
We propose a new set of criteria for learning algorithms in multiagent systems, one that is more stringent and (we argue) better justified than previously proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm's payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game-theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.
Learning against opponents with bounded memory
 In IJCAI
, 2005
"... Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While welljustified, each of these has generally given little attention to one of the main challenges of a multiagent setting: the capability of the other agents to adapt and learn as wel ..."
Abstract

Cited by 49 (3 self)
Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multiagent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory, which we describe. We then show an algorithm that provably achieves an ε-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. This new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments.
Extending Q-learning to general adaptive multiagent systems
 In Advances in Neural Information Processing Systems 16
, 2004
"... Recent multiagent extensions of QLearning require knowledge of other agents ’ payoffs and Qfunctions, and assume gametheoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “HyperQ ” Learning, in which values of mixed strategies rather tha ..."
Abstract

Cited by 39 (0 self)
Recent multiagent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
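The Bayesian half of this idea can be sketched as a Dirichlet opponent model (our illustration; the paper's full method additionally learns a Q-function over a discretized space of mixed strategies).

```python
import numpy as np

class DirichletOpponentModel:
    """Bayesian estimate of an opponent's mixed strategy from observed
    actions: the strategy-estimation half of a Hyper-Q-style learner.
    `prior` is the symmetric Dirichlet pseudo-count."""
    def __init__(self, n_actions, prior=1.0):
        self.counts = np.full(n_actions, prior)

    def observe(self, action):
        self.counts[action] += 1.0

    def strategy(self):
        # Posterior mean of the opponent's mixed strategy
        return self.counts / self.counts.sum()
```

The learner then evaluates its own mixed strategies against this evolving estimate rather than against raw joint actions, which is what lets it track persistently dynamic opponents.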
Dynamic Cesaro-Wardrop Equilibration in Networks
 IEEE Transactions on Automatic Control
, 2002
"... We analyze a routing scheme for a broad class of networks which converges (in the Cesaro sense) with probability one to the set of approximate CesaroWardrop equilibria, an extension of the notion of a Wardrop equilibrium. The network model allows for wireline networks where delays are caused by flo ..."
Abstract

Cited by 36 (2 self)
We analyze a routing scheme for a broad class of networks which converges (in the Cesaro sense) with probability one to the set of approximate Cesaro-Wardrop equilibria, an extension of the notion of a Wardrop equilibrium. The network model allows for wireline networks, where delays are caused by flows on links, as well as wireless networks, a primary motivation for us, where delays are caused by other flows in the vicinity of nodes.
Playing is believing: The role of beliefs in multiagent learning
 In Advances in Neural Information Processing Systems 14
, 2001
"... We propose a new classification for multiagent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms, including the case of interleague play. We propose an ..."
Abstract

Cited by 32 (1 self)
We propose a new classification for multiagent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms, including the case of inter-league play. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long run against fair opponents.