Results 1–10 of 12
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
, 2003
Abstract

Cited by 81 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent’s (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players’ actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may also help in analyzing other multiagent learning algorithms.
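The "adapt when everybody is stationary, otherwise move to equilibrium" idea can be sketched as follows. This is an illustrative toy, not the paper's algorithm: `appears_stationary` is a crude frequency-comparison stand-in for AWESOME's hypothesis tests, and the equilibrium strategy is assumed precomputed.

```python
from collections import Counter

def appears_stationary(history, epsilon=0.1):
    """Crude stationarity check: compare empirical action frequencies in
    the first and second halves of the observed history (a hypothetical
    stand-in for the paper's statistical hypothesis tests)."""
    half = len(history) // 2
    if half == 0:
        return True
    first, second = history[:half], history[half:]
    for a in set(history):
        f1 = first.count(a) / len(first)
        f2 = second.count(a) / len(second)
        if abs(f1 - f2) > epsilon:
            return False
    return True

def choose_strategy(history, equilibrium_action, best_response):
    """Adapt to an apparently stationary opponent, otherwise retreat to
    the precomputed equilibrium action."""
    if not history:
        return equilibrium_action
    if appears_stationary(history):
        # Best-respond to the opponent's most frequent observed action.
        most_common = Counter(history).most_common(1)[0][0]
        return best_response(most_common)
    return equilibrium_action
```

Against a stationary opponent the sketch best-responds to the empirical play; against a drifting one it falls back to the equilibrium action.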
Learning to compete, compromise, and cooperate in repeated general-sum games
 In Proc. 22nd ICML
, 2005
Abstract

Cited by 24 (3 self)
Learning algorithms often obtain relatively low average payoffs in repeated general-sum games against other learning agents due to a focus on myopic best-response and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs less than the minimax value of the game. Second, an agent should learn to cooperate/compromise when beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays (in self-play) the second property in a wide range of games.
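The first property, never averaging below the game's security value, can be illustrated with a pure-strategy maximin computation. Note this is only a simplified lower bound: the paper's guarantee concerns the full minimax value, which allows mixed strategies.

```python
def pure_maximin(payoffs):
    """Best guarantee the row player can secure with a pure strategy:
    for each row, assume the worst column response, then pick the best
    row. (The true security value allows mixed strategies and can be
    higher; this is an illustration, not the paper's bound.)"""
    return max(min(row) for row in payoffs)

# Prisoner's Dilemma row payoffs: cooperate -> (3, 0), defect -> (5, 1).
pd_rows = [[3, 0], [5, 1]]
security = pure_maximin(pd_rows)  # defecting guarantees at least 1
```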
Effect of referrals on convergence to satisficing distributions
 In AAMAS ’05: Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
, 2005
Abstract

Cited by 4 (0 self)
We investigate a framework where agents locate high-quality service providers by using referrals from peer agents. The performance of providers is measured by the satisfaction obtained by agents from using their services. Provider performance depends upon its intrinsic capability and upon its current load. We present an algorithm for selecting a service provider for a given task, which includes mechanisms for deciding when, and whom, to ask for a referral. This mechanism requires learning, over interactions, both the performance levels of different service providers and the quality of referrals provided by other agents. We use a satisficing rather than an optimizing framework, where agents are content to receive service quality above a threshold. Agents have to learn the quality of others’ referrals and the quality of providers to find satisficing providers. We compare the effectiveness of referral systems, with and without deception, against systems without referrals. We identify zones, based on an observed entropy metric, where using referrals is helpful in promoting fast convergence to satisficing distributions.
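The satisficing selection rule, where an agent is content with any provider whose estimated quality clears its threshold, might be sketched like this (hypothetical helper names; the paper's mechanism additionally decides when and whom to ask for referrals, which is not shown):

```python
def select_provider(estimates, threshold, current=None):
    """Satisficing choice: stick with the current provider if its
    estimated quality meets the aspiration threshold; otherwise pick
    any satisficing provider, falling back to the best estimate.
    `estimates` maps provider id -> learned quality in [0, 1]."""
    if current is not None and estimates.get(current, 0.0) >= threshold:
        return current  # satisfied: no reason to search further
    satisficing = [p for p, q in estimates.items() if q >= threshold]
    if satisficing:
        return satisficing[0]
    # Nobody clears the threshold: take the best known provider.
    return max(estimates, key=estimates.get)
```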
The success and failure of tag-mediated evolution of cooperation
 In
, 2005
Abstract

Cited by 4 (1 self)
Use of tags to limit partner selection for playing has been shown to produce stable cooperation in agent populations playing the Prisoner’s Dilemma (PD) game. There is, however, a lack of understanding of how and why tags facilitate such cooperation. We start with an empirical investigation that identifies the key dynamics that result in sustainable cooperation in PD; sufficiently long tags are needed to achieve this effect. A theoretical analysis shows that multiple simulation parameters, including tag length, mutation rate, and population size, have a significant effect on sustaining cooperation. Experiments partially validate these observations. Additionally, we claim that tags promote only mimicking, not coordinated behavior in general, i.e., tags can promote cooperation only if cooperation requires identical actions from all group members. We illustrate the failure of the tag model to sustain cooperation by experimenting with domains where agents need to take complementary actions to maximize payoff.
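The core tag mechanism, interaction restricted to agents with matching tags, reduces to grouping agents by tag. A minimal sketch (illustrative representation; real tag models add reproduction and tag mutation, which drive the dynamics analyzed above):

```python
def tag_match(tag_a, tag_b):
    """Agents interact only when their tags are identical (the strict
    matching used in simple tag models; tolerance-based variants relax
    this to a similarity threshold)."""
    return tag_a == tag_b

def select_partners(agents):
    """Group agents by tag; play happens only within a group.
    `agents` maps an agent id to its tag bit-string."""
    groups = {}
    for agent, tag in agents.items():
        groups.setdefault(tag, []).append(agent)
    return groups
```

With longer tags, groups are less likely to collide by chance, which is the regime the abstract identifies as sustaining cooperation.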
Learning to Cooperate in Multi-Agent Social Dilemmas
Abstract

Cited by 1 (0 self)
In many Multi-Agent Systems (MAS), agents (even if self-interested) need to cooperate in order to maximize their own utilities. Most multiagent learning algorithms focus on one-shot games, whose rational optimal solution is a Nash Equilibrium. These solutions are often no longer optimal in repeated interactions, where, in the long run, other more profitable (Pareto-efficient) equilibrium points emerge (e.g., in the iterated Prisoner’s Dilemma). The goal of this work is to improve existing rational Reinforcement Learning (RL) algorithms, which typically learn the one-shot Nash Equilibrium solution, using design principles that foster reaching the Pareto-efficient equilibrium. In this paper we propose two principles (Change or Learn Fast and Change and Keep) aimed at improving cooperation among Q-learning (a popular RL algorithm) agents in self-play. Using MASD (Multi-Agent Social Dilemma), an n-player, m-action version of the iterated Prisoner’s Dilemma, we show how a best-response learning algorithm such as Q-learning, modified as proposed, can achieve better cooperative solutions in a shorter time. To test the robustness of the proposed approaches, we present an analysis of their sensitivity to several learning parameters and to some variants of MASD.
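The baseline being modified here is standard tabular Q-learning; one update step looks like this. Only the textbook rule is shown: the proposed Change or Learn Fast and Change and Keep principles alter how learning and exploration respond to payoff changes and are not reproduced here.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    `q` maps state -> {action: value}."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]
```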
Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining
Abstract
Learning in many multiagent settings is inherently repeated play. This calls into question the naive application of single-play Nash equilibria in multiagent learning and suggests, instead, the application of give-and-take principles of bargaining. We modify and analyze a satisficing algorithm based on (Karandikar et al., 1998) that is compatible with the bargaining perspective. This algorithm is a form of relaxation search that converges to a satisficing equilibrium without knowledge of game payoffs or other agents’ actions. We then develop an M-action, N-player social dilemma that encodes the key elements of the Prisoner’s Dilemma. This game is instructive because it characterizes social dilemmas with more than two agents and more than two choices. We show how several different multiagent learning algorithms behave in this social dilemma, and demonstrate that the satisficing algorithm converges, with high probability, to a Pareto-efficient solution in self-play and to the single-play Nash equilibrium against selfish agents. Finally, we present theoretical results that characterize the behavior of the algorithm.
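The Karandikar-style rule underlying this family of algorithms is aspiration-based: keep the current action while payoffs satisfy the aspiration level, otherwise consider switching, and relax the aspiration toward recent payoffs. A one-step sketch, with illustrative parameter names (not the paper's exact update):

```python
import random

def satisficing_step(action, payoff, aspiration, alternative,
                     decay=0.99, switch_prob=0.5, rand=None):
    """One step of a Karandikar-style satisficing update (sketch).
    Requires no knowledge of the payoff matrix or of other agents'
    actions: only the agent's own received payoff is used."""
    if rand is None:
        rand = random.random()
    if payoff >= aspiration:
        next_action = action          # satisfied: inertia
    elif rand < switch_prob:
        next_action = alternative     # dissatisfied: try something else
    else:
        next_action = action
    # Aspiration relaxes toward the received payoff.
    next_aspiration = decay * aspiration + (1 - decay) * payoff
    return next_action, next_aspiration
```

In self-play, mutually satisfying action pairs become absorbing, which is the intuition behind convergence to a satisficing equilibrium.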
I.2.8 [Artificial Intelligence]: Learning—reinforcement
Abstract
The multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such an equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general-sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that would lead to a lower-payoff Nash Equilibrium. In this paper we work with agents that select actions based on expected utility calculations incorporating the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an interesting action-revelation strategy that involves strategically revealing one’s action to avoid worst-case, pessimistic moves. We argue that in certain situations such apparently risky revealing can indeed produce better payoff than a non-revealing approach. In particular, it is possible to obtain Pareto-optimal solutions that dominate Nash Equilibrium. We present results over a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to payoffs at Nash equilibrium.
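Computing expected utilities against the opponent's observed action frequencies is standard fictitious-play-style bookkeeping; a minimal version is below. The revelation strategy described above is layered on top of this and is not shown.

```python
def expected_utilities(payoff, opponent_counts):
    """Expected utility of each of my actions against the opponent's
    empirical action frequencies. `payoff[i][j]` is my payoff when I
    play action i and the opponent plays action j; `opponent_counts[j]`
    is how often action j has been observed."""
    total = sum(opponent_counts)
    freqs = [c / total for c in opponent_counts]
    return [sum(p * f for p, f in zip(row, freqs)) for row in payoff]
```

A stochastically greedy agent would then pick (mostly) the action with the highest expected utility.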
Analyzing the effects of tags on promoting cooperation in Prisoner’s Dilemma
Abstract
In some two-player games, e.g., the Prisoner’s Dilemma (PD), myopic decisions can produce poor performance both for the individual and for the agent collection (society). When such games are played repeatedly between agents in a society, auxiliary mechanisms can be used to mitigate such lack of coordination. By biasing which other agents a particular agent can play, we can promote social coordination. Use of tags to limit partner selection for playing has been shown to produce stable cooperation in agent populations playing PD. There is, however, a lack of understanding of why sufficiently long tags are needed to achieve this effect. We empirically characterize the population features produced by longer tags that enable sustained cooperation. A theoretical analysis shows that similar effects can be obtained by increasing mutation rate and population size. Experiments partially validate these observations. We also predict that such increases may ultimately be detrimental at larger values.
Satisficing Multi-Agent Learning: A Simple But Powerful Algorithm
, 2008
Abstract
Learning in the presence of adaptive, possibly antagonistic, agents presents special challenges to algorithm designers, especially in environments with limited information. We consider situations in which an agent knows its own set of actions and observes its own payoffs, but does not know or observe the actions and payoffs of the other agents. Despite this limited information, a robust learning algorithm must have two properties: security, which requires the algorithm to avoid exploitation by antagonistic agents, and efficiency, which requires the algorithm to find nearly Pareto-efficient solutions when associating with agents who are inclined to cooperate. However, no learning algorithm in the literature has both of these properties when playing repeated general-sum games in these limited-information environments. In this paper, we present and analyze a variation of Karandikar et al.’s learning algorithm [19]. The algorithm is conceptually very simple, but has surprising power given this simplicity. It is provably secure in all matrix games, regardless of the play of its associates, and it is efficient in self-play in a very large set of matrix games. Additionally, the algorithm performs well when associating with representative, state-of-the-art learning algorithms with similar representational capabilities in general-sum games. These properties make the algorithm highly robust, more so than representative best-response and regret-minimizing algorithms with similar reasoning capabilities.
Convergence to Pareto Optimality in General Sum Games via Learning Opponent’s Preference
Abstract
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum game where the opponent's payoff is unknown. The concept of Nash Equilibrium in repeated games provides an individually rational solution for playing such games, and it can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. However, such a strategy can sometimes lead to a Pareto-dominated outcome for the repeated game. Our goal is to design learning strategies that converge to a Pareto-efficient outcome that also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. We present a learning algorithm, POSNEL, which learns the opponent’s preference structure and produces, under self-play, Nash Equilibrium payoffs in the limit in all such games. We also show that such learning will generate Pareto-optimal payoffs in a large majority of games. We derive a probability bound for convergence to the Nash Equilibrium payoff and experimentally demonstrate convergence to Pareto optimality for all structurally distinct 2-player 2-action conflict games. We also compare our algorithm with existing algorithms such as WoLF-IGA and JAL and show that POSNEL, on average, outperforms both.