Results 11–20 of 98
Efficient no-regret multiagent learning
 Proceedings of The 20th National Conference on Artificial Intelligence
, 2005
Abstract

Cited by 11 (0 self)
We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of only the scalar reward of the action played. We show that the average actual payoff of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of the minimax payoff against arbitrary opponents, with high probability in polynomial time. In addition, the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, unlike previous work, we do not need to assume that the learner knows the game matrices or can observe the opponents' actions.
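The "known approach" for handling bandit feedback, observing only the scalar reward of the played action, is in the spirit of exponential-weights schemes such as Exp3: sample from a mixed distribution and reweight with an importance-weighted reward estimate. A minimal sketch of that idea follows; the function name, reward values, and parameters are illustrative, not the paper's construction:

```python
import math
import random

def exp3(rewards, T, gamma=0.1, seed=0):
    """Exp3-style no-regret learning under bandit feedback.

    rewards[a] is the (unknown to the learner) payoff in [0, 1] of action a
    against a stationary opponent; only the played action's reward is observed.
    """
    rng = random.Random(seed)
    K = len(rewards)
    weights = [1.0] * K
    total = 0.0
    for _ in range(T):
        wsum = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        a = rng.choices(range(K), weights=probs)[0]
        r = rewards[a]  # scalar reward of the played action only
        total += r
        # Importance weighting (r / probs[a]) keeps the reward estimate unbiased.
        weights[a] *= math.exp(gamma * (r / probs[a]) / K)
    return total / T

# Against a stationary opponent, the average payoff approaches the best action's.
avg = exp3([0.2, 0.8], T=5000)
```

With the illustrative rewards above, the learner's average payoff ends up near the best action's 0.8, matching claim (1) of the abstract in spirit.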
A Multiagent Reinforcement Learning Algorithm with Nonlinear Dynamics
Abstract

Cited by 10 (3 self)
Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of previously developed MARL algorithms assumed that agents either had some knowledge of the underlying game (such as its Nash equilibria) and/or observed other agents' actions and the rewards they received. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash equilibrium (NE) in benchmark two-player-two-action games with minimum knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent observes neither other agents' actions nor their rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that our algorithm converges in the challenging Shapley's game, where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms the state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently.
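As a rough illustration of the kind of update WPL performs, the sketch below runs a WPL-style weighted gradient step on matching pennies. Two caveats: it uses the exact policy gradient for brevity, whereas the actual algorithm estimates it from each agent's own observed rewards alone, and the weighting rule is my reading of the published update, not a verified reimplementation:

```python
def wpl_matching_pennies(eta=0.002, steps=200000):
    """WPL-style weighted gradient ascent on matching pennies.

    p, q = P(heads) for the row and column player; the mixed NE is (0.5, 0.5).
    """
    def clip(x):
        return min(max(x, 0.01), 0.99)  # keep policies off the simplex boundary

    def wpl_step(policy, grad):
        # Weight the gradient by policy when decreasing it and by (1 - policy)
        # when increasing it: this asymmetry damps plain gradient-ascent cycling.
        weight = policy if grad < 0 else 1.0 - policy
        return clip(policy + eta * grad * weight)

    p, q = 0.9, 0.2
    for _ in range(steps):
        grad_p = 4.0 * q - 2.0   # d/dp of the row player's expected payoff
        grad_q = 2.0 - 4.0 * p   # d/dq of the column player's expected payoff
        p, q = wpl_step(p, grad_p), wpl_step(q, grad_q)
    return p, q

p, q = wpl_matching_pennies()
```

Plain (unweighted) gradient ascent cycles forever around the mixed equilibrium of matching pennies; the weighted update instead spirals inward toward (0.5, 0.5).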
Leading a Best-Response Teammate in an Ad Hoc Team
, 2009
Abstract

Cited by 10 (4 self)
Teams of agents may not always be developed in a planned, coordinated fashion. Rather, as deployed agents become more common in e-commerce and other settings, there are increasing opportunities for previously unacquainted agents to cooperate in ad hoc team settings. In such scenarios, it is useful for individual agents to be able to collaborate with a wide variety of possible teammates under the philosophy that not all agents are fully rational. This paper considers an agent that is to interact repeatedly with a teammate that will adapt to this interaction in a particular suboptimal, but natural, way. We formalize this setting in game-theoretic terms, provide and analyze a fully implemented algorithm for finding optimal action sequences, prove some theoretical results pertaining to the lengths of these action sequences, and provide empirical results pertaining to the prevalence of our problem of interest in random interaction settings.
Reaching Pareto-optimality in prisoner’s dilemma using conditional joint action learning
 Autonomous Agents and MultiAgent Systems
Abstract

Cited by 8 (2 self)
We consider a repeated Prisoner’s Dilemma game where two independent learning agents play against each other. We assume that the players can observe each other’s actions but are oblivious to the payoff received by the other player. The multiagent learning literature has provided mechanisms that allow agents to converge to a Nash equilibrium. In this paper we define a special class of learner, called a conditional joint action learner (CJAL), which attempts to learn the conditional probability of an action taken by the other agent given its own action, and uses it to decide its next course of action. We prove that when played against itself, if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, using a limited exploration technique these agents can actually learn to converge to the Pareto-optimal solution that dominates the Nash equilibrium, while maintaining individual rationality. We analytically derive the conditions under which such a phenomenon can occur and present experimental results to support our claim.
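The CJAL decision rule described above is easy to sketch: maintain joint-action counts, form P(opponent's action | own action), and pick the action with the highest conditional expected payoff. The function name, payoff numbers, and history below are illustrative, not drawn from the paper:

```python
# Row player's Prisoner's Dilemma payoffs (illustrative numbers):
# (own, opponent) -> payoff, with C = cooperate, D = defect.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def cjal_choice(counts):
    """Pick the action maximizing expected payoff under the learned
    conditional distribution P(opponent's action | own action)."""
    best_action, best_value = None, float('-inf')
    for own in ('C', 'D'):
        n = sum(counts[(own, opp)] for opp in ('C', 'D'))
        if n == 0:
            continue  # no data for this action yet; exploration handles it
        value = sum(counts[(own, opp)] / n * PAYOFF[(own, opp)]
                    for opp in ('C', 'D'))
        if value > best_value:
            best_action, best_value = own, value
    return best_action, best_value

# A history in which the opponent mostly reciprocates: cooperating looks best
# (expected 0.8 * 3 = 2.4 versus 0.1 * 5 + 0.9 * 1 = 1.4 for defecting).
counts = {('C', 'C'): 8, ('C', 'D'): 2, ('D', 'C'): 1, ('D', 'D'): 9}
action, value = cjal_choice(counts)
```

Note how conditioning on the agent's own action, rather than learning a single marginal over the opponent's play, is what lets mutual cooperation dominate defection here.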
Optimal efficient learning equilibrium: Imperfect monitoring in symmetric games
 In Proceedings of the National Conference on Artificial Intelligence (AAAI)
, 2005
Abstract

Cited by 8 (4 self)
Efficient Learning Equilibrium (ELE) is a natural solution concept for multiagent encounters with incomplete information. It requires the learning algorithms themselves to be in equilibrium for any game selected from a set of (initially unknown) games. In an optimal ELE, the learning algorithms would efficiently obtain the surplus the agents would obtain in an optimal Nash equilibrium of the initially unknown game being played. The crucial part is that in an ELE, deviations from the learning algorithms become non-beneficial after polynomial time, although the game played is initially unknown. While appealing conceptually, the main challenge in establishing learning algorithms based on this concept is to isolate general classes of games where an ELE exists. Unfortunately, it has been shown that while an ELE exists for the setting in which each agent can observe all other agents’ actions and payoffs, an ELE does not exist in general when the other agents’ payoffs cannot be observed. In this paper we provide the first positive results on this problem, constructively proving the existence of an optimal ELE for the class of symmetric games where an agent cannot observe other agents’ payoffs.
Multiagent Learning Experiments on Repeated Matrix Games
Abstract

Cited by 7 (1 self)
This paper experimentally evaluates multiagent learning algorithms playing repeated matrix games to maximize their cumulative return. Previous work found that Q-learning surpassed Nash-based multiagent learning algorithms. Based on all-against-all repeated matrix game tournaments, this paper updates the state of the art of multiagent learning experiments. In a first stage, it shows that M-Qubed, S, and bandit-based algorithms such as UCB are the best algorithms on general-sum games, with Exp3 being the best on cooperative games and zero-sum games. In a second stage, our experiments show that two features, forgetting the far past and using recent history as states, improve the learning algorithms. Finally, the best algorithms are two new algorithms, Q-learning and UCB enhanced with the two features, and M-Qubed.
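One standard way to realize a "forgetting the far past" feature is to discount the empirical counts that UCB maintains. The sketch below applies that idea to a simple bandit view of a repeated game against a fixed opponent; the function name, parameters, and reward model are illustrative, not the tournament's exact algorithms:

```python
import math
import random

def discounted_ucb(rewards_fn, T, K, gamma=0.99, seed=0):
    """UCB1 with exponential forgetting of the far past (discounted counts)."""
    rng = random.Random(seed)
    n = [0.0] * K      # discounted play counts
    s = [0.0] * K      # discounted reward sums
    picks = [0] * K
    for t in range(T):
        if t < K:
            a = t  # play each action once first
        else:
            total = sum(n)
            # Standard UCB1 index: discounted mean plus exploration bonus.
            a = max(range(K), key=lambda i:
                    s[i] / n[i] + math.sqrt(2 * math.log(total) / n[i]))
        r = rewards_fn(a, rng)
        # Decay the past before recording the new observation.
        n = [gamma * x for x in n]
        s = [gamma * x for x in s]
        n[a] += 1.0
        s[a] += r
        picks[a] += 1
    return picks

# Two actions with Bernoulli payoffs 0.3 and 0.7: the better one dominates play.
picks = discounted_ucb(lambda a, rng: float(rng.random() < (0.3, 0.7)[a]),
                       T=3000, K=2)
```

The discount caps the effective memory at roughly 1/(1 − gamma) plays per action, so the learner can track an opponent whose behavior drifts; with a stationary opponent, as here, it simply concentrates on the better action.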
Online Planning for Optimal Protector Strategies in Resource Conservation Games
Abstract

Cited by 7 (4 self)
Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so the protectors can learn about the richness of resources by observing the extractors’ behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game; (ii) we transform this repeated game into a POMDP by adopting a fixed model for the adversary’s behavior, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (iii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iv) for a specific class of our game, we can speed up the GMOP algorithm without sacrificing solution quality, as well as provide a heuristic that trades off solution quality for lower computational cost.
Teamwork with Limited Knowledge of Teammates
Abstract

Cited by 6 (4 self)
While great strides have been made in multiagent teamwork, existing approaches typically assume that extensive information exists about teammates and how to coordinate actions. This paper addresses how robust teamwork can still be created even if limited or no information exists about a specific group of teammates, as in the ad hoc teamwork scenario. The main contribution of this paper is the first empirical evaluation of an agent cooperating with teammates not created by the authors, where the agent is not provided expert knowledge of its teammates. For this purpose, we develop a general-purpose teammate modeling method and test the resulting ad hoc team agent’s ability to collaborate with more than 40 unknown teams of agents to accomplish a benchmark task. These agents were designed by people other than the authors, without the designers planning for the ad hoc teamwork setting. A secondary contribution of the paper is a new transfer learning algorithm, TwoStageTransfer, that can improve results when the ad hoc team agent does have some limited observations of its current teammates.
Robust learning equilibrium
 In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), 34–41. Corvallis, Oregon: AUAI
, 2006
Abstract

Cited by 6 (5 self)
We introduce robust learning equilibrium and apply it to the context of auctions.
Approximation guarantees for fictitious play
 In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing
, 2009
Abstract

Cited by 5 (0 self)
Fictitious play is a simple, well-known, and often-used algorithm for playing (and, especially, learning to play) games. However, in general it does not converge to equilibrium; even when it does, we may not be able to run it to convergence. Still, we may obtain an approximate equilibrium. In this paper, we study the approximation properties that fictitious play obtains when it is run for a limited number of rounds. We show that if both players randomize uniformly over their actions in the first r rounds of fictitious play, then the result is an ε-equilibrium, where ε = (r + 1)/(2r). (Since we are examining only a constant number of pure strategies, we know that ε < 1/2 is impossible, due to a result of Feder et al.) We show that this bound is tight in the worst case; however, with an experiment on random games, we illustrate that fictitious play usually obtains a much better approximation. We then consider the possibility that the players fail to choose the same r. We show how to obtain the optimal approximation guarantee when both the opponent’s r and the game are adversarially chosen (but there is an upper bound R on the opponent’s r), using a linear program formulation. We show that if the action played in the ith round of fictitious play is chosen with probability proportional to 1 for i = 1 and 1/(i − 1) for all 2 ≤ i ≤ R + 1, this gives an approximation guarantee of 1 − 1/(2 + ln R). We also obtain a lower bound of 1 − 4/ln R. This provides an actionable prescription for how long to run fictitious play.
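The baseline procedure being analyzed, plain fictitious play (each player best-responds to the opponent's empirical action frequencies), is easy to sketch. On matching pennies the empirical mixes approach the 50/50 equilibrium; the function name and game matrix are illustrative:

```python
def fictitious_play(A, T):
    """Fictitious play on a zero-sum matrix game.

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    Each round, every player best-responds to the opponent's empirical
    action frequencies so far.
    """
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    i, j = 0, 0  # arbitrary opening actions
    for _ in range(T):
        row_counts[i] += 1
        col_counts[j] += 1
        # Row maximizes payoff against the column player's empirical mix.
        i = max(range(m),
                key=lambda a: sum(col_counts[b] * A[a][b] for b in range(n)))
        # Column minimizes the row payoff against the row player's empirical mix.
        j = min(range(n),
                key=lambda b: sum(row_counts[a] * A[a][b] for a in range(m)))
    return [c / T for c in row_counts], [c / T for c in col_counts]

# Matching pennies: the unique equilibrium mixes 50/50 on both sides.
A = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(A, T=10000)
```

The paper's modification, randomizing uniformly over the first r rounds, would replace the opening best responses above with uniform draws before switching to the best-response rule.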