Results 1–10 of 36
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
, 2003
Cited by 91 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may also help in analyzing other multiagent learning algorithms.
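The adapt-or-retreat loop described in this abstract can be sketched roughly as follows. This is an illustrative simplification, not the published algorithm: `looks_stationary` and its frequency-window test stand in for AWESOME's actual epoch-based hypothesis tests with shrinking thresholds.

```python
from collections import Counter

def looks_stationary(history, window=50, tol=0.15):
    # Crude stationarity check: compare the opponent's action frequencies in
    # the two most recent windows. (The real AWESOME uses epoch-based
    # hypothesis tests rather than a fixed-window comparison.)
    if len(history) < 2 * window:
        return False
    recent, prior = history[-window:], history[-2 * window:-window]
    recent_freq, prior_freq = Counter(recent), Counter(prior)
    return all(abs(recent_freq[a] - prior_freq[a]) / window <= tol
               for a in set(history))

def awesome_step(opp_history, equilibrium_action, best_response):
    # Adapt When Everybody is Stationary, Otherwise Move to Equilibrium:
    # best-respond to the empirical strategy while it looks fixed,
    # otherwise retreat to the precomputed equilibrium action.
    if looks_stationary(opp_history):
        modal_action = Counter(opp_history[-50:]).most_common(1)[0][0]
        return best_response(modal_action)
    return equilibrium_action
```

The point of the retreat branch is that play never strays far from equilibrium unless the opponents really do appear stationary, which is what makes the self-play convergence guarantee possible.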
Adaptive Channel Allocation Spectrum Etiquette for Cognitive Radio Networks
 In Proc. of IEEE DySPAN
, 2005
Cited by 84 (2 self)
In this work, we propose a game-theoretic framework to analyze the behavior of cognitive radios for distributed adaptive channel allocation. We define two different objective functions for the spectrum sharing games, which capture the utility of selfish users and cooperative users, respectively. Based on the utility definition for cooperative users, we show that the channel allocation problem can be formulated as a potential game, and thus converges to a deterministic channel allocation Nash equilibrium point. Alternatively, a no-regret learning implementation is proposed for both scenarios; it is shown to perform similarly to the potential game when cooperation is enforced, but with higher variability across users. The no-regret learning formulation is particularly useful for accommodating selfish users. Non-cooperative learning games have the advantage of very low overhead for information exchange in the network. We show that a cooperation-based spectrum sharing etiquette improves overall network performance at the expense of the increased overhead required for information exchange.
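The convergence claim for the potential-game formulation can be illustrated with a toy congestion game, a standard example of a potential game (this sketch uses a made-up congestion cost, not the paper's utility functions): because every unilateral improvement strictly decreases an exact potential, round-robin best responses cannot cycle and must stop at a pure Nash equilibrium.

```python
def best_response_dynamics(n_users, n_channels, max_rounds=100):
    # Toy congestion game: each user's cost is the number of other users on
    # its channel. Congestion games admit an exact potential, so repeated
    # best responses must reach a pure-strategy Nash equilibrium.
    alloc = [0] * n_users  # worst case: everyone starts crowded onto channel 0
    for _ in range(max_rounds):
        changed = False
        for u in range(n_users):
            def load(c):
                return sum(1 for v in range(n_users) if v != u and alloc[v] == c)
            best = min(range(n_channels), key=load)
            if load(best) < load(alloc[u]):
                alloc[u] = best  # strict improvement decreases the potential
                changed = True
        if not changed:
            return alloc  # no profitable deviation remains: a pure Nash equilibrium
    return alloc
```

Starting from the most congested allocation, the dynamics spread users evenly across channels in a handful of rounds, mirroring the deterministic equilibrium the abstract contrasts with the noisier no-regret alternative.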
Performance bounded reinforcement learning in strategic interactions
 In AAAI’04
, 2004
Cited by 24 (4 self)
Despite the increasing deployment of agent technologies in several business and industry domains, user confidence in fully automated, agent-driven applications is noticeably lacking. The main reasons for this lack of trust in complete automation are scalability and the absence of reasonable guarantees on the performance of self-adapting software. In this paper we address the latter issue in the context of learning agents in a Multiagent System (MAS). Performance guarantees for most existing online Multiagent Learning (MAL) algorithms are realizable only in the limit, thereby seriously limiting their practical utility. Our goal is to provide meaningful guarantees about the performance of a learner in a MAS while it is learning. In particular, we present a novel MAL algorithm that (i) converges to a best response against stationary opponents, (ii) converges to a Nash equilibrium in self-play, and (iii) achieves a constant-bounded expected regret at any time (no-average-regret asymptotically) in arbitrarily sized general-sum games with non-negative payoffs, against any number of opponents.
Regret based dynamics: Convergence in weakly acyclic games
 In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS
, 2007
Cited by 21 (9 self)
Regret-based algorithms have been proposed to control a wide variety of multiagent systems. The appeal of regret-based algorithms is that (1) they are easily implementable in large-scale multiagent systems and (2) existing results prove that the behavior will asymptotically converge to a set of “no-regret” points in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multiagent system. Multiagent systems often exhibit an additional structure (being “weakly acyclic”) that has not been exploited in the context of regret-based algorithms. In this paper, we introduce a modification of regret-based algorithms by (1) exponentially discounting the memory and (2) bringing a notion of inertia into the players’ decision process. We show how these modifications lead to an entire class of regret-based algorithms that provide almost-sure convergence to a pure Nash equilibrium in any weakly acyclic game.
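The two modifications named in the abstract can be sketched in a single update step. This is an illustrative sketch under the assumption that counterfactual payoffs `payoff_fn(a)` are observable each round; the function name, the switching rule, and the parameter values are hypothetical, not the paper's specification.

```python
import random

def discounted_regret_step(regrets, last_action, payoff_fn, actions,
                           rho=0.9, inertia=0.3):
    # One step of a regret-based update with the paper's two modifications:
    # (1) exponentially discounted regret memory (factor rho), and
    # (2) inertia (repeat the previous action with fixed probability).
    realized = payoff_fn(last_action)
    for a in actions:
        regrets[a] = rho * regrets[a] + (1 - rho) * (payoff_fn(a) - realized)
    if random.random() < inertia:
        return last_action  # inertia: stick with the previous action
    positive = {a: r for a, r in regrets.items() if r > 0}
    if not positive:
        return last_action  # no positive regret, so no reason to switch
    # Otherwise switch with probability proportional to positive discounted regret.
    r = random.uniform(0, sum(positive.values()))
    for a, w in positive.items():
        r -= w
        if r <= 0:
            return a
    return last_action
```

Discounting keeps the regrets responsive to recent play, while inertia damps the simultaneous-switching oscillations that can keep undiscounted regret dynamics away from pure equilibria in weakly acyclic games.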
Safe strategies for agent modelling in games
 In AAAI Fall Symposium on Artificial Multiagent Learning
, 2004
Cited by 16 (4 self)
Research in opponent modelling has shown success, but a fundamental question has been overlooked: what happens when a modeller is faced with an opponent that cannot be successfully modelled? Many opponent modellers could do arbitrarily poorly against such an opponent. In this paper, we aim to augment opponent modelling techniques with a method that enables models to be used safely. We introduce safe strategies, which bound the possible loss relative to a safe value. We also introduce the Safe Policy Selection (SPS) algorithm as a method to vary this bound in a controlled fashion. We prove that, in the limit, an agent using SPS is guaranteed to attain at least a safety value in cases where the opponent modelling is ineffective. We also show empirical evidence that SPS does not adversely affect agents that are capable of modelling the opponent. Tests in a domain with complicated modellers show that SPS is effective at eliminating losses while retaining wins across a variety of modelling algorithms.
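One way to picture the safe/exploit trade-off the abstract describes is a controller that moves probability mass between the opponent model's strategy and a maximin "safe" strategy. This is purely illustrative; the function, its parameters, and the additive schedule are hypothetical and differ from the actual SPS bound-adjustment rule.

```python
def sps_mix(realized_avg, safety_value, weight, step=0.05):
    # Illustrative controller (NOT the published SPS rule): shift probability
    # mass toward the maximin "safe" strategy when the realized average payoff
    # dips below the safety value, and back toward the opponent model when the
    # modelling appears to be paying off.
    if realized_avg < safety_value:
        weight = min(1.0, weight + step)  # lean on the safe strategy
    else:
        weight = max(0.0, weight - step)  # trust the opponent model more
    return weight  # probability of playing the safe (maximin) strategy
```

Against an unmodellable opponent the weight drifts to 1 and play degenerates to the safe strategy, which is the guarantee the paper proves; against a modellable one the weight drifts to 0 and the modeller's wins are retained.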
Learning against multiple opponents
 in Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent
, 2006
Cited by 13 (2 self)
We address the problem of learning in repeated n-player (as opposed to 2-player) general-sum games, paying particular attention to the rarely addressed situation in which there is a mixture of agents of different types. We propose new criteria requiring that the agents employing a particular learning algorithm work together to achieve a joint best response against a target class of opponents, while guaranteeing that they each achieve at least their individual security-level payoff against any possible set of opponents. We then provide algorithms that provably meet these criteria for two target classes: stationary strategies and adaptive strategies with bounded memory. We also demonstrate that the algorithm for stationary strategies outperforms existing algorithms in tests spanning a wide variety of repeated games with more than two players.
Online Multiagent Learning against Memory Bounded Adversaries
Cited by 12 (9 self)
The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory-bounded adversary. LoE-AIM makes no prior assumptions about the opponent and is tailored to optimally exploit any adversary that induces a Markov decision process in the state space of joint histories. LoE-AIM either explores and gathers new information about the opponent or converges to the best response to the partially learned opponent strategy in repeated play. We further extend LoE-AIM to account for online repeated interactions against the same adversary, with plays against other adversaries interleaved in between. LoE-AIM-repeated stores learned knowledge about an adversary, identifies the adversary in case of repeated interaction, and reuses the stored knowledge about the adversary's behavior to enhance learning in the current epoch of play. LoE-AIM and LoE-AIM-repeated are fully implemented, with results demonstrating their superiority over other existing MAL algorithms.
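The "adversary-induced MDP" idea can be made concrete with a small sketch: if the opponent's next move depends only on the last k joint actions, those length-k windows serve as MDP states, and the learner can estimate the opponent's reaction at each state from play. The function below is an illustrative model-estimation step, not LoE-AIM's full explore-or-exploit logic.

```python
from collections import defaultdict

def induced_mdp_model(joint_history, k):
    # A memory-k adversary's next move depends only on the last k joint
    # actions, so those windows act as the states of an adversary-induced MDP.
    # Build the empirical model of the adversary's reaction at each state.
    model = defaultdict(lambda: defaultdict(int))
    for t in range(k, len(joint_history)):
        state = tuple(joint_history[t - k:t])  # last k (mine, theirs) pairs
        opp_action = joint_history[t][1]       # adversary's move at time t
        model[state][opp_action] += 1
    return model
```

For a deterministic memory-1 opponent such as tit-for-tat, each visited state accumulates counts for exactly one reaction, after which solving the induced MDP yields the best response the abstract refers to.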
Efficient no-regret multiagent learning
 Proceedings of The 20th National Conference on Artificial Intelligence
, 2005
Cited by 9 (0 self)
We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of only the scalar reward of the action played. We show that the average actual payoff of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of the minimax payoff against arbitrary opponents, with high probability in polynomial time. In addition, the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, unlike previous work, we do not need to assume that the learner knows the game matrices or can observe the opponents’ actions.
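The setting described here, sampling one action and seeing only its scalar reward, is the bandit-feedback setting, and the standard trick for converting a full-information no-regret algorithm to it is importance weighting. Exp3 is the textbook example of that trick and is sketched below as an illustration; it is not claimed to be the paper's specific construction.

```python
import math

def exp3_probs(weights, gamma):
    # Exponential weights mixed with uniform exploration, so every action
    # retains sampling probability at least gamma / k (rewards in [0, 1]).
    total, k = sum(weights), len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]

def exp3_update(weights, probs, played, reward, gamma):
    # Only the scalar reward of the action actually played is observed.
    # Dividing by its sampling probability gives an unbiased estimate of the
    # full reward vector entry, standing in for full-information feedback.
    xhat = reward / probs[played]
    weights[played] *= math.exp(gamma * xhat / len(weights))
```

The forced exploration floor is what makes the importance-weighted estimate well behaved, and tightening the resulting polynomial sample bounds is precisely the kind of efficiency question this abstract addresses.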
Learning to Teach and Follow in Repeated Games
, 2005
Cited by 2 (2 self)
The goal of a learning agent playing a repeated game is to maximize its payoffs over time. In repeated games with other learning agents, this often requires that an agent learn to offer and accept profitable compromises. To do so, past research suggests that agents must implement both teaching and following strategies. However, few algorithms successfully employ both kinds of strategies simultaneously. In this paper, we present an algorithm (called SPaM) that employs both kinds of strategies simultaneously in 2-player matrix games when the complete game matrix is observable. We show (empirically) that SPaM learns quickly and effectively when associating with a large class of agents, including itself, best-response learners, and (perhaps) humans.