Results 1 - 10 of 24
AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response against Stationary Opponents
- In Proceedings of the 20th International Conference on Machine Learning, 2006
Cited by 57 (5 self)
Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games -- assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
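The "adapt when stationary, otherwise retreat to equilibrium" idea in the abstract above can be illustrated with a minimal Python sketch. This is not the AWESOME algorithm itself, which uses carefully scheduled epochs and statistical tests. The function names, the two-epoch comparison, and the fixed `threshold` are all assumptions made for illustration. The sketch only observes the opponent's actual actions, not its mixed strategy.

```python
import numpy as np

def empirical_dist(actions, n_actions):
    """Empirical distribution over the opponent's observed actions."""
    counts = np.bincount(actions, minlength=n_actions)
    return counts / counts.sum()

def best_response(payoff, opp_dist):
    """Row action maximizing expected payoff against opp_dist."""
    return int(np.argmax(payoff @ opp_dist))

def choose_strategy(payoff, equilibrium, prev_epoch, curr_epoch, threshold=0.1):
    """Pick a mixed strategy for the row player (hypothetical helper).

    If the opponent's empirical play in two consecutive epochs differs by
    less than `threshold` (max norm), treat the opponent as stationary and
    best respond; otherwise fall back to the precomputed equilibrium.
    """
    n_opp = payoff.shape[1]
    p_prev = empirical_dist(prev_epoch, n_opp)
    p_curr = empirical_dist(curr_epoch, n_opp)
    if np.max(np.abs(p_prev - p_curr)) < threshold:
        strategy = np.zeros(payoff.shape[0])
        strategy[best_response(payoff, p_curr)] = 1.0
        return strategy
    return np.asarray(equilibrium, dtype=float)

# Matching pennies from the row player's point of view.
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])
equilibrium = [0.5, 0.5]
stationary = choose_strategy(payoff, equilibrium, [0, 0, 0, 0], [0, 0, 0, 0])
shifting = choose_strategy(payoff, equilibrium, [0, 0, 0, 1], [0, 0, 0, 0])
```

With an opponent that always plays action 0, the sketch best responds with a pure strategy; when the empirical distributions shift between epochs, it returns the equilibrium mix.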
If multi-agent learning is the answer, what is the question?
- Artificial Intelligence, 2007
Cited by 43 (0 self)
The area of learning in multi-agent systems is today one of the most fertile grounds for interaction between game theory and artificial intelligence. We focus on the foundational questions in this interdisciplinary area, and identify several distinct agendas that ought to, we argue, be separated. The goal of this article is to start a discussion in the research community that will result in firmer foundations for the area.
Learning against opponents with bounded memory
- In IJCAI, 2005
Cited by 32 (2 self)
Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory which we describe. We then show an algorithm that provably achieves an ɛ-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. This new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments.
A general criterion and an algorithmic framework for learning in multi-agent systems
- Machine Learning, 2007
On the agenda(s) of research on multi-agent learning
- In AAAI 2004 Symposium on Artificial Multi-Agent Learning, 2004
Cited by 8 (2 self)
We survey the recent work in AI on multi-agent reinforcement learning (that is, learning in stochastic games). After tracing a representative sample of the recent literature, we argue that, while exciting, much of this work suffers from a fundamental lack of clarity about the problem or problems being addressed. We then propose five well-defined problems in multi-agent reinforcement learning and single out one that in our view is both well-suited for AI and has not yet been adequately addressed. We conclude with some remarks about how we believe progress is to be made on this problem.
Best-Response Play In Partially Observable Card Games
- In Benelearn 2005: Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands, 2005
Cited by 6 (5 self)
We address the problem of how to play optimally against a fixed opponent in a two-player card game with partial information like poker. A game-theoretic approach to this problem would specify a pair of stochastic policies that are best responses to each other, i.e., a Nash equilibrium. Although such a Nash-optimal policy guarantees a lower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeated play), then we can model the problem as a partially observable Markov decision process (POMDP) from the perspective of one agent, and solve it by dynamic programming. In particular, for a large class of card games including poker, the derived POMDP consists of a finite number of belief states and it can be solved exactly.
Game-Theoretic Recommendations: Some Progress in an Uphill Battle
Cited by 2 (0 self)
Game theory has become the central language for the analysis of multi-agent systems. Moreover, the central game-theoretic solution concept, the Nash equilibrium, has become a standard tool for that analysis. A game is a general
Multi-agent Learning Experiments on Repeated Matrix Games
Cited by 2 (1 self)
This paper experimentally evaluates multiagent learning algorithms playing repeated matrix games to maximize their cumulative return. Previous work found that Q-learning surpassed Nash-based multi-agent learning algorithms. Based on all-against-all repeated matrix game tournaments, this paper updates the state of the art of multiagent learning experiments. In a first stage, it shows that M-Qubed, S and bandit-based algorithms such as UCB are the best algorithms on general-sum games, with Exp3 being the best on cooperative games and zero-sum games. In a second stage, our experiments show that two features, forgetting the far past and using recent history with states, improve the learning algorithms. Finally, the best algorithms are two new algorithms, Q-learning and UCB enhanced with the two features, and M-Qubed.
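One way to picture the "forgetting the far past" feature mentioned in the abstract above is a UCB1-style bandit whose statistics are computed over a sliding window of recent plays. The sketch below is a generic illustration, not the paper's implementation: the class name `SlidingWindowUCB`, the `window` and `c` parameters, and the assumption that rewards lie in [0, 1] are all hypothetical, and the second feature (using recent history as state) is not modeled.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """UCB1-style action selection over only the most recent plays."""

    def __init__(self, n_actions, window=100, c=2.0):
        self.n_actions = n_actions
        self.c = c  # exploration constant (assumed value)
        # Old (action, reward) pairs fall off the left end automatically.
        self.history = deque(maxlen=window)

    def select(self):
        counts = [0] * self.n_actions
        sums = [0.0] * self.n_actions
        for action, reward in self.history:
            counts[action] += 1
            sums[action] += reward
        # Play any action that has no sample inside the window.
        for action in range(self.n_actions):
            if counts[action] == 0:
                return action
        t = len(self.history)
        scores = [
            sums[a] / counts[a] + math.sqrt(self.c * math.log(t) / counts[a])
            for a in range(self.n_actions)
        ]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, action, reward):
        self.history.append((action, reward))

agent = SlidingWindowUCB(n_actions=2, window=4)
first = agent.select()    # no data yet, so the first action is tried
agent.update(first, 0.0)
second = agent.select()   # the other action is still unsampled
agent.update(second, 1.0)
third = agent.select()    # equal counts, higher mean wins
```

Because the window is bounded, an action that has not been played recently loses all its samples and is tried again, which lets the learner react when the opponent changes its behavior.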
Multiagent Learning in Adaptive Dynamic Systems
Cited by 2 (1 self)
Classically, approaches to multiagent policy learning have assumed that the agents, via interactions and/or by using preliminary knowledge about the reward functions of all players, would find an interdependent solution called an “equilibrium”. Recently, however, certain researchers have questioned the necessity and validity of equilibrium as the most important multiagent solution concept. They argue that a “good” learning algorithm is one that is efficient with respect to a certain class of counterparts. Adaptive players are an important class of agents that learn their policies separately from maintaining beliefs about their counterparts’ future actions, and make their decisions based on that policy and the current belief. In this paper, we propose an efficient learning algorithm for the presence of adaptive counterparts, called Adaptive Dynamics Learner (ADL), which is able to learn an efficient policy over the opponents’ adaptive dynamics rather than over simple actions and beliefs and, by so doing, to exploit these dynamics to obtain a higher utility than any equilibrium strategy can provide. We tested our algorithm on a substantial, representative set of the best-known and most illustrative matrix games and observed that the ADL agent is highly efficient against the Adaptive Play Q-learning (APQ) agent and the Infinitesimal Gradient Ascent (IGA) agent. In self-play, when possible, ADL is able to converge to a Pareto optimal strategy maximizing the welfare of all players.
Multi-Agent Reinforcement Learning for Intrusion Detection
Cited by 1 (0 self)
Intrusion Detection has been investigated for many years and the field has matured. Nevertheless, there are still important challenges, e.g., how an IDS can detect new and complex distributed attacks. To tackle these problems, we propose a distributed Reinforcement Learning (RL) approach in a hierarchical architecture of network sensor agents. Each network sensor agent learns to interpret local state observations, and communicates them to a central agent higher up in the agent hierarchy. These central agents, in turn, learn to send signals up the hierarchy, based on the signals that they receive. Finally, the agent at the top of the hierarchy learns when to signal an intrusion alarm. We evaluate our approach in an abstract network domain.

