Multiagent reinforcement learning in the iterated prisoner’s dilemma (1995)

by T W Sandholm, R H Crites
Venue:Biosystems

Results 1 - 10 of 56

Learning Models of Intelligent Agents

by David Carmel, Shaul Markovitch, 1996
Cited by 80 (2 self)
Agents that operate in a multi-agent system need an efficient strategy to handle their encounters with other agents involved. Searching for an optimal interactive strategy is a hard problem because it depends mostly on the behavior of the others. In this work, interaction among agents is represented as a repeated two-player game, where the agents' objective is to look for a strategy that maximizes their expected sum of rewards in the game. We assume that agents' strategies can be modeled as finite automata. A model-based approach is presented as a possible method for learning an effective interactive strategy. First, we describe how an agent should find an optimal strategy against a given model. Second, we present an unsupervised algorithm that infers a model of the opponent's automaton from its input/output behavior. A set of experiments that show the potential merit of the algorithm is reported as well. Introduction In recent years, a major research effort has been invested in desi...
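
The first step in the abstract above, finding an optimal strategy against a given automaton model, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the prisoner's dilemma payoff values, the use of a discounted sum of rewards, the Tit-for-Tat automaton used as the opponent model, and the names `PAYOFF`, `TFT`, and `best_response`.

```python
# Payoff to the agent for (own action, opponent action).
# Illustrative prisoner's dilemma values, not taken from the paper.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical opponent model: Tit-for-Tat as a two-state Moore machine.
# Each state emits one action; the next state depends on the agent's action.
TFT = {
    "output": {"q_c": "C", "q_d": "D"},
    "next": {
        ("q_c", "C"): "q_c", ("q_c", "D"): "q_d",
        ("q_d", "C"): "q_c", ("q_d", "D"): "q_d",
    },
}


def best_response(model, gamma=0.9, sweeps=500):
    """Value iteration over the automaton's states (discounting is an assumption)."""
    states = list(model["output"])
    value = {s: 0.0 for s in states}

    def q(s, a):
        # Immediate payoff plus discounted value of the automaton's next state.
        return PAYOFF[(a, model["output"][s])] + gamma * value[model["next"][(s, a)]]

    for _ in range(sweeps):
        value = {s: max(q(s, a) for a in "CD") for s in states}

    policy = {s: max("CD", key=lambda a: q(s, a)) for s in states}
    return policy, value


policy, value = best_response(TFT, gamma=0.9)
print(policy, value)
```

With these illustrative payoffs and `gamma = 0.9`, the computed policy cooperates in both model states, because a single defection gains less than it costs in discounted future reward.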

An introduction to collective intelligence

by David H. Wolpert, Kagan Tumer - Handbook of Agent Technology, AAAI, 1999
Cited by 80 (16 self)
Abstract not found

Elevator Group Control Using Multiple Reinforcement Learning Agents

by Robert H. Crites, Andrew G. Barto, Michael Huhns, Gerhard Weiss - Machine Learning, 1998
Cited by 68 (2 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained on the basis of real or simulated experiences, focusing their computation on areas of state space that are actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems. Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reinforcement ...

AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response against Stationary Opponents

by Vincent Conitzer, Tuomas Sandholm - In Proceedings of the 20th International Conference on Machine Learning, 2006
Cited by 57 (5 self)
Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games -- assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.

Preferential Partner Selection in an Evolutionary Study of Prisoner's Dilemma

by Dan Ashlock, Mark D. Smucker, E. Ann Stanley, Leigh Tesfatsion, 1995
Cited by 40 (12 self)
Partner selection is an important process in many social interactions, permitting individuals to decrease the risks associated with cooperation. In large populations, defectors may escape punishment by roving from partner to partner, but defectors in smaller populations risk social isolation. We investigate these possibilities for an evolutionary prisoner's dilemma in which agents use expected payoffs to choose and refuse partners. In comparison to random or round-robin partner matching, we find that the average payoffs attained with preferential partner selection tend to be more narrowly confined to a few isolated payoff regions. Most ecologies evolve to essentially full cooperative behavior, but when agents are intolerant of defections, or when the costs of refusal and social isolation are small, we also see the emergence of wallflower ecologies in which all agents are socially isolated. Between these two extremes, we see the emergence of ecologies whose agents tend to engage in a small number of defections followed by cooperation thereafter. The latter ecologies exhibit a plethora of interesting social interaction patterns.

Opponent Modeling in a Multi-agent System

by David Carmel, Shaul Markovitch - Adaptation and Learning in Multi-agent Systems, Lecture Notes in Artificial Intelligence 1042, 1995
Cited by 30 (0 self)
Agents that operate in a multi-agent system need an efficient strategy to handle their encounters with other agents involved in that system. Searching for an optimal interactive strategy is a hard problem because it depends mostly on the behavior of the others. In this work, interaction among agents is represented as a repeated two-player game, where an agent's objective is to find a strategy that maximizes its expected sum of rewards in the game. We assume that agents' strategies can be modeled as finite automata. A model-based reasoning approach is presented as a possible method for learning an efficient interactive strategy. First, we describe how an agent should find an optimal strategy against a given model. Second, we present a heuristic algorithm that infers a model of the opponent's automaton from its input/output behavior. A set of experiments that show the potential merit of the algorithm is reported as well. Keywords: Opponent modeling, Model based reasoning, Finite au...

Learning Sequences of Actions in Collectives of Autonomous Agents

by Kagan Tumer, Adrian K. Agogino, David H. Wolpert - In Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems, 2002
Cited by 29 (17 self)
In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. We are particularly interested in instances of this problem where centralized control is either impossible or impractical. For single agent systems in similar domains, machine learning methods (e.g., reinforcement learners [18]) have been successfully used [1, 2, 3, 31]. However, applying such solutions directly to multi-agent systems often proves problematic, as agents may work at cross-purposes, or have difficulty in evaluating their contribution to achievement of the global objective, or both. Accordingly, the crucial design step in multiagent systems centers on determining the private objectives of each agent so that as the agents strive for those objectives, the system reaches a good global solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence [19, 27, 30] to design goals for the agents that are "aligned" with the global goal, and are "learnable" in that agents can readily see how their behavior affects their utility. We show that reinforcement learning agents using those goals outperform both "natural" extensions of single agent algorithms and global reinforcement learning solutions based on "team games".

On Multiagent Q-Learning in a Semi-competitive Domain

by Tuomas W. Sandholm, Robert H. Crites, 1996
Cited by 28 (0 self)
Q-learning is a recent reinforcement learning (RL) algorithm that does not need a model of its environment and can be used on-line. Therefore it is
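
As a rough illustration of the model-free, on-line character of Q-learning mentioned above, the sketch below runs a tabular Q-learner against a fixed Tit-for-Tat opponent in an iterated prisoner's dilemma. It is not the authors' experimental setup: the payoff values, the state encoding, the learning parameters, and the name `q_learn_vs_tit_for_tat` are all assumptions chosen for illustration.

```python
import random

# Payoff to the learner for (own action, opponent action); illustrative values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def q_learn_vs_tit_for_tat(steps=50_000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning against Tit-for-Tat (all parameters are assumptions)."""
    rng = random.Random(seed)
    # State = the move Tit-for-Tat is about to play, i.e. its copy of our last move.
    q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}
    state = "C"  # Tit-for-Tat opens with cooperation
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = PAYOFF[(action, state)]
        next_state = action  # Tit-for-Tat repeats our move in the next round
        # Standard Q-learning update; no model of the opponent is consulted.
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


q_table = q_learn_vs_tit_for_tat()
for s in ACTIONS:
    print(s, max(ACTIONS, key=lambda a: q_table[(s, a)]))
```

The learner only ever sees its own reward and the next state, which is what makes the update usable on-line. Against a learning opponent the environment would no longer be stationary, and this simple setup would not carry the same convergence behavior.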

Large-Scale Dynamic Optimization Using Teams of Reinforcement Learning Agents

by Robert Harry Crites, 1996
Cited by 18 (0 self)
Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experiences, focusing their computations on the areas of state space that are actually visited during control, making them computationally tractable on very large problems. RL algorithms can be used as components of multi-agent algorithms. If each member of a team of agents employs one of these algorithms, a new collective learning algor...

Multi-Agent Reinforcement Learning: An Approach Based on the Other Agent's Internal Model

by Yasuo Nagayuki, Shin Ishii, Kenji Doya, 2000
Cited by 17 (2 self)
The application of reinforcement learning to multiagent systems has attracted recent attention. In a multi-agent environment, whether one agent's action is good or not depends on the other agents' actions. In traditional reinforcement learning methods, which assume stationary environments, it is hard to take account of the other agent's actions which may change due to learning. In this article, we consider a two-agent cooperation problem, and propose a multi-agent reinforcement learning method based on estimation of the other agent's actions. In our learning method, one agent estimates the other agent's action based on the internal model of the other agent. The internal model is acquired by the observation of the other agent's actions. Through experiments, we demonstrate that good cooperative behaviors are achieved by our learning method. 1. Introduction The realization of cooperative behaviors in multiagent systems is an interesting topic from the viewpoint of engineering and cognitiv...
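
The idea of estimating the other agent's action from observations can be sketched as follows. This is a deliberately simple stand-in, not the method of the paper: the internal model here is an empirical frequency table per state, and the names `OtherAgentModel`, `expected_value`, and the joint-action table `q_joint` are hypothetical.

```python
from collections import defaultdict


class OtherAgentModel:
    """Frequency-count model of the other agent's action in each state (an assumption)."""

    def __init__(self, actions):
        self.actions = tuple(actions)
        self.counts = defaultdict(lambda: {a: 0 for a in self.actions})

    def observe(self, state, other_action):
        # Record one observed action of the other agent in this state.
        self.counts[state][other_action] += 1

    def predict(self, state):
        # Estimated action probabilities; uniform when nothing has been observed.
        counts = self.counts[state]
        total = sum(counts.values())
        if total == 0:
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: n / total for a, n in counts.items()}


def expected_value(q_joint, model, state, my_action):
    """Average a joint-action value table over the predicted action of the other agent."""
    probs = model.predict(state)
    return sum(p * q_joint[(state, my_action, other)] for other, p in probs.items())


model = OtherAgentModel(["L", "R"])
for seen in ["L", "L", "L", "R"]:
    model.observe("s0", seen)
q_joint = {("s0", "up", "L"): 1.0, ("s0", "up", "R"): 0.0}
print(model.predict("s0"), expected_value(q_joint, model, "s0", "up"))
```

An agent could pick the action with the highest `expected_value` in each state; because the model is updated from observations, the estimate follows the other agent as its behavior changes through learning.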