Results 1–10 of 36
The dynamics of reinforcement learning in cooperative multiagent systems
 In Proceedings of the National Conference on Artificial Intelligence (AAAI-98)
, 1998
"... Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that a ..."
Abstract

Cited by 305 (1 self)
 Add to MetaCart
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of the game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
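The contrast this abstract draws between ordinary and optimistic independent Q-learners can be illustrated with a minimal sketch. The 2x2 payoff matrix, the learning parameters, and the `train` helper below are illustrative assumptions, not the paper's experimental setup; the optimistic rule shown (remembering the best payoff each action ever produced) is one simple variant of optimistic exploration.

```python
import random

# Hypothetical 2x2 coordination game (shared payoff): joint action (0, 0) is
# the optimal Nash equilibrium, (1, 1) a suboptimal one.
PAYOFF = [[10, 0],
          [0, 2]]

def train(optimistic, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # each learner values only its OWN actions
    for _ in range(episodes):
        # Epsilon-greedy action selection for both independent learners.
        a1 = rng.randrange(2) if rng.random() < epsilon else q1.index(max(q1))
        a2 = rng.randrange(2) if rng.random() < epsilon else q2.index(max(q2))
        r = PAYOFF[a1][a2]
        for q, a in ((q1, a1), (q2, a2)):
            if optimistic:
                # Optimistic variant: track the best payoff an action ever
                # produced, shielding it from mis-coordination penalties.
                q[a] = max(q[a], r)
            else:
                q[a] += alpha * (r - q[a])  # ordinary running-average update
    return q1.index(max(q1)), q2.index(max(q2))
```

With the optimistic update, the high-payoff joint action (0, 0) keeps its value even when exploration causes mis-coordination, so both learners settle on the optimal equilibrium; the averaging learner's outcome depends on the partner's play during learning.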
Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games
 in Advances in Neural Information Processing Systems
, 2002
"... Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, only so ..."
Abstract

Cited by 78 (3 self)
 Add to MetaCart
Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, only some of which are optimal. We present optimal adaptive learning (OAL), the first algorithm that converges to an optimal Nash equilibrium for any team Markov game. We provide a convergence proof, and show that the algorithm's parameters are easy to set so that the convergence conditions are met. Our experiments show that existing algorithms do not converge in many of these problems while OAL does. We also demonstrate the importance of the fundamental ideas behind OAL: incomplete history sampling and biased action selection.
Evolutionary Drift and Equilibrium Selection
, 1996
"... This paper develops an approach to equilibrium selection in game theory based on studying the equilibriating process through which equilibrium is achieved. The differential equations derived from models of interactive learning typically have stationary states that are not isolated. Instead, Nash equ ..."
Abstract

Cited by 52 (2 self)
 Add to MetaCart
This paper develops an approach to equilibrium selection in game theory based on studying the equilibrating process through which equilibrium is achieved. The differential equations derived from models of interactive learning typically have stationary states that are not isolated. Instead, Nash equilibria that specify the same behavior on the equilibrium path, but different out-of-equilibrium behavior, appear in connected components of stationary states. The stability properties of these components often depend critically on the perturbations to which the system is subjected. We argue that it is then important to incorporate such drift into the model. A sufficient condition is provided for drift to create stationary states with strong stability properties near a component of equilibria. This result is used to derive comparative static predictions concerning common questions raised in the literature on refinements of Nash equilibrium.
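A common instance of the "differential equations derived from models of interactive learning" is the replicator dynamic; a drift perturbation can be sketched, in illustrative notation that is not necessarily the paper's own, as

    \dot{x}_i = x_i\bigl[(Ax)_i - x^{\top}Ax\bigr] + \varepsilon\,\delta_i(x)

where x is the population's mixed strategy, A the payoff matrix, and the small term \varepsilon\,\delta_i(x) the drift. Components of equilibria that are only neutrally stable under the unperturbed flow (\varepsilon = 0) can gain or lose stability depending on the direction of \delta, which is why the choice of drift matters for equilibrium selection.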
Rational coordination in multiagent environments
 JAAMAS
, 2000
"... Abstract. We adopt the decisiontheoretic principle of expected utility maximization as a paradigm for designing autonomous rational agents, and present a framework that uses this paradigm to determine the choice of coordinated action. We endow an agent with a specialized representation that capture ..."
Abstract

Cited by 19 (5 self)
 Add to MetaCart
We adopt the decision-theoretic principle of expected utility maximization as a paradigm for designing autonomous rational agents, and present a framework that uses this paradigm to determine the choice of coordinated action. We endow an agent with a specialized representation that captures the agent’s knowledge about the environment and about the other agents, including its knowledge about their states of knowledge, which can include what they know about the other agents, and so on. This reciprocity leads to a recursive nesting of models. Our framework puts forth a representation for the recursive models and, under the assumption that the nesting of models is finite, uses dynamic programming to solve this representation for the agent’s rational choice of action. Using a decision-theoretic approach, our work addresses concerns of agent decision-making about coordinated action in unpredictable situations, without imposing upon agents predesigned prescriptions, or protocols, about standard rules of interaction. We implemented our method in a number of domains and we show results of coordination among our automated agents, among human-controlled agents, and among our agents coordinating with human-controlled agents. Keywords: coordination; rationality; decision theory; game theory; agent modeling
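The finite recursive nesting of models can be sketched roughly as follows. The payoff matrices, the uniform "no information" leaf model, and the fixed depth are illustrative assumptions, not the Recursive Modeling Method's actual representation; the sketch only shows how best responses propagate up a finite model stack.

```python
# Hypothetical payoffs for a 2x2 game; rows index the owner's action,
# columns the other agent's action. Numbers are illustrative only.
PAYOFF_ME = [[3, 0],
             [2, 1]]
PAYOFF_OTHER = [[3, 2],
                [0, 1]]

def predict(payoff_self, payoff_other, depth):
    """Distribution over the payoff_self-owner's actions, modeling the other
    agent recursively `depth` levels deep; the recursion bottoms out in a
    uniform leaf model, one simple assumption for the deepest level."""
    n = len(payoff_self)
    if depth == 0:
        return [1.0 / n] * n
    other_dist = predict(payoff_other, payoff_self, depth - 1)
    values = [sum(row[b] * other_dist[b] for b in range(len(row)))
              for row in payoff_self]
    best = values.index(max(values))  # deterministic tie-breaking
    return [1.0 if a == best else 0.0 for a in range(n)]

def my_choice(depth=3):
    dist = predict(PAYOFF_ME, PAYOFF_OTHER, depth)
    return dist.index(max(dist))
```

Each recursion level plays the role of one dynamic-programming stage: the agent's choice at level k is a best response to the other agent's predicted choice at level k-1.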
Learning Conventions in Multiagent Stochastic Domains using Likelihood Estimates
 In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence
, 1996
"... Fully cooperative multiagent systems—those in which agents share a joint utility model—is of special interest in AI. A key problem is that of ensuring that the actions of individual agents are coordinated, especially in settings where the agents are autonomous decision makers. We investigate approac ..."
Abstract

Cited by 18 (2 self)
 Add to MetaCart
Fully cooperative multiagent systems—those in which agents share a joint utility model—are of special interest in AI. A key problem is that of ensuring that the actions of individual agents are coordinated, especially in settings where the agents are autonomous decision makers. We investigate approaches to learning coordinated strategies in stochastic domains where an agent’s actions are not directly observable by others. Much recent work in game theory has adopted a Bayesian learning perspective on the more general problem of equilibrium selection, but tends to assume that actions can be observed. We discuss the special problems that arise when actions are not observable, including effects on rates of convergence, and the effect of action failure probabilities and asymmetries. We also use likelihood estimates as a means of generalizing fictitious play learning models in our setting. Finally, we propose the use of maximum likelihood as a means of removing strategies from consideration, with the aim of convergence to a conventional equilibrium, at which point learning and deliberation can cease.
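As background for the likelihood-estimate generalization, classical fictitious play with fully observable actions can be sketched as below. The 2x2 payoff matrix and the uniform initial counts are illustrative assumptions; the paper's contribution is precisely to replace the observed-action counts with likelihood estimates when actions are hidden.

```python
# Shared payoffs for a hypothetical 2x2 coordination game with an optimal
# joint action (0, 0) and a suboptimal one (1, 1).
PAYOFF = [[4, 0],
          [0, 3]]

def best_response(opp_counts):
    # Believe the opponent plays each action with its empirical frequency,
    # then maximize expected payoff under that belief (ties -> lowest index).
    total = sum(opp_counts)
    probs = [c / total for c in opp_counts]
    values = [sum(PAYOFF[a][b] * probs[b] for b in range(2)) for a in range(2)]
    return values.index(max(values))

def fictitious_play(rounds=200):
    counts1 = [1, 1]  # agent 1's counts of agent 2's actions (uniform prior)
    counts2 = [1, 1]
    a1 = a2 = 0
    for _ in range(rounds):
        a1 = best_response(counts1)
        a2 = best_response(counts2)
        counts1[a2] += 1  # actions are fully observable in this baseline
        counts2[a1] += 1
    return a1, a2
```

When actions are unobservable, the `counts` update above is no longer available; the abstract's proposal is to update beliefs from likelihoods of the observed outcomes instead.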
Rational interactions in multiagent environments: communication
, 1998
"... We address the issue of rational communicative behavior among autonomous intelligent agents that have to make decisions as to what, to whom, and how to communicate. We treat communicative actions as aimed at increasing the efficiency of interaction among agents. We postulate that a rational speaker ..."
Abstract

Cited by 13 (5 self)
 Add to MetaCart
We address the issue of rational communicative behavior among autonomous intelligent agents that have to make decisions as to what, to whom, and how to communicate. We treat communicative actions as aimed at increasing the efficiency of interaction among agents. We postulate that a rational speaker designs a speech act so as to maximally increase the benefit obtained as the result of the interaction. We quantify the gain in the quality of interaction as the expected utility, and we present a framework that allows an agent to compute the expected utility of various communicative actions. Our framework uses the Recursive Modeling Method as the representation of the agent's state of knowledge, including the agent's preferences, abilities and beliefs about the world, as well as the beliefs the agent has about the other agents, the beliefs it has about the other agents' beliefs, and so on. A decision-theoretic pragmatics of a communicative act can then be defined as the transformation it induces on the agent's state of knowledge about its decision-making situation. This transformation leads to a change in the quality of the interaction, expressed in terms of the benefit to the agent. We analyze the decision-theoretic pragmatics of a number of important communicative acts, and investigate their expected utility using examples.
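The idea of valuing a speech act by the transformation it induces on the hearer's decision-making situation can be sketched in miniature. The two-state world, the payoff table, and the `utility_of_informing` helper are hypothetical stand-ins, far simpler than the Recursive Modeling Method's nested belief structure.

```python
# Hypothetical shared payoffs U[state][hearer_action]; action 0 suits "rain",
# action 1 suits "sun". Numbers are illustrative only.
U = {"rain": [5, 0],
     "sun": [0, 5]}

def hearer_action(belief):
    # The hearer best-responds to its belief over world states.
    values = [sum(belief[s] * U[s][a] for s in U) for a in range(2)]
    return values.index(max(values))

def utility_of_informing(true_state, prior):
    """Utility gain from the speech act "the state is true_state": the
    hearer's model is transformed from `prior` to certainty, and the gain is
    measured in the resulting payoff, mirroring the decision-theoretic
    pragmatics described in the abstract."""
    before = U[true_state][hearer_action(prior)]
    certain = {s: (1.0 if s == true_state else 0.0) for s in U}
    after = U[true_state][hearer_action(certain)]
    return after - before
```

Informing is valuable exactly when it changes the hearer's choice: telling a hearer who already believes the right state yields zero gain, so a rational speaker would prefer a different (or no) communicative act there.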
Synchronous and Asynchronous Learning by Responsive Learning Automata
 Learning and Implementation on the Internet." Manuscript
, 1996
"... We consider the ability of economic agents to learn in a decentralized environment in which agents do not know the (stochastic) payoff matrix and can not observe their opponents' actions; they merely know, at each stage of the game, their own action and the resulting payoff. We discuss the requir ..."
Abstract

Cited by 10 (4 self)
 Add to MetaCart
We consider the ability of economic agents to learn in a decentralized environment in which agents do not know the (stochastic) payoff matrix and cannot observe their opponents' actions; they merely know, at each stage of the game, their own action and the resulting payoff. We discuss the requirements for learning in such an environment, and show that a simple probabilistic learning algorithm satisfies two important optimizing properties: i) when placed in an unknown but eventually stationary random environment, they converge in bounded time, in a sense we make precise, to strategies that maximize average payoff; ii) they satisfy a monotonicity property (related to the "law of effect") in which increasing the payoffs for a given strategy increases the probability of that strategy being played in the future. We then study how groups of such learners interact in a general game. We show that synchronous groups of these learners converge to the serially undominated set. ...
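A simple probabilistic learner of the kind described can be sketched as a linear reward-inaction (L_R-I) automaton; the Bernoulli environment, step count, and learning rate below are illustrative assumptions, and the paper's responsive learning automata refine this basic scheme.

```python
import random

def run_automaton(success_prob, steps=20000, lr=0.01, seed=0):
    """Linear reward-inaction (L_R-I) automaton in a stationary Bernoulli
    environment: action i pays 1 with probability success_prob[i], else 0.
    The learner sees only its own action and payoff, matching the
    decentralized setting of the abstract."""
    rng = random.Random(seed)
    n = len(success_prob)
    p = [1.0 / n] * n  # mixed strategy over actions
    for _ in range(steps):
        a = rng.choices(range(n), weights=p)[0]
        reward = rng.random() < success_prob[a]
        if reward:
            # Reward: shift probability mass toward the successful action.
            p = [pi + lr * (1.0 - pi) if i == a else pi * (1.0 - lr)
                 for i, pi in enumerate(p)]
        # Inaction: on zero payoff the mixed strategy is left unchanged.
    return p
```

The update exhibits the monotonicity property from the abstract directly: every success for an action strictly raises the probability of that action being played in the future.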
Multiagent Cooperative Search for Portfolio Selection
, 2001
"... this paper because we assume throughout that the total initial wealth of all systems of agents is $1 ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
this paper because we assume throughout that the total initial wealth of all systems of agents is $1
Exploiting Focal Points Among Alternative Solutions: Two Approaches
 Annals of Mathematics and Artificial Intelligence
, 2000
"... Focal points refer to prominent solutions of an interaction, solutions to which agents are drawn. This paper considers how automated agents could use focal points for coordination in communicationimpoverished situations. Coordination is a central theme of Distributed Artificial Intelligence. Much w ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
Focal points refer to prominent solutions of an interaction, solutions to which agents are drawn. This paper considers how automated agents could use focal points for coordination in communication-impoverished situations. Coordination is a central theme of Distributed Artificial Intelligence. Much work in this field can be seen as a search for mechanisms that allow agents with differing knowledge and goals to coordinate their actions for mutual benefit. Additionally, one of the main assumptions of the field is that communication is expensive relative to computation. Thus, coordination techniques that minimize communication are of particular importance. Our purpose in this paper is to consider how to model the process of finding focal points from domain-independent criteria, under the assumption that agents cannot communicate with one another. We consider two alternative approaches for finding focal points, one based on decision theory, the second on step-logic. The first provides for ...
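A toy version of selecting a focal point from domain-independent criteria can be sketched as below. The uniqueness-based prominence score and the property encoding are hypothetical illustrations, not the paper's decision-theoretic or step-logic machinery.

```python
from collections import Counter

def focal_point(candidates, properties):
    """Toy domain-independent focal-point selection: prefer the candidate
    whose property values are shared by the fewest alternatives (a crude
    'uniqueness' criterion). Agents running this identical rule on the same
    inputs select the same solution without communicating."""
    counts = {p: Counter(vals[c] for c in candidates)
              for p, vals in properties.items()}

    def prominence(c):
        # A candidate is prominent if its property values are rare.
        return sum(1.0 / counts[p][properties[p][c]] for p in properties)

    # Deterministic tie-breaking via candidate order keeps agents aligned.
    return max(candidates, key=prominence)
```

For example, among three candidate meeting points where one is the only red one, both agents independently single out the red one and coordination succeeds with no messages exchanged.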