Results 1 - 10
of
56
The dynamics of reinforcement learning in cooperative multiagent systems
- In Proceedings of National Conference on Artificial Intelligence (AAAI-98
, 1998
"... Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that a ..."
Abstract
-
Cited by 249 (1 self)
- Add to MetaCart
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of that game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium. 1
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models
- Journal of Artificial Intelligence Research
, 2002
"... Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and app ..."
Abstract
-
Cited by 147 (18 self)
- Add to MetaCart
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demons...
Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings
- In IJCAI
, 2003
"... The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). ..."
Abstract
-
Cited by 122 (19 self)
- Add to MetaCart
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP).
Role Allocation and Reallocation in Multiagent Teams: Towards A Practical Analysis
- In AAMAS
, 2003
"... Despite the success of the BDI approach to agent teamwork, initial role allocation (i.e. deciding which agents to allocate to key roles in the team) and role reallocation upon failure remain open challenges. What remain missing are analysis techniques to aid human developers in quantitatively compar ..."
Abstract
-
Cited by 37 (11 self)
- Add to MetaCart
Despite the success of the BDI approach to agent teamwork, initial role allocation (i.e. deciding which agents to allocate to key roles in the team) and role reallocation upon failure remain open challenges. What remain missing are analysis techniques to aid human developers in quantitatively comparing different initial role allocations and competing role reallocation algorithms. To remedy this problem, this paper makes three key contributions. First, the paper introduces RMTDP (Role-based Markov Team Decision Problem), an extension to MTDP [9], for quantitative evaluations of role allocation and reallocation approaches. Second, the paper illustrates an RMTDP-based methodology for not only comparing two competing algorithms for role reallocation, but also for identifying the types of domains where each algorithm is suboptimal, how much each algorithm can be improved and at what computational cost (complexity). Such algorithmic improvements are identified via a new automated procedure that generates a family of locally optimal policies for comparative evaluations. Third, since there are combinatorially many initial role allocations, evaluating each in RMTDP to identify the best is extremely difficult. Therefore, we introduce methods to exploit task decompositions among subteams to significantly prune the search space of initial role allocations. We present experimental results from two distinct domains.
Learning Other Agents' Preferences in Multiagent Negotiation
- in Proceedings of the National Conference on Artificial Intelligence (AAAI-96
, 1996
"... In multiagent systems, an agent does not usually have complete information about the preferences and decision making processes of other agents. This might prevent the agents from making coordinated choices, purely due to their ignorance of what others want. This paper describes the integration of a ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
In multiagent systems, an agent does not usually have complete information about the preferences and decision making processes of other agents. This might prevent the agents from making coordinated choices, purely due to their ignorance of what others want. This paper describes the integration of a learning module into a communication-intensive negotiating agent architecture. The learning module gives the agents the ability to learn about other agents' preferences via past interactions. Over time, the agents can incrementally update their models of other agents' preferences and use them to make better coordinated decisions. Combining both communication and learning, as two complement knowledge acquisition methods, helps to reduce the amount of communication needed on average, and is justified in situations where communication is computationally costly or simply not desirable (e.g. to preserve the individual privacy). Introduction Multiagent systems are networks of loosely-coupled comp...
An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning
- Computer Science Department, Carnegie Mellon University
, 2000
"... Learning behaviors in a multiagent environment is crucial for developing and adapting multiagent systems. Reinforcement learning techniques have addressed this problem for a single agent acting in a stationary environment, which is modeled as a Markov decision process (MDP). But, multiagent environm ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
Learning behaviors in a multiagent environment is crucial for developing and adapting multiagent systems. Reinforcement learning techniques have addressed this problem for a single agent acting in a stationary environment, which is modeled as a Markov decision process (MDP). But, multiagent environments are inherently non-stationary since the other agents are free to change their behavior as they also learn and adapt. Stochastic games, first studied in the game theory community, are a natural extension of MDPs to include multiple agents. In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory community and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single agent reinforcement learners, and basic game theory techniques. This research was sponsored by the United States Air Force under Cooperative Agreements No. F30602-97-2-0250 and No. F30602-98-2-0135. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force, or the US Government. Keywords: Multiagent systems, stochastic games, reinforcement learning, game theory 1
Optimal and approximate Q-value functions for decentralized POMDPs
- J. Artificial Intelligence Research
"... Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value functi ..."
Abstract
-
Cited by 22 (9 self)
- Add to MetaCart
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q ∗ is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q ∗. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q ∗. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem. 1.
Partial-Order Planning with Concurrent Interacting Actions
- Journal of Artificial Intelligence Research
, 2001
"... In order to generate plans for agents with multiple actuators, agent teams, or distributed ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
In order to generate plans for agents with multiple actuators, agent teams, or distributed

