Solving transition independent decentralized Markov decision processes
 JAIR
, 2004
Cited by 74 (11 self)
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
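As a rough illustration of what "independent transitions" can mean, the sketch below writes the condition for two agents. The factored state $s = (s_1, s_2)$ and the symbols $P_1$, $P_2$ are notation assumed here for illustration; they are not taken from the abstract, and the paper's own definition may include further components.

```latex
% Assumed notation: agent i has local state s_i and action a_i.
% Transition independence: the joint transition probability factors
% into one local term per agent.
\[
  P\bigl(s_1', s_2' \mid s_1, s_2, a_1, a_2\bigr)
    \;=\; P_1\bigl(s_1' \mid s_1, a_1\bigr)\,
          P_2\bigl(s_2' \mid s_2, a_2\bigr)
\]
```

Under this reading, the agents influence each other only through the global reward, not through each other's state dynamics.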
Decentralized control of cooperative systems: Categorization and complexity analysis
 Journal of Artificial Intelligence Research
, 2004
Cited by 68 (8 self)
Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.
Transition-Independent Decentralized Markov Decision Processes
, 2003
Cited by 61 (12 self)
There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multiagent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique to solve optimally a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions.
Optimal and approximate Q-value functions for decentralized POMDPs
 J. Artificial Intelligence Research
Cited by 39 (16 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
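The single-agent procedure mentioned in this abstract (compute Q* recursively by dynamic programming, then read off a policy) can be sketched for a tiny finite-horizon MDP. This is only an illustrative sketch of the single-agent case, not the paper's Dec-POMDP method; the function names, the two-state toy problem, and the dictionary layout are all assumptions made for the example.

```python
def finite_horizon_q(states, actions, P, R, horizon):
    """Backward induction. Returns Q[t][s][a] for t = 0 .. horizon-1.

    P[s][a] maps next states to probabilities; R[s][a] is the immediate reward.
    """
    v_next = {s: 0.0 for s in states}  # value after the final step is zero
    Q = [None] * horizon
    for t in reversed(range(horizon)):
        Q[t] = {
            s: {
                a: R[s][a] + sum(p * v_next[s2] for s2, p in P[s][a].items())
                for a in actions
            }
            for s in states
        }
        # The optimal value at stage t is the best Q-value in each state.
        v_next = {s: max(Q[t][s].values()) for s in states}
    return Q


def greedy_policy(Q):
    """Extract, for each stage, the action with the highest Q-value per state."""
    return [{s: max(qa, key=qa.get) for s, qa in q_t.items()} for q_t in Q]


# Hypothetical toy problem: staying in state "b" pays 1, everything else pays 0.
states = ["a", "b"]
actions = ["stay", "move"]
P = {
    "a": {"stay": {"a": 1.0}, "move": {"b": 1.0}},
    "b": {"stay": {"b": 1.0}, "move": {"a": 1.0}},
}
R = {
    "a": {"stay": 0.0, "move": 0.0},
    "b": {"stay": 1.0, "move": 0.0},
}

Q = finite_horizon_q(states, actions, P, R, horizon=2)
policy = greedy_policy(Q)
```

With two steps to go, the greedy policy moves from "a" to "b" and stays in "b", which is what the recursion is expected to produce for this toy problem.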
Scaling Teamwork to Very Large Teams
 IN PROCEEDINGS OF AAMAS’04
, 2004
Cited by 36 (12 self)
As a paradigm for coordinating cooperative agents in dynamic environments, teamwork has been shown to be capable of leading to flexible and robust behavior. However, when we apply teamwork to the problem of building teams with hundreds of members, fundamental limitations become apparent. We have developed a model of teamwork that addresses the limitations of existing models as they apply to very large teams. A central idea of the model is to organize team members into dynamically evolving subteams. Additionally, we present a novel approach to sharing information, leveraging the properties of small worlds networks. The algorithm provides targeted, efficient information delivery. We have developed domain-independent software proxies with which we demonstrate teams at least an order of magnitude bigger than previously published. Moreover, the same proxies proved effective for teamwork in two distinct domains, illustrating the generality of the approach.
Decentralized Markov decision processes with event-driven interactions
 in: Proceedings of the 3rd International Joint Conference on Autonomous Agents and MultiAgent Systems
Cited by 32 (5 self)
Decentralized MDPs provide a powerful formal framework for planning in multiagent systems, but the complexity of the model limits its usefulness. We study in this paper a class of DEC-MDPs that restricts the interactions between the agents to a structured, event-driven dependency. These dependencies can model locking a shared resource or temporal enabling constraints, both of which arise frequently in practice. The complexity of this class of problems is shown to be no harder than exponential in the number of states and doubly exponential in the number of dependencies. Since the number of dependencies is much smaller than the number of states for many problems, this is significantly better than the doubly exponential (in the state space) complexity of DEC-MDPs. We also demonstrate how an algorithm we previously developed can be used to solve problems in this class both optimally and approximately. Experimental work indicates that this solution technique is significantly faster than a naive policy search approach.
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 32 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
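To make the single-state payoff propagation idea concrete, here is a minimal max-plus style message-passing sketch on a tree-shaped coordination graph, where such message passing is exact. It is not the authors' implementation: the function names, the payoff tables, and the fixed iteration count are assumptions for illustration, and graphs with cycles generally need further care that this sketch omits.

```python
from itertools import product


def max_plus(n_agents, n_actions, local, pair, n_iters=10):
    """Pick one action per agent by passing max-plus messages along edges.

    local[i][a_i] is agent i's own payoff; pair[(i, j)][a_i][a_j] is the
    payoff on edge (i, j). Exact on trees; only a heuristic on cyclic graphs.
    """
    neighbors = {i: [] for i in range(n_agents)}
    for i, j in pair:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def edge_payoff(i, j, ai, aj):
        # Edge tables are stored once per undirected edge.
        return pair[(i, j)][ai][aj] if (i, j) in pair else pair[(j, i)][aj][ai]

    # msg[(i, j)][a_j]: what agent i reports to neighbor j about j's action a_j.
    msg = {(i, j): [0.0] * n_actions for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msg:
            new[(i, j)] = [
                max(
                    local[i][ai]
                    + edge_payoff(i, j, ai, aj)
                    + sum(msg[(k, i)][ai] for k in neighbors[i] if k != j)
                    for ai in range(n_actions)
                )
                for aj in range(n_actions)
            ]
        msg = new

    # Each agent maximizes its local payoff plus all incoming messages.
    return [
        max(
            range(n_actions),
            key=lambda ai: local[i][ai] + sum(msg[(k, i)][ai] for k in neighbors[i]),
        )
        for i in range(n_agents)
    ]


def global_payoff(joint, local, pair):
    """Sum of local and pairwise payoffs for a joint action."""
    total = sum(local[i][a] for i, a in enumerate(joint))
    return total + sum(t[joint[i]][joint[j]] for (i, j), t in pair.items())


# Hypothetical 3-agent chain 0 - 1 - 2 with two actions per agent.
local = [[0, 0], [0, 0], [0, 1]]
pair = {(0, 1): [[3, 0], [0, 2]], (1, 2): [[0, 0], [0, 4]]}

joint_action = max_plus(3, 2, local, pair)
brute_force = max(product(range(2), repeat=3),
                  key=lambda a: global_payoff(a, local, pair))
```

On this chain the message-passing result matches the brute-force maximizer, while only ever enumerating one agent's actions at a time per message.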
Communication for improving policy computation in distributed POMDPs
 In Proceedings of The Third International Joint Conference on Autonomous Agents and Multiagent Systems
, 2004
Cited by 32 (11 self)
K steps, even greater space and time savings can be obtained.
An integrated token-based algorithm for scalable coordination
 In AAMAS’05
, 2005
Cited by 30 (14 self)
Efficient coordination among large numbers of heterogeneous agents promises to revolutionize the way in which some complex tasks, such as responding to urban disasters, can be performed. However, state-of-the-art coordination algorithms are not capable of achieving efficient and effective coordination when a team is very large. Building on recent successful token-based algorithms for task allocation and information sharing, we have developed an integrated and efficient approach to effective coordination of large-scale teams. We use tokens to encapsulate anything that needs to be shared by the team, including information, tasks and resources. The tokens are efficiently routed through the team via the use of local decision-theoretic models. Each token is used to improve the routing of other tokens, leading to a dramatic performance improvement when the algorithms work together. We present results from an implementation of this approach which demonstrates its ability to coordinate large teams.