Results 1-10 of 77
Solving transition independent decentralized Markov decision processes
JAIR, 2004
Cited by 107 (14 self)
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
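The defining property of this class (each agent's dynamics depend only on its own state and action, while only the reward couples the agents) can be sketched as follows. All dynamics, rewards, and names here are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch of a transition-independent decentralized MDP:
# each agent's next state depends only on its own state and action,
# while a structured global reward couples the agents' state histories.

def agent_step(state, action):
    # Independent local transition: agent i's dynamics ignore agent j.
    return (state + action) % 5

def joint_reward(histories):
    # Structured global reward over all agents' histories, e.g. a bonus
    # each time both agents occupy state 0 simultaneously.
    return sum(1 for s1, s2 in zip(*histories) if s1 == s2 == 0)

def rollout(policies, horizon=4):
    states = [0, 0]
    histories = [[], []]
    for t in range(horizon):
        for i in range(2):
            a = policies[i](states[i], t)
            states[i] = agent_step(states[i], a)  # no cross-agent coupling here
            histories[i].append(states[i])
    return joint_reward(histories)
```

Because the coupling lives entirely in `joint_reward`, each agent's transition model can be reasoned about in isolation, which is what the paper's algorithm exploits.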
Decentralized control of cooperative systems: Categorization and complexity analysis
Journal of Artificial Intelligence Research, 2004
Cited by 88 (9 self)
Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving such problems optimally arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication, and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
Cited by 64 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
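The payoff propagation idea can be illustrated with a minimal max-plus sketch on a three-agent chain 0-1-2, where messages from the leaves into the middle agent recover the joint maximizer. The edge payoff numbers are made up for illustration; on a tree-structured coordination graph this scheme is exact, matching brute-force enumeration:

```python
import itertools

# Max-plus (payoff propagation) on a 3-agent chain with binary actions.
# Global payoff decomposes into local edge terms: f01(a0,a1) + f12(a1,a2).
ACTIONS = [0, 1]
f01 = {(a, b): 2.0 * (a == b) for a in ACTIONS for b in ACTIONS}            # 0 and 1 prefer to agree
f12 = {(b, c): 1.0 * (b != c) + 0.5 * c for b in ACTIONS for c in ACTIONS}  # 1 and 2 prefer to differ

def max_plus_chain():
    # Leaf-to-root messages into agent 1: each summarizes the best a
    # neighbor can contribute for every choice of agent 1's action.
    m0 = {b: max(f01[(a, b)] for a in ACTIONS) for b in ACTIONS}
    m2 = {b: max(f12[(b, c)] for c in ACTIONS) for b in ACTIONS}
    b_star = max(ACTIONS, key=lambda b: m0[b] + m2[b])
    # Backtrack the maximizing neighbors given agent 1's choice.
    a_star = max(ACTIONS, key=lambda a: f01[(a, b_star)])
    c_star = max(ACTIONS, key=lambda c: f12[(b_star, c)])
    return a_star, b_star, c_star

def brute_force():
    return max(itertools.product(ACTIONS, repeat=3),
               key=lambda j: f01[(j[0], j[1])] + f12[(j[1], j[2])])
```

On graphs with cycles, the paper's method iterates such messages and is only approximate, but the per-message cost still depends on local neighborhoods rather than the joint action space.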
Decentralized Markov decision processes with event-driven interactions
In: Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems
Cited by 53 (7 self)
Decentralized MDPs provide a powerful formal framework for planning in multiagent systems, but the complexity of the model limits its usefulness. We study in this paper a class of DEC-MDPs that restricts the interactions between the agents to a structured, event-driven dependency. These dependencies can model locking a shared resource or temporal enabling constraints, both of which arise frequently in practice. The complexity of this class of problems is shown to be no harder than exponential in the number of states and doubly exponential in the number of dependencies. Since the number of dependencies is much smaller than the number of states for many problems, this is significantly better than the doubly exponential (in the state space) complexity of DEC-MDPs. We also demonstrate how an algorithm we previously developed can be used to solve problems in this class both optimally and approximately. Experimental work indicates that this solution technique is significantly faster than a naive policy search approach.
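A temporal enabling constraint of the kind described above can be sketched as a tiny two-agent transition function: agent 1 triggers an event (unlocking a door) and only after that event does agent 2's action have its normal effect. The states, actions, and update order are invented for illustration:

```python
# Hypothetical event-driven dependency between two otherwise independent
# agents: agent 2's "pass" action is enabled only after agent 1's "unlock"
# event has occurred (a temporal enabling constraint).

def step(state, actions):
    s1, s2, door_open = state
    a1, a2 = actions
    # Agent 1's local dynamics, which may emit the enabling event.
    if a1 == "unlock" and s1 == "at_door":
        door_open = True
    elif a1 == "move":
        s1 = "at_door"
    # Agent 2's dynamics depend on the event flag, not on agent 1's state.
    if a2 == "pass" and door_open:
        s2 = "inside"
    return (s1, s2, door_open)
```

The point of the restriction is that the agents interact only through a small set of such event flags, which is what keeps the complexity tied to the number of dependencies rather than the full joint state space.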
Hybrid BDI-POMDP framework for multiagent teaming
 JAIR
Cited by 37 (9 self)
Many current large-scale multiagent team implementations can be characterized as following the "belief-desire-intention" (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but the complexity of finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations.
Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs
Cited by 30 (15 self)
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
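In the NLP view, the decision variables are the controller parameters (action probabilities per node and stochastic node transitions) and the objective is the controller's value. The sketch below only evaluates that objective for one candidate controller on a made-up two-state POMDP; all numbers are illustrative, and an actual solver would optimize over `psi` and `eta` subject to probability constraints:

```python
import numpy as np

# Evaluate a fixed-size stochastic finite-state controller on a tiny POMDP.
S, A, O, Q = 2, 2, 2, 2          # states, actions, observations, controller nodes
gamma = 0.9
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a, s, s']: transition model
              [[0.5, 0.5], [0.4, 0.6]]])
Z = np.array([[0.8, 0.2], [0.3, 0.7]])    # Z[s', o]: observation model
R = np.array([[1.0, 0.0], [0.0, 1.0]])    # R[s, a]: reward model

psi = np.full((Q, A), 0.5)                # psi[q, a]: P(a | node q)
eta = np.full((Q, A, O, Q), 0.5)          # eta[q, a, o, q']: P(q' | q, a, o)

def controller_value(psi, eta):
    # Solve the linear system V(q,s) = sum_a psi[q,a] * ( R[s,a]
    #   + gamma * sum_{s',o,q'} T[a,s,s'] Z[s',o] eta[q,a,o,q'] V(q',s') ).
    n = Q * S
    M = np.zeros((n, n))
    b = np.zeros(n)
    for q in range(Q):
        for s in range(S):
            i = q * S + s
            b[i] = psi[q] @ R[s]
            for a in range(A):
                for s2 in range(S):
                    for o in range(O):
                        for q2 in range(Q):
                            M[i, q2 * S + s2] += (gamma * psi[q, a] * T[a, s, s2]
                                                  * Z[s2, o] * eta[q, a, o, q2])
    V = np.linalg.solve(np.eye(n) - M, b)
    return V.reshape(Q, S)
```

Because the value depends multilinearly on `psi` and `eta`, the resulting program is nonlinear in the controller parameters, which is exactly why general NLP methods apply.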
Minimizing communication cost in a distributed Bayesian network using a decentralized MDP
In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2003)
Cited by 24 (5 self)
In complex distributed applications, a problem is often decomposed into a set of subproblems that are distributed to multiple agents. We formulate this class of problems with a two-layer Bayesian network. Instead of merely providing a statistical view, we propose a satisficing approach to predict the minimum expected communication needed to reach a desired solution quality. The problem is modelled with a decentralized MDP, and two approximate algorithms are developed to find the near-optimal communication strategy for a given problem structure and a required solution quality.
Modeling and Simulating Human Teamwork Behaviors Using Intelligent Agents
Physics of Life Reviews, 2004
Cited by 21 (1 self)
Among researchers in multiagent systems there has been growing interest in using intelligent agents to model and simulate human teamwork behaviors. Teamwork modeling is important for training humans in gaining collaborative skills, for supporting humans in making critical decisions by proactively gathering, fusing, and sharing information, and for building coherent teams with both humans and agents working effectively on intelligence-intensive problems. Teamwork modeling is also challenging because the research has spanned diverse disciplines from business management to cognitive science, human discourse, and distributed artificial intelligence. This article presents an extensive, but not exhaustive, list of work in the field, where the taxonomy is organized along two main dimensions: team social structure and social behaviors. Along the dimension of social structure, we consider agent-only teams and mixed human/agent teams. Along the dimension of social behaviors, we consider collaborative behaviors, communicative behaviors, helping behaviors, and the underpinning of effective teamwork: shared mental models. The contribution of this article is that it presents an organizational framework for analyzing a variety of teamwork simulation systems and for further studying simulated teamwork behaviors.
Graphical model inference in optimal control of stochastic multiagent systems
Journal of Artificial Intelligence Research, 2008
Cited by 21 (1 self)
In this article we consider the issue of optimal control in collaborative multiagent systems with stochastic dynamics. The agents have a joint task in which they have to reach a number of target states. The dynamics of the agents contains additive control and additive noise, and the autonomous part factorizes over the agents. Full observation of the global state is assumed. The goal is to minimize the accumulated joint cost, which consists of integrated instantaneous costs and a joint end cost. The joint end cost expresses the joint task of the agents. The instantaneous costs are quadratic in the control and factorize over the agents. The optimal control is given as a weighted linear combination of single-agent-to-single-target controls. The single-agent-to-single-target controls are expressed in terms of diffusion processes. These controls, when not closed-form expressions, are formulated in terms of path integrals, which are calculated approximately by Metropolis-Hastings sampling. The weights in the control are interpreted as marginals of a joint distribution over agent-to-target assignments. The structure of the latter is represented by a graphical model, and the marginals are obtained by graphical model inference. Exact inference of the graphical model will break down in large systems, and so approximate inference methods are needed. We use naive mean-field approximation and belief propagation to approximate the optimal control in systems with linear dynamics. We compare the approximate inference methods with the exact solution, and we show that they can accurately compute the optimal control. Finally, we demonstrate the control method in multiagent systems with nonlinear dynamics consisting of up to 80 agents that have to reach an equal number of target states.
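The high-level recipe (joint control as a weighted combination of single-agent-to-single-target controls, with weights given by assignment marginals) can be sketched for a toy three-agent case. Here the marginals are brute-forced over permutations and the costs and "controls" are made-up scalars; the paper instead estimates costs via path integrals and obtains the marginals by graphical-model inference:

```python
import itertools
import math

COST = [[1.0, 2.0, 3.0],
        [2.0, 0.5, 2.5],
        [3.0, 2.0, 1.0]]   # COST[i][t]: cost for agent i to reach target t (illustrative)

def assignment_marginals(cost, beta=1.0):
    # Marginals w[i][t] of a Boltzmann distribution over one-to-one
    # agent-to-target assignments (permutations), computed by enumeration.
    n = len(cost)
    w = [[0.0] * n for _ in range(n)]
    z = 0.0
    for perm in itertools.permutations(range(n)):
        p = math.exp(-beta * sum(cost[i][perm[i]] for i in range(n)))
        z += p
        for i in range(n):
            w[i][perm[i]] += p
    return [[w[i][t] / z for t in range(n)] for i in range(n)]

def combined_control(i, single_controls, marginals):
    # u_i = sum_t w[i][t] * u_{i->t}: marginal-weighted combination of
    # single-agent-to-single-target controls (scalars here for simplicity).
    return sum(marginals[i][t] * single_controls[i][t]
               for t in range(len(single_controls[i])))
```

Enumeration is exponential in the number of agents, which is why the paper replaces it with mean-field or belief-propagation approximations for large systems.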