Results 1–10 of 47
Solving transition independent decentralized Markov decision processes
JAIR, 2004
Cited by 74 (11 self)
Abstract:
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
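The structural assumption in this abstract lends itself to a small illustration. The sketch below is toy code, not the authors' algorithm; names such as `joint_value` and the two-agent chains are hypothetical. It evaluates a fixed pair of policies in a transition-independent two-agent problem: because transitions are independent, each agent's state distribution evolves on its own, and the structured joint reward is an expectation over the product of the two marginal distributions.

```python
import itertools

def local_dist(P, s0, horizon):
    """Exact state distribution of one agent's Markov chain at each step."""
    dist = {s0: 1.0}
    out = [dist]
    for _ in range(horizon):
        nxt = {}
        for s, p in dist.items():
            for s2, q in P[s].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
        out.append(dist)
    return out

def joint_value(P1, P2, r1, r2, joint_bonus, s1, s2, horizon):
    """Value of fixed policies: sum of local rewards plus the expected
    joint reward, which factorizes thanks to transition independence."""
    d1 = local_dist(P1, s1, horizon)
    d2 = local_dist(P2, s2, horizon)
    v = 0.0
    for t in range(horizon + 1):
        v += sum(p * r1.get(s, 0.0) for s, p in d1[t].items())
        v += sum(p * r2.get(s, 0.0) for s, p in d2[t].items())
        # transition independence => the joint distribution is the product
        for (a, pa), (b, pb) in itertools.product(d1[t].items(), d2[t].items()):
            v += pa * pb * joint_bonus.get((a, b), 0.0)
    return v
```

The point of the sketch is that the coupling lives only in `joint_bonus`; everything else is per-agent, which is what the paper's algorithm exploits.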
Decentralized control of cooperative systems: Categorization and complexity analysis
Journal of Artificial Intelligence Research, 2004
Cited by 68 (8 self)
Abstract:
Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in optimally solving such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to optimally solve useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication, and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.
Decentralized Markov decision processes with event-driven interactions
In: Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems
Cited by 32 (5 self)
Abstract:
Decentralized MDPs provide a powerful formal framework for planning in multiagent systems, but the complexity of the model limits its usefulness. In this paper we study a class of DEC-MDPs that restricts the interactions between the agents to a structured, event-driven dependency. These dependencies can model locking a shared resource or temporal enabling constraints, both of which arise frequently in practice. The complexity of this class of problems is shown to be no harder than exponential in the number of states and doubly exponential in the number of dependencies. Since the number of dependencies is much smaller than the number of states for many problems, this is significantly better than the doubly exponential (in the state space) complexity of DEC-MDPs. We also demonstrate how an algorithm we previously developed can be used to solve problems in this class both optimally and approximately. Experimental work indicates that this solution technique is significantly faster than a naive policy search approach.
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
Cited by 32 (2 self)
Abstract:
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
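The payoff propagation idea can be illustrated with a minimal max-plus sketch on a small coordination graph (the helper `max_plus` is hypothetical; this is a simplified variant, not the paper's anytime implementation). Agents connected by pairwise payoff functions exchange messages and then individually pick the action maximizing their incoming messages; on tree-structured graphs this recovers the optimal joint action.

```python
def max_plus(agents, actions, payoff, iters=10):
    """payoff maps an undirected edge (i, j) to a function f(a_i, a_j)."""
    directed = [(i, j) for (u, v) in payoff for (i, j) in ((u, v), (v, u))]
    msg = {d: {a: 0.0 for a in actions} for d in directed}
    for _ in range(iters):
        new = {}
        for (i, j) in directed:
            # orient the payoff function along the message direction
            f = payoff[(i, j)] if (i, j) in payoff else (
                lambda a, b, e=(j, i): payoff[e](b, a))
            # m_{i->j}(a_j) = max_{a_i} [ f(a_i, a_j) + incoming messages to i
            # from all neighbors except j ]
            new[(i, j)] = {aj: max(f(ai, aj)
                                   + sum(msg[(k, t)][ai]
                                         for (k, t) in directed if t == i and k != j)
                                   for ai in actions)
                           for aj in actions}
        msg = new
    # each agent maximizes the sum of its incoming messages
    best = {}
    for i in agents:
        q = {a: sum(msg[(k, t)][a] for (k, t) in directed if t == i) for a in actions}
        best[i] = max(actions, key=q.get)
    return best
```

On a three-agent chain with pairwise payoffs this converges after a couple of sweeps, since the graph has no cycles.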
Hybrid BDI-POMDP framework for multiagent teaming
JAIR
Cited by 25 (8 self)
Abstract:
Many current large-scale multiagent team implementations can be characterized as following the “belief-desire-intention” (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our ...
Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs
Cited by 19 (10 self)
Abstract:
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement, and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
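To make the controller representation concrete, the sketch below (illustrative only; the name `controller_value` and the tabular-dict encoding are hypothetical) evaluates a fixed stochastic finite-state controller on a tabular POMDP by iterating the linear fixed-point equations for V(q, s). The paper's NLP goes a step further: it treats the action distribution psi(a|q) and the node transition eta(q'|q, a, o) as decision variables and maximizes this value subject to probability constraints.

```python
def controller_value(S, A, Obs, Q, P, Z, R, psi, eta, gamma, iters=500):
    """Iterate V(q, s) = sum_a psi(a|q) [ R(s, a)
       + gamma * sum_{s'} P(s'|s, a) * sum_o Z(o|s', a)
                 * sum_{q'} eta(q'|q, a, o) * V(q', s') ]."""
    V = {(q, s): 0.0 for q in Q for s in S}
    for _ in range(iters):
        # the comprehension reads the old V before rebinding the name
        V = {(q, s): sum(psi[q][a] * (R[s][a] + gamma * sum(
                         P[s][a][s2] * sum(
                             Z[s2][a][o] * sum(eta[q][a][o][q2] * V[(q2, s2)]
                                               for q2 in Q)
                             for o in Obs)
                         for s2 in S))
                     for a in A)
             for q in Q for s in S}
    return V
```

For a fixed controller this is ordinary policy evaluation; the hard part the paper tackles is optimizing psi and eta jointly, which makes the equations bilinear rather than linear.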
Graphical model inference in optimal control of stochastic multiagent systems
Journal of Artificial Intelligence Research, 2008
Cited by 16 (1 self)
Abstract:
In this article we consider the issue of optimal control in collaborative multiagent systems with stochastic dynamics. The agents have a joint task in which they have to reach a number of target states. The dynamics of the agents contains additive control and additive noise, and the autonomous part factorizes over the agents. Full observation of the global state is assumed. The goal is to minimize the accumulated joint cost, which consists of integrated instantaneous costs and a joint end cost. The joint end cost expresses the joint task of the agents. The instantaneous costs are quadratic in the control and factorize over the agents. The optimal control is given as a weighted linear combination of single-agent-to-single-target controls. The single-agent-to-single-target controls are expressed in terms of diffusion processes. These controls, when not available in closed form, are formulated in terms of path integrals, which are calculated approximately by Metropolis-Hastings sampling. The weights in the control are interpreted as marginals of a joint distribution over agent-to-target assignments. The structure of the latter is represented by a graphical model, and the marginals are obtained by graphical model inference. Exact inference of the graphical model will break down in large systems, and so approximate inference methods are needed. We use naive mean field approximation and belief propagation to approximate the optimal control in systems with linear dynamics. We compare the approximate inference methods with the exact solution, and we show that they can accurately compute the optimal control. Finally, we demonstrate the control method in multiagent systems with nonlinear dynamics consisting of up to 80 agents that have to reach an equal number of target states.
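As a small illustration of the sampling component mentioned above, here is a generic Metropolis-Hastings random-walk sampler (a toy one-dimensional target, not the paper's diffusion-path sampler; the name `metropolis_hastings` is hypothetical). It draws samples whose density is proportional to an unnormalized target, so that expectations, like the path integrals in the paper, can be estimated by averaging.

```python
import math
import random

def metropolis_hastings(log_target, x0, steps, step_size=0.5, seed=0):
    """Random-walk MH: propose x' = x + N(0, step_size^2), accept with
    probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, step_size)   # symmetric proposal
        lp_prop = log_target(prop)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples
```

Because the proposal is symmetric, only the ratio of target densities enters the acceptance test, so the target never needs to be normalized, which is exactly what makes the approach usable for path integrals.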
An iterative algorithm for solving constrained decentralized Markov decision processes
In: Proceedings of the 21st National Conference on Artificial Intelligence
Cited by 13 (0 self)
Abstract:
Despite significant progress in extending Markov Decision Processes (MDPs) to cooperative multiagent systems, developing approaches that can deal with realistic problems remains a serious challenge. Existing approaches that solve Decentralized Markov Decision Processes (DEC-MDPs) suffer from the fact that they can only solve relatively small problems without complex constraints on task execution. OC-DEC-MDP has been introduced to deal with large DEC-MDPs under resource and temporal constraints. However, the proposed algorithm to solve this class of DEC-MDPs has some limitations: it suffers from overestimation of opportunity cost and restricts policy improvement to one sweep (or iteration). In this paper, we propose to overcome these limits by first introducing the notion of Expected Opportunity Cost to better assess the influence of a local decision of an agent on the others. We then describe an iterative version of the algorithm to incrementally improve the policies of agents, leading to higher-quality solutions in some settings. Experimental results are shown to support our claims.
Achieving goals in decentralized POMDPs
In: Proceedings of the Eighth International Joint Conference on Autonomous Agents and Multiagent Systems
Cited by 11 (5 self)
Abstract:
Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach that models these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions: that terminal actions exist for each agent and that rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach, and experimental results show that it is able to exploit the goal structure effectively. Compared with the state of the art, our approach can solve larger problems and produce significantly better solutions.
A Bilinear Programming Approach for Multiagent Planning
Cited by 9 (2 self)
Abstract:
Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and, unlike the coverage set algorithm, it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs.
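A bilinear program in its simplest form maximizes x^T A y over two decision vectors. The sketch below (illustrative only; `alternate_maximize` is a hypothetical helper, not the authors' successive approximation algorithm) shows why such problems are tractable blockwise: with x and y restricted to probability simplices, fixing one block leaves a linear program whose optimum lies at a simplex vertex, so naive coordinate ascent alternates two trivial maximizations. This reaches only a local optimum in general, which is one motivation for a globally convergent method.

```python
def alternate_maximize(A, iters=50):
    """Coordinate ascent on max x^T A y with x, y on probability simplices.
    Each block-optimal solution is a vertex e_i / e_j, so we track indices."""
    m, n = len(A), len(A[0])
    i, j = 0, 0  # start from the vertices x = e_0, y = e_0
    for _ in range(iters):
        i = max(range(m), key=lambda r: A[r][j])  # best x given y = e_j
        j = max(range(n), key=lambda c: A[i][c])  # best y given x = e_i
    return i, j, A[i][j]
```

Each iteration can only increase A[i][j], and there are finitely many vertex pairs, so the loop stabilizes quickly; it simply has no guarantee of finding the global maximum.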