Results 1  10
of
93
Improved memorybounded dynamic programming for decentralized POMDPs
 In Proceedings of the TwentyThird Conference on Uncertainty in Artificial Intelligence
, 2007
"... Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memorybounded dynamic program ..."
Abstract

Cited by 94 (22 self)
 Add to MetaCart
Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memorybounded dynamic programming algorithm for finitehorizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decisionmaking techniques. 1
Optimal and approximate Qvalue functions for decentralized POMDPs
 J. Artificial Intelligence Research
"... Decisiontheoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In singleagent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Qvalue functions: an optimal Qvalue functi ..."
Abstract

Cited by 62 (26 self)
 Add to MetaCart
(Show Context)
Decisiontheoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In singleagent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Qvalue functions: an optimal Qvalue function Q ∗ is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q ∗. In this paper we study whether similar Qvalue functions can be defined for decentralized POMDP models (DecPOMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Qvalue function for DecPOMDPs: one that gives a normative description as the Qvalue function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Qvalue functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Qvalue function Q ∗. Finally, unifying some previous approaches for solving DecPOMDPs, we describe a family of algorithms for extracting policies from such Qvalue functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem. 1.
Exploiting locality of interaction in factored DecPOMDPs
 In Proc. Int. Joint Conf. Autonomous Agents and Multi Agent Systems
, 2008
"... Decentralized partially observable Markov decision processes (DecPOMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents ..."
Abstract

Cited by 46 (20 self)
 Add to MetaCart
(Show Context)
Decentralized partially observable Markov decision processes (DecPOMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored DecPOMDP representations have been proposed before, but only for DecPOMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored DecPOMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.
Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies
 In AAMAS
, 2007
"... Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multiagent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the numbers of agents, one popular approach ..."
Abstract

Cited by 36 (5 self)
 Add to MetaCart
(Show Context)
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multiagent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the numbers of agents, one popular approach has focused on approximate solutions. Though this approach is efficient, the algorithms within this approach do not provide any guarantees on solution quality. A second less popular approach focuses on global optimality, but typical results are available only for two agents, and also at considerable computational cost. This paper overcomes the limitations of both these approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given a network of agents (i.e. allowing easier scaleup to larger number of agents); (ii) it uses a combination of heuristics to speedup policy search; and (iii) it allows quality guaranteed approximations, allowing a systematic tradeoff of solution quality for time. Experimental results show orders of magnitude improvement in performance when compared with previous global optimal algorithms.
Pointbased dynamic programming for DECPOMDPs
 In Proc. of the National Conference on Artificial Intelligence
, 2006
"... We introduce pointbased dynamic programming (DP) for decentralized partially observable Markov decision processes (DECPOMDPs), a new discrete DP algorithm for planning strategies for cooperative multiagent systems. Our approach makes a connection between optimal DP algorithms for partially obser ..."
Abstract

Cited by 35 (2 self)
 Add to MetaCart
We introduce pointbased dynamic programming (DP) for decentralized partially observable Markov decision processes (DECPOMDPs), a new discrete DP algorithm for planning strategies for cooperative multiagent systems. Our approach makes a connection between optimal DP algorithms for partially observable stochastic games, and pointbased approximations for singleagent POMDPs. We show for the first time how relevant multiagent belief states can be computed. Building on this insight, we then show how the linear programming part in current multiagent DP algorithms can be avoided, and how multiagent DP can thus be applied to solve larger problems. We derive both an optimal and an approximated version of our algorithm, and we show its efficiency on test examples from the literature.
Policy iteration for decentralized control of Markov decision processes
 JAIR
"... Coordination of distributed agents is required for problems arising in many areas, including multirobot systems, networking and ecommerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DECPOMDP). Though much work has been done on o ..."
Abstract

Cited by 32 (19 self)
 Add to MetaCart
(Show Context)
Coordination of distributed agents is required for problems arising in many areas, including multirobot systems, networking and ecommerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DECPOMDP). Though much work has been done on optimal dynamic programming algorithms for the singleagent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DECPOMDPs. The algorithm uses stochastic finitestate controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing valuepreserving transformations, which modify the controller without sacrificing value. We present two efficient valuepreserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of valuepreserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents’ actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. 1.
Optimizing FixedSize Stochastic Controllers for POMDPs and Decentralized POMDPs
"... POMDPs and their decentralized multiagent counterparts, DECPOMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements ..."
Abstract

Cited by 30 (15 self)
 Add to MetaCart
(Show Context)
POMDPs and their decentralized multiagent counterparts, DECPOMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finitestate controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DECPOMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an offtheshelf optimization method are competitive with stateoftheart POMDP algorithms and outperform stateoftheart DECPOMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DECPOMDPs using nonlinear programming methods. 1.
Lossless clustering of histories in decentralized POMDPs
 In Proc. of the Eighth Int. Joint Conf. on Autonomous Agents and Multiagent Systems
, 2009
"... Decentralized partially observable Markov decision processes (DecPOMDPs) constitute a generic and expressive framework for multiagent planning under uncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories g ..."
Abstract

Cited by 26 (7 self)
 Add to MetaCart
(Show Context)
Decentralized partially observable Markov decision processes (DecPOMDPs) constitute a generic and expressive framework for multiagent planning under uncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories grows exponentially in the planning horizon. In this work, we identify a criterion that allows for lossless clustering of observation histories: i.e., we prove that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one. We show how this result can be exploited in optimal policy search and demonstrate empirically that it can provide a speedup of multiple orders of magnitude, allowing the optimal solution of significantly larger problems. We also perform an empirical analysis of the generality of our clustering method, which suggests that it may also be useful in other (approximate) DecPOMDP solution methods.
An optimal bestfirst search algorithm for solving infinite horizon DECPOMDPs
 In Proc. of the European Conference on Machine Learning
, 2005
"... Abstract. In the domain of decentralized Markov decision processes, we develop the first complete and optimal algorithm that is able to extract deterministic policy vectors based on finite state controllers for a cooperative team of agents. Our algorithm applies to the discounted infinite horizon ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
(Show Context)
Abstract. In the domain of decentralized Markov decision processes, we develop the first complete and optimal algorithm that is able to extract deterministic policy vectors based on finite state controllers for a cooperative team of agents. Our algorithm applies to the discounted infinite horizon case and extends bestfirst search methods to the domain of decentralized control theory. We prove the optimality of our approach and give some first experimental results for two small test problems. We believe this to be an important step forward in learning and planning in stochastic multiagent systems. 1