Perseus: Randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research, 2005
Cited by 141 (11 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
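The randomized backup stage described in this abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the array layout (`T[a, s, s2]`, `O[a, s2, o]`, `R[s, a]`), the function names `value`, `backup` and `perseus_stage`, and the tiger-style toy problem are all hypothetical, and only the discrete-action case is covered. Each stage repeatedly picks a random belief that has not yet been improved, backs it up, and then drops every belief whose value is already no worse than under the previous value function.

```python
import numpy as np

# Hypothetical layout (an assumption, not from the paper):
#   T[a, s, s2] = P(s2 | s, a), O[a, s2, o] = P(o | a, s2), R[s, a] = reward.
# A value function V is a list of alpha vectors (1-D arrays over states).


def value(b, V):
    """Value of belief b under the alpha-vector set V."""
    return max(float(b @ alpha) for alpha in V)


def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one new alpha vector."""
    n_actions, n_obs = T.shape[0], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(n_actions):
        alpha_a = R[:, a].astype(float)
        for o in range(n_obs):
            # g(s) = sum_{s2} T[a, s, s2] * O[a, s2, o] * alpha(s2), for each alpha in V
            candidates = [T[a] @ (O[a, :, o] * alpha) for alpha in V]
            alpha_a = alpha_a + gamma * max(candidates, key=lambda g: float(b @ g))
        if float(b @ alpha_a) > best_val:
            best_alpha, best_val = alpha_a, float(b @ alpha_a)
    return best_alpha


def perseus_stage(B, V, T, O, R, gamma, rng):
    """One randomized backup stage: back up random beliefs until every
    belief in B has a value at least as high as under V."""
    new_V = []
    todo = list(range(len(B)))
    while todo:
        b = B[todo[rng.integers(len(todo))]]
        alpha = backup(b, V, T, O, R, gamma)
        if float(b @ alpha) < value(b, V):
            # The backup did not help at b: keep its old best vector instead.
            alpha = max(V, key=lambda v: float(b @ v))
        new_V.append(alpha)
        # Keep only beliefs whose value is still below their old value.
        todo = [j for j in todo if value(B[j], new_V) < value(B[j], V)]
    return new_V


# Usage on a hypothetical tiger-style toy problem (states: tiger-left,
# tiger-right; actions: listen, open-left, open-right; 2 observations).
T = np.array([np.eye(2), np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
O = np.array([[[0.85, 0.15], [0.15, 0.85]], np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
R = np.array([[-1.0, -100.0, 10.0], [-1.0, 10.0, -100.0]])
gamma = 0.95
B = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 11)]
V = [np.full(2, R.min() / (1.0 - gamma))]  # pessimistic lower-bound start
rng = np.random.default_rng(0)
history = [V]
for _ in range(30):
    V = perseus_stage(B, V, T, O, R, gamma, rng)
    history.append(V)
```

Because a belief leaves the to-do list as soon as its value is at least its old value, a single backed-up vector can clear many beliefs at once, which is the effect the abstract attributes to Perseus; the fallback to the old best vector guarantees the value at every belief in `B` never decreases from one stage to the next.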
MAA*: A heuristic search algorithm for solving decentralized POMDPs
In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, 2005
Cited by 68 (20 self)
We present multiagent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems.
Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs
2005
Cited by 64 (14 self)
In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent's limited interactions with a small number of neighbors. While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper presents a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Exploiting network structure enables us to present two novel algorithms for ND-POMDPs: a distributed policy generation algorithm that performs local search, and a systematic policy search that is guaranteed to reach the global optimum.
Improved memory-bounded dynamic programming for decentralized POMDPs
In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, 2007
Cited by 61 (20 self)
Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.
Optimal and approximate Q-value functions for decentralized POMDPs
J. Artificial Intelligence Research
Cited by 39 (16 self)
Decision-theoretic planning is a popular approach to sequential decision-making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed recursively by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
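One of the simplest approximate Q-value functions in this family is usually called Q_MDP: it pretends that a central controller observes the true state and picks joint actions, so it can be computed by ordinary dynamic programming over the underlying MDP and upper-bounds the optimal Dec-POMDP values. The Python/NumPy sketch below computes a finite-horizon, undiscounted Q_MDP. It is an illustration under assumptions, not the paper's code: the array layout (`T[ja, s, s2]`, `R[s, ja]`), the joint-action ordering, and the helpers `joint_actions`, `q_mdp` and `q_value_of_belief` are hypothetical, and extracting decentralized policies from such a function (the subject of the paper's policy-extraction algorithms) is not shown.

```python
import itertools

import numpy as np

# Hypothetical layout (an assumption): joint actions are tuples of individual
# actions, indexed in itertools.product order;
#   T[ja, s, s2] = P(s2 | s, ja), R[s, ja] = shared team reward.


def joint_actions(actions_per_agent):
    """Enumerate joint actions as tuples, e.g. (2, 2) -> (0,0), (0,1), (1,0), (1,1)."""
    return list(itertools.product(*(range(n) for n in actions_per_agent)))


def q_mdp(T, R, horizon):
    """Finite-horizon Q_MDP: treat the Dec-POMDP as a fully observable,
    centrally controlled MDP over joint actions and run dynamic programming.
    Returns Q[t] with shape (n_states, n_joint_actions) for t = 0..horizon-1."""
    n_states = R.shape[0]
    Q = [None] * horizon
    v_next = np.zeros(n_states)  # value after the final stage
    for t in reversed(range(horizon)):
        # Q_t(s, ja) = R(s, ja) + sum_{s2} T[ja, s, s2] * V_{t+1}(s2)
        Q[t] = R + np.einsum("asn,n->sa", T, v_next)
        v_next = Q[t].max(axis=1)
    return Q


def q_value_of_belief(Q_t, b):
    """Heuristic joint-action values at joint belief b: b . Q_t[:, ja]."""
    return b @ Q_t


# Usage on a random toy model: two agents with two actions each.
rng = np.random.default_rng(1)
JA = joint_actions((2, 2))
n_states = 3
T = rng.random((len(JA), n_states, n_states))
T /= T.sum(axis=2, keepdims=True)  # normalize each row into a distribution
R = rng.random((n_states, len(JA)))
Q = q_mdp(T, R, horizon=3)
b0 = np.full(n_states, 1.0 / n_states)
print(JA[int(np.argmax(q_value_of_belief(Q[0], b0)))])
```

Q_MDP is cheap because it ignores partial observability entirely; tighter bounds that account for what agents actually observe cost more to compute, which is the trade-off the abstract's comparison of approximate Q-value functions addresses.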
Point-based Dynamic Programming for DEC-POMDPs
2006
Cited by 31 (2 self)
We introduce point-based dynamic programming (DP) for decentralized partially observable Markov decision processes (DEC-POMDPs), a new discrete DP algorithm for planning strategies for cooperative multiagent systems. Our approach makes a connection between optimal DP algorithms for partially observable stochastic games and point-based approximations for single-agent POMDPs. We show for the first time how relevant multiagent belief states can be computed. Building on this insight, we then show how the linear programming part in current multiagent DP algorithms can be avoided, and how multiagent DP can thus be applied to solve larger problems. We derive both an optimal and an approximate version of our algorithm, and we show its efficiency on test examples from the literature.
Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies
In AAMAS, 2007
Cited by 27 (5 self)
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multiagent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the number of agents, one popular approach has focused on approximate solutions. Though this approach is efficient, the algorithms within it do not provide any guarantees on solution quality. A second, less popular approach focuses on global optimality, but typical results are available only for two agents, and at considerable computational cost. This paper overcomes the limitations of both these approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given a network of agents (i.e., allowing easier scale-up to larger numbers of agents); (ii) it uses a combination of heuristics to speed up policy search; and (iii) it allows quality-guaranteed approximations, allowing a systematic tradeoff of solution quality for time. Experimental results show orders of magnitude improvement in performance when compared with previous globally optimal algorithms.
Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping
Cited by 26 (10 self)
Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but their NEXP-complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
Exploiting locality of interaction in factored Dec-POMDPs
In Proc. Int. Joint Conf. Autonomous Agents and Multi Agent Systems, 2008
Cited by 26 (8 self)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.