Results 1–10 of 50
Optimal and approximate Q-value functions for decentralized POMDPs
 J. Artificial Intelligence Research
"... Decisiontheoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In singleagent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Qvalue functions: an optimal Qvalue functi ..."
Abstract

Cited by 39 (16 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
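As a minimal sketch of the single-agent procedure this abstract refers to (not taken from the cited paper; all dimensions and numbers are invented for illustration), the following computes Q* for a toy randomly generated MDP by dynamic programming and then extracts a greedy policy from it:

```python
# Toy Q-value iteration: Q*(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q*(s',a')
import numpy as np

n_states, n_actions, gamma, n_iters = 3, 2, 0.95, 200

rng = np.random.default_rng(0)
T = rng.random((n_states, n_actions, n_states))   # T[s, a, s'] = transition probability
T /= T.sum(axis=2, keepdims=True)                 # normalize each T[s, a, :] to a distribution
R = rng.random((n_states, n_actions))             # R[s, a] = expected immediate reward

Q = np.zeros((n_states, n_actions))
for _ in range(n_iters):
    V = Q.max(axis=1)           # V*(s') = max_a' Q*(s', a')
    Q = R + gamma * (T @ V)     # Bellman optimality backup

policy = Q.argmax(axis=1)       # greedy policy extracted from Q*
print(Q, policy, sep="\n")
```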
Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies
 In AAMAS
, 2007
"... Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multiagent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the numbers of agents, one popular approach ..."
Abstract

Cited by 27 (5 self)
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multiagent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the number of agents, one popular approach has focused on approximate solutions. Though this approach is efficient, the algorithms within it do not provide any guarantees on solution quality. A second, less popular approach focuses on global optimality, but typical results are available only for two agents, and at considerable computational cost. This paper overcomes the limitations of both approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given a network of agents (i.e., allowing easier scale-up to larger numbers of agents); (ii) it uses a combination of heuristics to speed up policy search; and (iii) it allows quality-guaranteed approximations, enabling a systematic trade-off of solution quality for time. Experimental results show orders-of-magnitude improvements in performance when compared with previous globally optimal algorithms.
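As a rough illustration of the generic pattern the abstract alludes to (not SPIDER itself, whose details differ), the sketch below performs depth-first search over per-agent policy choices, pruning any partial assignment whose heuristic upper bound cannot beat the best complete solution found so far. The names `candidates`, `joint_value`, and `upper_bound` are placeholders, not the paper's API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def branch_and_bound(
    agents: Sequence[str],
    candidates: Dict[str, List[int]],                 # policy options per agent
    joint_value: Callable[[Dict[str, int]], float],   # exact value of a complete assignment
    upper_bound: Callable[[Dict[str, int]], float],   # admissible bound for a partial one
) -> Tuple[float, Dict[str, int]]:
    best_val, best = float("-inf"), {}

    def recurse(i: int, partial: Dict[str, int]) -> None:
        nonlocal best_val, best
        if i == len(agents):                  # complete joint policy: evaluate exactly
            val = joint_value(partial)
            if val > best_val:
                best_val, best = val, dict(partial)
            return
        if upper_bound(partial) <= best_val:  # bound cannot beat incumbent: prune subtree
            return
        for option in candidates[agents[i]]:
            partial[agents[i]] = option
            recurse(i + 1, partial)
            del partial[agents[i]]

    recurse(0, {})
    return best_val, best
```

Pruning preserves optimality only if `upper_bound` never underestimates the best completion of a partial assignment, which is why methods of this kind rely on admissible heuristics.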
Exploiting locality of interaction in factored Dec-POMDPs
 In Proc. Int. Joint Conf. Autonomous Agents and Multiagent Systems
, 2008
"... Decentralized partially observable Markov decision processes (DecPOMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents ..."
Abstract

Cited by 26 (8 self)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.
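The decomposable value functions mentioned here can be pictured with a small sketch (assumed notation, not the paper's): the joint value is a sum of local terms, each reading only the actions of a small neighborhood of agents, so it can be evaluated, and with variable elimination maximized, without enumerating the full joint action space. The topology and payoffs below are invented.

```python
from itertools import product

# Hypothetical 3-agent chain: the middle agent interacts with both neighbors.
scopes = [(0, 1), (1, 2)]            # agent indices covered by each local term

def local_value(local_actions):
    # Illustrative local payoff: neighboring agents are rewarded for matching actions.
    return 1.0 if len(set(local_actions)) == 1 else 0.0

def joint_value(joint_action):
    # Joint value = sum of local terms, each depending only on its own scope.
    return sum(local_value(tuple(joint_action[i] for i in scope)) for scope in scopes)

# Brute force over the tiny joint action space; a real factored solver would
# instead use variable elimination / max-plus over the interaction graph.
actions = (0, 1)
best = max(product(actions, repeat=3), key=joint_value)
print(best, joint_value(best))
```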
Quality guarantees on k-optimal solutions for distributed constraint optimization
, 2007
"... A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents. Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms t ..."
Abstract

Cited by 23 (6 self)
A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents. Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One way to categorize such algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. This paper presents the first known guarantees on solution quality for k-optimal solutions. The guarantees are independent of the costs and rewards in the DCOP, and once computed can be used for any DCOP with a given constraint graph structure.
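The definition quoted here is easy to make concrete. The sketch below (a toy example, not the paper's quality-guarantee analysis) checks whether a DCOP assignment is k-optimal by brute force over all deviations of k or fewer agents.

```python
from itertools import combinations, product

def total_reward(assignment, constraints):
    # constraints: {(i, j): {(value_i, value_j): reward}}
    return sum(table[(assignment[i], assignment[j])] for (i, j), table in constraints.items())

def is_k_optimal(assignment, domains, constraints, k):
    base = total_reward(assignment, constraints)
    agents = list(assignment)
    for size in range(1, k + 1):
        for group in combinations(agents, size):
            for values in product(*(domains[a] for a in group)):
                deviated = dict(assignment, **dict(zip(group, values)))
                if total_reward(deviated, constraints) > base:
                    return False        # some deviation by <= k agents improves the reward
    return True

# Tiny 2-variable example: both agents would prefer to jointly switch to value 1.
domains = {"x": [0, 1], "y": [0, 1]}
constraints = {("x", "y"): {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 5}}
print(is_k_optimal({"x": 0, "y": 0}, domains, constraints, k=1))  # True: no single deviation helps
print(is_k_optimal({"x": 0, "y": 0}, domains, constraints, k=2))  # False: switching both to 1 helps
```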
Policy iteration for decentralized control of Markov decision processes
 JAIR
"... Coordination of distributed agents is required for problems arising in many areas, including multirobot systems, networking and ecommerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DECPOMDP). Though much work has been done on o ..."
Abstract

Cited by 23 (15 self)
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multi-agent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents’ actions are known. While this assumption may not hold in general, it helps produce higher-quality solutions in our test problems.
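For readers unfamiliar with this policy representation, here is a minimal sketch (illustrative only, not the paper's implementation) of a stochastic finite-state controller: each node emits a distribution over actions, and the controller transitions stochastically on the action taken and the observation received. One such controller would be held per agent; the correlation device, a shared random signal the agents condition on, is omitted, and the action/observation names are made up.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class StochasticFSC:
    # psi[node][action] = P(action | node)
    psi: List[Dict[str, float]]
    # eta[node][(action, observation)][next_node] = P(next_node | node, action, observation)
    eta: List[Dict[Tuple[str, str], Dict[int, float]]]

    def act(self, node: int) -> str:
        dist = self.psi[node]
        return random.choices(list(dist), weights=list(dist.values()))[0]

    def transition(self, node: int, action: str, observation: str) -> int:
        dist = self.eta[node][(action, observation)]
        return random.choices(list(dist), weights=list(dist.values()))[0]

# A one-node controller that always plays "listen" and stays in node 0.
fsc = StochasticFSC(
    psi=[{"listen": 1.0}],
    eta=[{("listen", "hear-left"): {0: 1.0}, ("listen", "hear-right"): {0: 1.0}}],
)
print(fsc.act(0), fsc.transition(0, "listen", "hear-left"))
```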
Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs
 In Proc. of the Eighth Int. Joint Conf. on Autonomous Agents and Multiagent Systems
, 2009
"... Recent scaling up of decentralized partially observable Markov decision process (DECPOMDP) solvers towards realistic applications is mainly due to approximate methods. Of this family, MEMORY BOUNDED DYNAMIC PROGRAMMING (MBDP), which combines in a suitable manner topdown heuristics and bottomup va ..."
Abstract

Cited by 16 (5 self)
Recent scaling up of decentralized partially observable Markov decision process (DEC-POMDP) solvers towards realistic applications is mainly due to approximate methods. Of this family, MEMORY BOUNDED DYNAMIC PROGRAMMING (MBDP), which combines top-down heuristics and bottom-up value function updates in a suitable manner, can solve DEC-POMDPs with large horizons. The performance of MBDP can, however, be drastically improved by avoiding the systematic generation and evaluation of all possible policies that result from the exhaustive backup. To achieve this, we suggest a heuristic search method, POINT BASED INCREMENTAL PRUNING (PBIP), which is able to distinguish policies with different heuristic estimates. Taking this insight into account, PBIP searches only among the most promising policies, keeps those that are useful, and prunes dominated ones, clearly reducing the amount of computation required by the exhaustive backup. Computational experiments show that PBIP solves DEC-POMDP benchmarks up to 800 times faster than the current best approximate algorithms, while providing solutions with higher values.
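The pruning of dominated policies mentioned here can be pictured with a small sketch (not PBIP itself): represent each candidate policy by its value at a set of evaluation points and discard any candidate that is at least matched everywhere, and strictly beaten somewhere, by another candidate. The policy names and values below are invented.

```python
from typing import Dict, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good at every point and strictly better at some point."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def prune_dominated(values: Dict[str, List[float]]) -> Dict[str, List[float]]:
    kept = {}
    for name, vec in values.items():
        if not any(dominates(other, vec) for o, other in values.items() if o != name):
            kept[name] = vec
    return kept

candidates = {
    "pi1": [3.0, 1.0, 4.0],
    "pi2": [2.0, 1.0, 3.0],   # dominated by pi1, pruned
    "pi3": [1.0, 5.0, 0.0],   # incomparable with pi1, kept
}
print(prune_dominated(candidates))   # -> pi1 and pi3 survive
```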
Not all agents are equal: Scaling up distributed POMDPs for agent networks
 In: Proceedings of the seventh international
, 2008
"... Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, autonomous underwater vehicles, involve 100s of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are wellsuited to address ..."
Abstract

Cited by 16 (3 self)
Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS that, for the first time, increases the number of agents in distributed POMDPs into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up;
Formal models and algorithms for decentralized control of multiple agents
 Journal of Autonomous Agents and Multi-Agent Systems
, 2008
"... Over the last five years, the AI community has shown considerable interest in decentralized control of multiple decision makers or “agents ” under uncertainty. This problem arises in many application domains, such as multirobot coordination, manufacturing, information gathering, and load balancing. ..."
Abstract

Cited by 15 (9 self)
Over the last five years, the AI community has shown considerable interest in decentralized control of multiple decision makers or “agents” under uncertainty. This problem arises in many application domains, such as multi-robot coordination, manufacturing, information gathering, and load balancing. Such problems must be treated as decentralized decision problems because each agent may have different partial information about the other agents and about the state of the world. It has been shown that these problems are significantly harder than their centralized counterparts, requiring new formal models and algorithms to be developed. Rapid progress in recent years has produced a number of different frameworks, complexity results, and planning algorithms. The objectives of this paper are to provide a comprehensive overview of these results, to compare and contrast the existing frameworks, and to provide a deeper understanding of their relationships with one another, their strengths, and their weaknesses. While we focus on cooperative systems, we do point out important connections with game-theoretic approaches. We analyze five different formal frameworks, three different optimal algorithms, as well as a series of approximation techniques. The paper provides interesting insights into the structure of decentralized problems, the expressiveness of
Distributed model shaping for scaling to decentralized POMDPs with hundreds of agents
 In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS
, 2011
"... The use of distributed POMDPs for cooperative teams has been severely limited by the incredibly large joint policyspace that results from combining the policyspaces of the individual agents. However, much of the computational cost of exploring the entire joint policy space can be avoided by observi ..."
Abstract

Cited by 12 (6 self)
The use of distributed POMDPs for cooperative teams has been severely limited by the incredibly large joint policy space that results from combining the policy spaces of the individual agents. However, much of the computational cost of exploring the entire joint policy space can be avoided by observing that in many domains important interactions between agents occur in a relatively small set of scenarios, previously defined as coordination locales (CLs) [11]. Moreover, even when numerous interactions might occur, given a set of individual policies there are relatively few actual interactions. Exploiting this observation and building on an existing model shaping algorithm, this paper presents D-TREMOR, an algorithm in which cooperative agents iteratively generate individual policies, identify and communicate possible