Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion
"... Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A * heuristic search method. A key insight is that we can avoid the full ..."
Abstract
-
Cited by 23 (15 self)
- Add to MetaCart
(Show Context)
Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A* heuristic search method. A key insight is that we can avoid the full expansion of a search node that generates a number of children that is doubly exponential in the node's depth. Instead, we incrementally expand the children only when a next child might have the highest heuristic value. We target a subsequent bottleneck by introducing a more memory-efficient representation for our heuristic functions. We prove that the resulting algorithm is correct, and experiments demonstrate a significant speedup over the state of the art, allowing for optimal solutions over longer horizons for many benchmark problems.
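To make the incremental-expansion idea concrete, here is a minimal best-first search sketch in Python. It assumes a hypothetical ordered_children(node) generator that yields children in non-increasing heuristic-value order (the ordering the paper's method relies on), so a parent can wait in the queue keyed by its next unseen child instead of being fully expanded. All names are illustrative, not the authors' implementation.

    import heapq
    import itertools

    def incremental_best_first(root, ordered_children, value, is_goal):
        # ordered_children(node): generator yielding children in non-increasing
        # heuristic-value order, so each unseen child is bounded by the last one.
        counter = itertools.count()   # tie-breaker; keeps the heap from comparing nodes
        heap = []

        def push(parent, gen):
            child = next(gen, None)   # materialize only the single best unseen child
            if child is not None:
                # The parent re-enters the queue keyed by this child's value,
                # an upper bound on all of its still-ungenerated children.
                heapq.heappush(heap, (-value(child), next(counter), parent, child, gen))

        if is_goal(root):
            return root
        push(root, ordered_children(root))
        while heap:
            _, _, parent, child, gen = heapq.heappop(heap)
            if is_goal(child):
                return child          # keys are upper bounds: first goal popped is optimal
            push(child, ordered_children(child))   # child becomes partially expanded
            push(parent, gen)                      # parent keeps its place with one more child
        return None

The point is that at most one new child is materialized per pop, so the doubly exponential child set is never built unless the search actually needs it.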
Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs
2013
"... This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A * (GMAA*) algorithm, which ..."
Abstract
-
Cited by 18 (12 self)
- Add to MetaCart
(Show Context)
This article presents the state of the art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building on the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
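A sketch of the lossless clustering criterion: two of an agent's observation histories can be merged when they induce the same conditional distribution over states and the other agents' histories. The dictionary layout and rounding tolerance below are illustrative assumptions of this sketch, not the article's data structures.

    from collections import defaultdict

    def lossless_clusters(joint_probs, ndigits=12):
        # joint_probs[h_i][(s, h_others)] = P(s, h_others, h_i); keys are
        # assumed hashable and mutually comparable (illustrative layout).
        clusters = defaultdict(list)
        for h_i, joint in joint_probs.items():
            mass = sum(joint.values())
            if mass == 0.0:
                key = "unreachable"        # unreachable histories: merge freely
            else:
                # Signature = conditional distribution P(s, h_others | h_i),
                # rounded so floating-point noise does not split clusters.
                key = tuple(sorted((k, round(p / mass, ndigits))
                                   for k, p in joint.items()))
            clusters[key].append(h_i)
        return list(clusters.values())

Histories in the same cluster can share a single decision in the CBG, which is where the exponential speedup comes from.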
Bayesian Action-Graph Games
"... Games of incomplete information, or Bayesian games, are an important gametheoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can com ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
Games of incomplete information, or Bayesian games, are an important game-theoretic model and have many applications in economics. We propose Bayesian action-graph games (BAGGs), a novel graphical representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure, including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm. We show both theoretically and empirically that our approaches improve significantly on the state of the art.
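For contrast with the compact BAGG computation, here is the naive expected-utility calculation that the representation is designed to avoid: it enumerates every joint type profile and joint action profile, which is exponential in the number of players. All argument names are illustrative assumptions of this sketch.

    import itertools

    def expected_utility(i, type_dist, strategies, utility, types, actions):
        # types[j]: iterable of player j's types; actions[j]: player j's actions.
        # type_dist(theta)        -> probability of joint type profile theta
        # strategies[j](t_j, a_j) -> P(player j plays a_j given type t_j)
        # utility(i, theta, a)    -> player i's payoff under profile (theta, a)
        eu = 0.0
        for theta in itertools.product(*types):
            p_theta = type_dist(theta)
            if p_theta == 0.0:
                continue  # skip impossible type profiles
            for a in itertools.product(*actions):
                p_a = 1.0
                for j, (t_j, a_j) in enumerate(zip(theta, a)):
                    p_a *= strategies[j](t_j, a_j)
                eu += p_theta * p_a * utility(i, theta, a)
        return eu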
Exploiting structure in cooperative Bayesian games
In UAI, 2012
"... Cooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that is exponential in the number of agents. While agent independence has been used to mitigate these problems in perfect information settings, ..."
Abstract
-
Cited by 7 (7 self)
- Add to MetaCart
(Show Context)
Cooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that are exponential in the number of agents. While agent independence has been used to mitigate these problems in perfect-information settings, we propose a novel approach for BGs based on the observation that BGs additionally possess a different type of structure, which we call type independence. We propose a factor graph representation that captures both forms of independence and present a theoretical analysis showing that non-serial dynamic programming cannot effectively exploit type independence, while Max-Sum can. Experimental results demonstrate that our approach can tackle cooperative Bayesian games of unprecedented size.
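A minimal loopy Max-Sum implementation over an abstract factor graph, to make the message passing concrete. In the paper's setting one would create a variable per agent-type pair (capturing type independence) and a factor per payoff component; here the graph is generic, the undamped synchronous schedule is a simplification, and convergence is only guaranteed on tree-structured graphs.

    import itertools
    from collections import defaultdict

    def max_sum(variables, factors, domains, iterations=10):
        # variables: list of variable names (e.g. one per agent-type pair)
        # factors:   dict name -> (scope, payoff) where scope is a tuple of
        #            variables and payoff maps a {var: value} dict to a float
        # domains:   dict variable -> list of candidate actions
        edges = [(v, f) for f, (scope, _) in factors.items() for v in scope]
        msg_vf = {e: defaultdict(float) for e in edges}          # variable -> factor
        msg_fv = {(f, v): defaultdict(float) for v, f in edges}  # factor -> variable

        for _ in range(iterations):
            for v, f in edges:  # variable-to-factor: sum other factors' messages
                for x in domains[v]:
                    msg_vf[(v, f)][x] = sum(msg_fv[(g, v)][x]
                                            for g, (scope, _) in factors.items()
                                            if v in scope and g != f)
            for v, f in edges:  # factor-to-variable: max over the factor's other vars
                scope, payoff = factors[f]
                others = [u for u in scope if u != v]
                for x in domains[v]:
                    best = float("-inf")
                    for assign in itertools.product(*(domains[u] for u in others)):
                        local = dict(zip(others, assign))
                        local[v] = x
                        best = max(best, payoff(local)
                                   + sum(msg_vf[(u, f)][local[u]] for u in others))
                    msg_fv[(f, v)][x] = best

        # Decode: each variable independently picks its best-supported value.
        return {v: max(domains[v],
                       key=lambda x: sum(msg_fv[(f, v)][x]
                                         for f, (scope, _) in factors.items()
                                         if v in scope))
                for v in variables}

Because each message maximizes only over a factor's local scope, the cost grows with factor arity rather than with the total number of agent-type pairs, which is the property the paper exploits.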
Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
"... We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that out-performs all existing algorithms on all but one standard infinite-horizon bench-mark problems. (1) We give an integer program that solves collabo ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
We present four major results towards solving decentralized partially observable Markov decision problems (Dec-POMDPs), culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problem. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a Dec-POMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any Dec-POMDP into a Dec-POMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of Dec-POMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based value iteration, and modify it to produce the first tractable value iteration method for Dec-POMDPs that outperforms existing algorithms.
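Result (4) plugs a standard POMDP solver into the converted model; as a reference point, here is the classic point-based value-iteration backup (in the style of Pineau et al.) that such an approach builds on. The matrix shapes in the comments are this sketch's assumptions, not the paper's notation.

    import numpy as np

    def pbvi_backup(beliefs, alphas, T, O, R, gamma):
        # Assumed shapes:
        #   T[a]: |S| x |S|  transition matrix for action a (s -> s')
        #   O[a]: |S| x |Z|  observation matrix for action a (s' -> z)
        #   R:    |S| x |A|  immediate rewards
        #   beliefs: list of length-|S| arrays; alphas: non-empty list of
        #   length-|S| value vectors (e.g. initialized from a lower bound)
        nS, nA = R.shape
        nZ = O[0].shape[1]
        new_alphas = []
        for b in beliefs:
            best_val, best_vec = -np.inf, None
            for a in range(nA):
                vec = R[:, a].astype(float).copy()
                for z in range(nZ):
                    # g(s) = sum_{s'} T[a][s, s'] * O[a][s', z] * alpha(s')
                    g = [T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
                    # keep the alpha-vector that is best for this belief and (a, z)
                    vec = vec + gamma * max(g, key=lambda v: float(b @ v))
                if b @ vec > best_val:
                    best_val, best_vec = float(b @ vec), vec
            new_alphas.append(best_vec)  # one new alpha-vector per belief point
        return new_alphas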
Tree-based solution methods for multiagent POMDPs with delayed communication
In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012
"... Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal de-cision making under the assumption of instantaneous com-munication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is de-layed by at most one tim ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcast information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization of having agents wait until the global information is available, and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and by introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.
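To make the bottleneck concrete, here is the cross-sum-and-prune skeleton at the heart of incremental pruning, the operation whose naive form the paper restructures as a tree. For brevity this sketch prunes only pointwise-dominated vectors; the full method also removes vectors dominated over the whole belief simplex via linear programs.

    import itertools
    import numpy as np

    def cross_sum(A, B):
        # All pairwise sums of two sets of value vectors.
        return [a + b for a, b in itertools.product(A, B)]

    def prune_pointwise(vectors):
        # Keep vectors not pointwise-dominated by another vector; a simplified
        # stand-in for the LP-based prune used in incremental pruning.
        kept = []
        for v in vectors:
            dominated = any(w is not v and np.all(w >= v) and np.any(w > v)
                            for w in vectors)
            if not dominated:
                kept.append(v)
        return kept

    def incremental_prune(vector_sets):
        # Fold prune(cross_sum(...)) across the per-observation vector sets;
        # this naive sequential fold is what the paper organizes as a tree.
        result = prune_pointwise(vector_sets[0])
        for vs in vector_sets[1:]:
            result = prune_pointwise(cross_sum(result, vs))
        return result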
Computing Convex Coverage Sets for Faster Multi-Objective Coordination
"... Abstract In this article, we propose new algorithms for multi-objective coordination graphs (MOCoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of p ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
In this article, we propose new algorithms for multi-objective coordination graphs (MO-CoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of problems, it also has important characteristics that facilitate more efficient solutions. We propose two main algorithms for computing a CCS in MO-CoGs. Convex multi-objective variable elimination (CMOVE) computes a CCS by performing a series of agent eliminations, which can be seen as solving a series of local multi-objective subproblems. Variable elimination linear support (VELS) iteratively identifies the single weight vector w that can lead to the maximal possible improvement on a partial CCS and calls variable elimination to solve a scalarized instance of the problem for w. VELS is faster than CMOVE for small and medium numbers of objectives and can compute an ε-approximate CCS in a fraction of the runtime. In addition, we propose variants of these methods that employ AND/OR tree search instead of variable elimination to achieve memory efficiency. We analyze the runtime and space complexities of these methods, prove their correctness, and compare them empirically against a naive baseline and an existing PCS method, both in terms of memory usage and runtime. Our results show that, by focusing on the CCS, these methods achieve much better scalability in the number of agents than the current state of the art.
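The CCS/PCS distinction can be made concrete with a small test: a payoff vector belongs to a convex coverage set iff some nonnegative weight vector (summing to one) makes it at least as good as every other vector. A per-vector linear program, here via scipy, is one standard way to prune a set of vectors to a CCS; this is illustrative machinery, not the CMOVE or VELS algorithm itself.

    import numpy as np
    from scipy.optimize import linprog

    def in_ccs(u, others):
        # Does some weight w (w >= 0, sum w = 1) make u at least as good as
        # every vector in `others`?  Solve: max d  s.t.  w.(v - u) + d <= 0.
        if not others:
            return True
        n = len(u)
        c = np.zeros(n + 1)
        c[-1] = -1.0                                   # minimize -d, i.e. maximize d
        A_ub = np.array([np.append(np.asarray(v, float) - np.asarray(u, float), 1.0)
                         for v in others])
        b_ub = np.zeros(len(others))
        A_eq = np.array([np.append(np.ones(n), 0.0)])  # weights sum to one
        b_eq = np.array([1.0])
        bounds = [(0.0, 1.0)] * n + [(None, None)]     # d is unbounded
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.status == 0 and -res.fun >= 0.0     # >= 0 keeps ties

    def prune_to_ccs(vectors):
        return [u for i, u in enumerate(vectors)
                if in_ccs(u, [v for j, v in enumerate(vectors) if j != i])]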
Error-bounded approximations for infinite-horizon discounted decentralized POMDPs
In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, 2014
"... Abstract. We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but the worst-case complexity ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but its worst-case complexity (NEXP-complete) makes it difficult to solve optimally. Recent advances suggest recasting Dec-POMDPs into continuous-state and deterministic MDPs. In this form, however, states and actions are embedded into high-dimensional spaces, making accurate estimation of states and greedy selection of actions intractable for all but trivial-sized problems. The primary contribution of this paper is the first framework for error monitoring during approximate estimation of states and selection of actions. Such a framework permits us to convert state-of-the-art exact methods into error-bounded algorithms, which results in a scalability increase, as demonstrated by experiments on problems of unprecedented sizes.
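The error-monitoring idea can be illustrated with a toy: perform an approximate state-estimation step and return, alongside the result, the error it actually introduced, so per-step errors can be accumulated into a running bound. The truncation rule below is a placeholder assumption, not the paper's method; only the "report the error you introduced" bookkeeping is the point.

    def compress_occupancy(dist, eps):
        # dist: mapping state -> probability; eps: truncation threshold.
        kept = {x: p for x, p in dist.items() if p >= eps}
        if not kept:  # degenerate case: keep the single most likely state
            x_best = max(dist, key=dist.get)
            kept = {x_best: dist[x_best]}
        mass = sum(kept.values())
        compressed = {x: p / mass for x, p in kept.items()}
        # L1 distance to the original = the error introduced by this step;
        # summing these over a trajectory yields an additive running bound.
        l1 = sum(abs(compressed.get(x, 0.0) - p) for x, p in dist.items())
        return compressed, l1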