Results 1 - 10
of
13
Optimal and approximate Q-value functions for decentralized POMDPs
- J. Artificial Intelligence Research
"... Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value functi ..."
Abstract
-
Cited by 22 (9 self)
- Add to MetaCart
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q ∗ is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q ∗. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q ∗. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem. 1.
Multiagent Planning under Uncertainty with Stochastic Communication Delays
- In Proc. of the 18th Int. Conf. on Automated Planning and Scheduling
, 2008
"... We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal planning is provably intractable. By communicating their local observations before they take ac ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal planning is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP. As such, relying on communication significantly reduces the complexity of planning. In the real world however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that: (1) allows that communication might be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.
Multi-Agent Online Planning with Communication
"... We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When c ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases.
Myopic and Non-Myopic Communication Under
"... Abstract—In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in overcommunication. This paper presents a general framework to ..."
Abstract
- Add to MetaCart
Abstract—In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in overcommunication. This paper presents a general framework to compute the value of communicating from each agent’s local perspective, by comparing the expected reward with and without communication. In order to obtain these expectations, each agent must reason about the state and belief states of the other agents, both before and after communication. We show how this can be done in the context of decentralized POMDPs and discuss ways to mitigate a common myopic assumption, where agents tend to overcommunicate because they overlook the possibility that communication can be deferred or initiated by the other agents. The paper presents a theoretical framework to precisely quantify the value of communication and an effective algorithm to manage communication. Experimental results show that our approach performs well compared to other techniques suggested in the literature. I.
Value of Communication In Decentralized POMDPs
"... In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in overcommunication. This paper presents a general framework to compute ..."
Abstract
- Add to MetaCart
In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in overcommunication. This paper presents a general framework to compute the value of communicating from each agent’s local perspective, by comparing the expected reward with and without communication. In order to obtain these expectations, each agent must reason about the state and belief states of the other agents, both before and after communication. We show how this can be done in the context of decentralized POMDPs and discuss ways to mitigate a common myopic assumption, where agents tend to overcommunicate because they overlook the possibility that communication can be deferred or initiated by the other agents. The paper presents a theoretical framework to precisely quantify the value of communication and an effective algorithm to manage communication. Experimental results show that our approach performs well compared to other techniques suggested in the literature. 1.
On the Varying Benefit of Communication between Mobile Agents in a Decentralized Team
"... Abstract — Cooperative behavior in decentralized mobile agents requires communication among them. However, such communication can also have negative consequences for a variety of reasons, and the benefits of communication can vary over the course of the agent team’s task or mission. In this paper, w ..."
Abstract
- Add to MetaCart
Abstract — Cooperative behavior in decentralized mobile agents requires communication among them. However, such communication can also have negative consequences for a variety of reasons, and the benefits of communication can vary over the course of the agent team’s task or mission. In this paper, we compare the benefits of communication over time for a team of mobile agents performing a general search-and-engage mission. Monte Carlo simulations show that communication provides variable benefit at different times and under different circumstances. In particular, communication can sometimes lead to poorer performance if agents have imperfect sensors. I.
Coordinating Randomized Policies for Increasing Security in Multiagent Systems
"... Abstract. Despite significant recent advances in decision theoretic frameworks for reasoning about multiagent teams, little attention has been paid to applying such frameworks in adversarial domains, where the agent team may face security threats from other agents. This paper focuses on domains wher ..."
Abstract
- Add to MetaCart
Abstract. Despite significant recent advances in decision theoretic frameworks for reasoning about multiagent teams, little attention has been paid to applying such frameworks in adversarial domains, where the agent team may face security threats from other agents. This paper focuses on domains where such threats are caused by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization is recognized as a key technique to deteriorate an adversarys capability to predict and exploit an agent/agent teams actions. Unfortunately, there are two key challenges in such randomization. First, randomization can reduce the expected reward (quality) of the agent team’s plans, and thus we must provide some guarantees on such rewards. Second, randomization results in miscoordination in teams. While communication within an agent team can help in alleviating the miscoordination problem, communication is unavailable in many real domains or sometimes scarcely available. To address these challenges, this paper provides the following contributions. First, we recall the Multiagent Constrained MDP (MCMDP) framework that enables policy generation for a team of agents where each agent may have a limited or no(communication) resource. Second, since randomized policies generated directly for MCMDPs lead to miscoordination, we introduce a transformation algorithm that converts the MCMDP into a transformed MCMDP incorporating explicit communication and no communication actions. Third, we show that incorporating randomization results in a non-linear program and the unavailability/limited availability of communication results in addition of non-convex constraints to the non-linear program. Finally, we experimentally illustrate the benefits of our work.
Decentralized Monitoring of Distributed Anytime Algorithms ABSTRACT
"... Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the “right” time so as to optimize a given time-dependent utility function. However, these results apply only to th ..."
Abstract
- Add to MetaCart
Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the “right” time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems.

