Results 1–10 of 130
Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
In Proc. of Int. Joint Conference on Autonomous Agents and Multiagent Systems, 2004
Abstract
Cited by 92 (2 self)
Partially observable decentralized decision making in robot teams is fundamentally different from decision making in fully observable problems. Team members cannot simply apply single-agent solution techniques in parallel. Instead, we must turn to game-theoretic frameworks to correctly model the problem. While partially observable stochastic games (POSGs) provide a solution model for decentralized robot teams, this model quickly becomes intractable. We propose an algorithm that approximates POSGs as a series of smaller, related Bayesian games, using heuristics such as QMDP to provide the future discounted value of actions. This algorithm trades limited lookahead under uncertainty for computational feasibility, and results in policies that are locally optimal with respect to the selected heuristic. Empirical results are provided both for a simple problem for which the full POSG can also be constructed and for more complex, robot-inspired problems.
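The QMDP heuristic mentioned in this abstract can be illustrated with a minimal sketch: solve the underlying fully observable MDP, then score each action against a belief over states. The transition and reward arrays below are hypothetical toy numbers, not from the paper.

```python
import numpy as np

def qmdp_values(T, R, gamma=0.95, iters=200):
    """Compute Q_MDP(s, a) by value iteration on the underlying MDP.

    T: transitions, shape (A, S, S); R: rewards, shape (S, A).
    QMDP then scores action a against a belief b over states as
    sum_s b[s] * Q[s, a], i.e. it assumes full observability after
    one step.
    """
    V = np.zeros(T.shape[1])
    for _ in range(iters):
        # Q(s,a) = R(s,a) + gamma * sum_s' T(a,s,s') V(s')
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
        V = Q.max(axis=1)
    return Q

# Hypothetical two-state world: action 0 stays put, action 1 switches
# states; only staying in state 0 yields reward.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])
Q = qmdp_values(T, R)
belief = np.array([0.7, 0.3])
best_action = int(np.argmax(belief @ Q))  # action 0: likely in state 0
```

Because QMDP ignores the value of information gathering, it provides exactly the kind of one-step-lookahead heuristic value the abstract describes plugging into the Bayesian-game approximation.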
Optimal and approximate Q-value functions for decentralized POMDPs
J. Artificial Intelligence Research
Abstract
Cited by 62 (27 self)
Decision-theoretic planning is a popular approach to sequential decision-making problems because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed recursively by dynamic programming, and an optimal policy is then extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound on the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
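The single-agent recursion the abstract contrasts with can be sketched as finite-horizon backward induction followed by greedy policy extraction. The two-state model below is a hypothetical example, not one of the paper's benchmarks.

```python
import numpy as np

def finite_horizon_q(T, R, horizon):
    """Backward induction: Q_t(s,a) = R(s,a) + sum_s' T(a,s,s') V_{t+1}(s'),
    with V_t(s) = max_a Q_t(s,a).  A greedy policy is then extracted
    from each stage's Q-function."""
    V = np.zeros(T.shape[1])
    Qs = []
    for _ in range(horizon):
        Q = R + np.einsum("ast,t->sa", T, V)
        V = Q.max(axis=1)
        Qs.append(Q)
    Qs.reverse()                              # Qs[0]: full horizon remaining
    policy = [Q.argmax(axis=1) for Q in Qs]   # policy[t][s] = greedy action
    return Qs, policy

# Hypothetical two-state chain: action 0 stays, action 1 switches;
# only staying in state 0 is rewarded.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])
Qs, policy = finite_horizon_q(T, R, horizon=3)
```

In Dec-POMDPs the analogous construction is over joint policies and histories rather than states, which is precisely why the exact computation the abstract describes becomes infeasible so quickly.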
Game theoretic control for robot teams
In Proc. of the IEEE International Conference on Robotics and Automation, 2005
Abstract
Cited by 43 (1 self)
In the real world, noisy sensors and limited communication make it difficult for robot teams to coordinate on tightly coupled tasks. Team members cannot simply apply single-robot solution techniques for partially observable problems in parallel, because those techniques do not take into account the recursive effect that reasoning about the beliefs of others has on policy generation. Instead, we must turn to a game-theoretic approach to model the problem correctly. Partially observable stochastic games (POSGs) provide a solution model for decentralized robot teams; however, this model quickly becomes intractable. In previous work we presented an algorithm for lookahead search in POSGs. Here we present an extension which reduces computation during lookahead by clustering similar observation histories together. We show that by clustering histories which have similar profiles of predicted reward, we can greatly reduce the computation time required to solve a POSG while maintaining a good approximation to the optimal policy. We demonstrate the power of the clustering algorithm in a real-time robot controller as well as on a simple benchmark problem.
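The clustering idea described above can be sketched as a greedy merge of observation histories whose predicted-reward profiles (one value per candidate action) fall within a tolerance. The history names and profile numbers below are hypothetical.

```python
def cluster_histories(profiles, tol=0.1):
    """Greedily merge observation histories whose predicted-reward
    profiles differ by less than tol in max-norm.  Returns a dict
    mapping each history to its cluster representative, so downstream
    lookahead search only reasons over representatives."""
    reps = []                                # (history, profile) pairs
    assignment = {}
    for hist, prof in profiles.items():
        for rep_hist, rep_prof in reps:
            if max(abs(a - b) for a, b in zip(prof, rep_prof)) < tol:
                assignment[hist] = rep_hist  # close enough: reuse cluster
                break
        else:
            reps.append((hist, prof))        # start a new cluster
            assignment[hist] = hist
    return assignment

# Hypothetical reward profiles for three observation histories.
profiles = {
    ("o1",): [1.00, 0.20],
    ("o2",): [1.05, 0.22],   # near-duplicate of ("o1",) -> merged
    ("o3",): [0.10, 0.90],
}
assignment = cluster_histories(profiles)
```

Merging histories this way shrinks the effective belief space the lookahead search must expand, which is the source of the computational savings the abstract reports.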
Interaction-driven Markov games for decentralized multiagent planning under uncertainty
In Proc. of AAMAS, 2008
Abstract
Cited by 34 (10 self)
In this paper we propose interaction-driven Markov games (IDMGs), a new model for multiagent decision making under uncertainty. IDMGs aim at describing multiagent decision problems in which interaction among agents is a local phenomenon. To this end, we explicitly distinguish between situations in which agents should interact and situations in which they can afford to act independently. The agents are coupled through the joint rewards and joint transitions in the states in which they interact. The model combines several fundamental properties of transition-independent Dec-MDPs and weakly coupled MDPs while allowing more general problems to be addressed in several respects. We introduce a fast approximate solution method for planning in IDMGs that exploits their particular structure, and we illustrate its successful application on several large multiagent tasks.
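The core structural assumption, coupled dynamics only in designated interaction states, can be sketched as a factored step function. The toy world, state encoding, and dynamics below are hypothetical stand-ins, not the paper's formalization.

```python
def step(state, actions, interaction_states, joint_step, local_step):
    """IDMG-style factored transition: in interaction states the agents
    are coupled through a joint transition; everywhere else each agent
    transitions independently of the others."""
    if state in interaction_states:
        return joint_step(state, actions)
    return tuple(local_step(s_i, a_i) for s_i, a_i in zip(state, actions))

# Toy 1-D world: two agents on a line; they interact only when both
# occupy cell 2 (a hypothetical "doorway").
interaction = {(2, 2)}
local = lambda s, a: s + a           # independent move
joint = lambda s, acts: (s[0], s[1]) # coupled: both agents blocked

s1 = step((0, 1), (1, 1), interaction, joint, local)
s2 = step((2, 2), (1, -1), interaction, joint, local)
```

Because most of the state space uses only the cheap independent branch, a planner can solve per-agent subproblems there and reserve joint reasoning for the small interaction region, which is the structure the paper's solution method exploits.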
Point-based dynamic programming for DEC-POMDPs
In Proc. of the National Conference on Artificial Intelligence, 2006
Abstract
Cited by 31 (2 self)
We introduce point-based dynamic programming (DP) for decentralized partially observable Markov decision processes (DEC-POMDPs), a new discrete DP algorithm for planning strategies for cooperative multiagent systems. Our approach makes a connection between optimal DP algorithms for partially observable stochastic games and point-based approximations for single-agent POMDPs. We show for the first time how relevant multiagent belief states can be computed. Building on this insight, we then show how the linear-programming part of current multiagent DP algorithms can be avoided, and how multiagent DP can thus be applied to solve larger problems. We derive both an optimal and an approximate version of our algorithm, and we show its efficiency on test examples from the literature.
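The single-agent point-based approximation this work connects to can be illustrated with one value backup at a fixed belief point: instead of enumerating all alpha-vectors, the backup is computed only where a sampled belief lives. The tiny model arrays below are hypothetical.

```python
import numpy as np

def point_backup(b, alphas, T, O, R, gamma=0.95):
    """One point-based value backup at belief b.

    alphas: list of alpha-vectors (length S).  T: (A, S, S) transitions,
    O: (A, S', Z) observation probabilities, R: (S, A) rewards.
    Returns the backed-up alpha-vector that is optimal at b.
    """
    A = T.shape[0]
    Z = O.shape[2]
    best = None
    for a in range(A):
        vec = R[:, a].copy()
        for z in range(Z):
            # For each observation, keep the successor alpha-vector
            # that maximizes the backed-up value at this belief point.
            cand = [gamma * (T[a] * O[a][:, z]).dot(al) for al in alphas]
            scores = [b.dot(c) for c in cand]
            vec = vec + cand[int(np.argmax(scores))]
        if best is None or b.dot(vec) > b.dot(best):
            best = vec
    return best

# Tiny model: 2 states, 1 action, 1 uninformative observation.
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])
O = np.ones((1, 2, 1))
R = np.array([[1.0], [0.0]])
b = np.array([0.5, 0.5])
new_alpha = point_backup(b, [np.zeros(2)], T, O, R)
```

The paper's contribution is the multiagent analogue of this idea: identifying which multiagent belief points are actually reachable, so backups (and the linear programs current DP algorithms use for pruning) need only be performed where they matter.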
Graphical models for interactive POMDPs: representations and solutions
Autonomous Agents and Multi-Agent Systems 18:376–416, 2009
Abstract
Cited by 31 (14 self)
We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models, called interactive influence diagrams (I-IDs), and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables and the dependencies between them. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent given its belief as the agent acts and observes in a setting populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms applicable to DIDs. Solving I-DIDs exactly involves knowing the solutions of possible models of the other agents. The space of models grows exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number
Not all agents are equal: Scaling up distributed POMDPs for agent networks
In: Proceedings of the seventh international, 2008
Abstract
Cited by 25 (4 self)
Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS that increases the number of agents in distributed POMDPs for the first time into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up;
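A finite-state-machine policy of the kind FANS varies in size can be sketched as a node set with per-node actions and observation-indexed transitions. The two-node controller and its action/observation names below are hypothetical, in the spirit of the classic tiger problem rather than taken from the paper.

```python
class FSMPolicy:
    """Finite-state controller: each node emits an action, and incoming
    observations drive node-to-node transitions.  Fewer nodes means a
    less expressive but cheaper policy, which is the size/expressivity
    trade-off FANS exploits per agent."""

    def __init__(self, actions, transitions, start=0):
        self.actions = actions          # node -> action
        self.transitions = transitions  # (node, observation) -> node
        self.node = start

    def act(self, observation=None):
        if observation is not None:
            self.node = self.transitions[(self.node, observation)]
        return self.actions[self.node]

# Hypothetical two-node controller: listen until "hear_left" arrives,
# then commit to opening the right door.
policy = FSMPolicy(
    actions={0: "listen", 1: "open_right"},
    transitions={(0, "hear_left"): 1, (0, "hear_right"): 0,
                 (1, "hear_left"): 1, (1, "hear_right"): 0},
)
a0 = policy.act()               # initial action from the start node
a1 = policy.act("hear_left")    # observation moves us to node 1
```

Because an FSM's memory is bounded by its node count regardless of horizon, giving peripheral agents small controllers and central agents larger ones is a natural lever for the double-digit scale-up the abstract claims.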
Minimal Mental Models
In Proceedings of the Twenty-Second Conference on Artificial Intelligence, 1038–1044. Menlo Park
Abstract
Cited by 22 (2 self)
Agents must form and update mental models about each other in a wide range of domains: team coordination, plan recognition, social simulation, user modeling, games of incomplete information, etc. Existing research typically treats the problem of forming beliefs about other agents as an isolated subproblem, where the modeling agent starts from an initial set of possible models for another agent and then maintains a belief about which of those models applies. This initial set of models is typically a full specification of possible agent types. Although such a rich space gives the modeling agent high accuracy in its beliefs, it also incurs a high cost in maintaining those beliefs. In this paper, we demonstrate that by taking this modeling problem out of its isolation and placing it back within the overall decision-making context, the modeling agent can drastically reduce this rich model space without sacrificing any performance. Our approach comprises three methods. The first method clusters models that lead to the same behaviors in the modeling agent's decision-making context. The second method clusters models that may produce different behaviors but produce equally preferred outcomes with respect to the utility of the modeling agent. The third sacrifices a fixed amount of accuracy by clustering models that lead to performance losses below a certain threshold. We illustrate our framework using a social simulation domain and demonstrate its value by showing the minimal mental model spaces it generates.
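The first clustering method, grouping models that induce the same behavior from the modeling agent, can be sketched as bucketing candidate models by the best response they elicit. The model names, predicted actions, and best-response table below are hypothetical illustrations.

```python
def cluster_models(models, best_response):
    """Group candidate models of another agent by the response they
    induce in the modeling agent; models in the same bucket are
    interchangeable for decision making, so one representative per
    bucket suffices."""
    clusters = {}
    for name, model in models.items():
        key = best_response(model)          # what we would do against it
        clusters.setdefault(key, []).append(name)
    return clusters

# Hypothetical models: each reduces to the other agent's predicted action.
models = {"aggressive": "attack", "reckless": "attack", "cautious": "defend"}
br = {"attack": "retreat", "defend": "advance"}   # our best response
clusters = cluster_models(models, lambda predicted: br[predicted])
```

Here "aggressive" and "reckless" are distinct models but both provoke the same response, so maintaining a belief distinction between them buys the modeling agent nothing, which is exactly the waste the paper's reduction eliminates.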
Value-based observation compression for DEC-POMDPs
In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, 2008
Abstract
Cited by 21 (7 self)
Representing agent policies compactly is essential for improving the scalability of multiagent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees and the size of a policy tree grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Using the resulting compact policies, we obtain significantly higher values than the best existing DEC-POMDP algorithm.
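The compression idea can be sketched as merging observation branches whose estimated subtree values are close, so the policy tree branches on fewer distinct observations. The clustering rule and the branch values below are a hypothetical simplification of the paper's bounded-error scheme.

```python
def merge_observations(values, epsilon):
    """Collapse observation branches whose subtree values lie within
    epsilon of each other, keeping the highest-valued branch of each
    cluster as the representative (a crude per-cluster error bound).

    values: observation -> estimated value of its policy subtree.
    Returns a mapping from each observation to its representative.
    """
    rep = {}
    obs = sorted(values, key=values.get)      # ascending by value
    i = 0
    while i < len(obs):
        j = i
        # Grow the cluster while values stay within epsilon of its minimum.
        while j + 1 < len(obs) and values[obs[j + 1]] - values[obs[i]] <= epsilon:
            j += 1
        for k in range(i, j + 1):
            rep[obs[k]] = obs[j]              # keep the best branch
        i = j + 1
    return rep

# Hypothetical branch values under one policy subtree.
values = {"o_low": 1.00, "o_mid": 1.05, "o_high": 2.00}
rep = merge_observations(values, epsilon=0.1)
```

Each merge removes an entire subtree from the policy, so the tree's branching factor, and with it the exponential growth the abstract describes, drops while the value lost stays bounded by the merge tolerance.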