Results 1  10
of
248
The Complexity of Decentralized Control of Markov Decision Processes
 Mathematics of Operations Research
, 2000
"... We consider decentralized control of Markov decision processes and give complexity bounds on the worstcase running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partiallyobservable case that allow for decentralized control are described. ..."
Abstract

Cited by 287 (46 self)
 Add to MetaCart
We consider decentralized control of Markov decision processes and give complexity bounds on the worstcase running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partiallyobservable case that allow for decentralized control are described. For even two agents, the finitehorizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomialtime algorithms. Furthermore, assuming EXP NEXP, the problems require superexponential time to solve in the worst case.
Acting Optimally in Partially Observable Stochastic Domains
, 1994
"... In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or nearoptimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the oper ..."
Abstract

Cited by 274 (16 self)
 Add to MetaCart
In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or nearoptimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the pomdp approach.
Learning policies for partially observable environments: Scaling up
, 1995
"... Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optim ..."
Abstract

Cited by 234 (11 self)
 Add to MetaCart
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of pomdp's, this paper discusses several simple solution methods and shows that all are capable of finding nearoptimal policies for a selection of extremely small pomdp's taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains. 1 Introduction Mobile robots must act on the basis of thei...
Acting under Uncertainty: Discrete Bayesian Models for MobileRobot Navigation
 In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems
, 1996
"... Discrete Bayesian models have been used to model uncertainty for mobilerobot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving ..."
Abstract

Cited by 183 (12 self)
 Add to MetaCart
Discrete Bayesian models have been used to model uncertainty for mobilerobot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving for the optimal control policy is intractable, in general, it goes on to explore a variety of heuristic control strategies. The control strategies are compared experimentally, both in simulation and in runs on a robot. 1 Introduction A robot that delivers items and performs errands in an office environment needs to be able to navigate robustly. It should be able to overcome errors in perception and action, at worst getting lost for some period of time, but then being able to recover by relocalizing itself and continuing with its task. The Bayesian framework is particularly appropriate for modeling the robot's belief about its location (or, more generally, the state of the world). It suppl...
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models
 Journal of Artificial Intelligence Research
, 2002
"... Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimalitycomplexity tradeoffs, it is impossible to determine whether the assumptions and app ..."
Abstract

Cited by 181 (22 self)
 Add to MetaCart
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimalitycomplexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COMMTDP). The COMMTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COMMTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COMMTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COMMTDP model provides a basis for the development of novel team coordination algorithms. We derive a domainindependent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domainindependent software package based on COMMTDPs to analyze teamwork coordination strategies, and we demons...
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract

Cited by 175 (8 self)
 Add to MetaCart
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings
 In IJCAI
, 2003
"... The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). ..."
Abstract

Cited by 155 (23 self)
 Add to MetaCart
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP).
Perseus: Randomized pointbased value iteration for POMDPs
 Journal of Artificial Intelligence Research
, 2005
"... Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Pointbased approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a ra ..."
Abstract

Cited by 141 (11 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Pointbased approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized pointbased value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other pointbased methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. 1.
Valuefunction approximations for partially observable Markov decision processes
 Journal of Artificial Intelligence Research
, 2000
"... Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advanta ..."
Abstract

Cited by 127 (0 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price — exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain. 1.
A Survey of Computational Complexity Results in Systems and Control
, 2000
"... The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fi ..."
Abstract

Cited by 116 (21 self)
 Add to MetaCart
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fields. We begin with a brief introduction to models of computation, the concepts of undecidability, polynomial time algorithms, NPcompleteness, and the implications of intractability results. We then survey a number of problems that arise in systems and control theory, some of them classical, some of them related to current research. We discuss them from the point of view of computational complexity and also point out many open problems. In particular, we consider problems related to stability or stabilizability of linear systems with parametric uncertainty, robust control, timevarying linear systems, nonlinear and hybrid systems, and stochastic optimal control.