The Complexity of Decentralized Control of Markov Decision Processes
Mathematics of Operations Research, 2000
Cited by 287 (46 self)
We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully observable case and the partially observable case that allow for decentralized control are described. Even for two agents, the finite-horizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP ≠ NEXP, the problems require superexponential time to solve in the worst case.
Acting Optimally in Partially Observable Stochastic Domains
1994
Cited by 275 (16 self)
In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the pomdp approach.
Learning policies for partially observable environments: Scaling up
1995
Cited by 232 (11 self)
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of pomdp's, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small pomdp's taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains. 1 Introduction Mobile robots must act on the basis of thei...
Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation
In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996
Cited by 183 (12 self)
Discrete Bayesian models have been used to model uncertainty for mobile-robot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving for the optimal control policy is intractable, in general, it goes on to explore a variety of heuristic control strategies. The control strategies are compared experimentally, both in simulation and in runs on a robot. 1 Introduction A robot that delivers items and performs errands in an office environment needs to be able to navigate robustly. It should be able to overcome errors in perception and action, at worst getting lost for some period of time, but then being able to recover by relocalizing itself and continuing with its task. The Bayesian framework is particularly appropriate for modeling the robot's belief about its location (or, more generally, the state of the world). It suppl...
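The Bayesian belief about location mentioned in this abstract is maintained with a discrete Bayes filter: push the belief through the motion model, then reweight it by the observation likelihood. The NumPy sketch below is a minimal illustration, not code from the paper; the function name, the array layout (`T[a, s, s2]`, `O[a, s2, o]`), and the four-cell corridor with doors are hypothetical choices made only to show the update.

```python
import numpy as np


def belief_update(b, a, o, T, O):
    """Discrete Bayes filter step for a POMDP belief (hypothetical helper).

    Assumed layout: T[a, s, s2] = P(s2 | s, a), O[a, s2, o] = P(o | s2, a).
    Returns b2(s2) proportional to O[a, s2, o] * sum_s T[a, s, s2] * b(s).
    """
    predicted = b @ T[a]                   # prediction through the motion model
    unnormalized = O[a, :, o] * predicted  # correction by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total


# Hypothetical corridor: 4 cells, doors at cells 0 and 2.
# Action 0 = "move right": succeeds with prob 0.8, otherwise the robot stays;
# at the last cell, moving right leaves the robot where it is.
T = np.zeros((1, 4, 4))
for s in range(4):
    T[0, s, min(s + 1, 3)] += 0.8
    T[0, s, s] += 0.2

# Observation 0 = "door", 1 = "wall"; the sensor is right 90% of the time.
door = np.array([True, False, True, False])
O = np.zeros((1, 4, 2))
O[0, :, 0] = np.where(door, 0.9, 0.1)
O[0, :, 1] = 1.0 - O[0, :, 0]

b0 = np.full(4, 0.25)                       # robot is lost: uniform belief
b1 = belief_update(b0, a=0, o=0, T=T, O=O)  # moved right, then saw a door
print(b1)  # approximately [0.132, 0.074, 0.662, 0.132]
```

Seeing a door after moving right concentrates the belief on cell 2, the only door cell the robot can reach with high probability from most starting cells.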
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models
Journal of Artificial Intelligence Research, 2002
Cited by 182 (22 self)
Despite the significant progress in multiagent teamwork, existing research addresses neither the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demons...
Algorithms for Sequential Decision Making
1996
Cited by 177 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular, ...
Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings
In IJCAI, 2003
Cited by 154 (23 self)
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP).
Perseus: Randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research, 2005
Cited by 142 (11 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
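The backup stage described in this abstract can be sketched in a few dozen lines of NumPy. This is a minimal, hypothetical sketch, not the authors' implementation: the function names, the array layout (`T[a, s, s2]`, `O[a, s2, o]`, `R[s, a]`), the tolerance `TOL`, the pessimistic single starting vector, and the use of the classic two-state tiger problem as a test model are all assumptions made for illustration. The loop keeps picking a random belief that has not yet improved, backs it up, and then drops every belief the new vector already covers.

```python
import numpy as np

TOL = 1e-12  # assumed slack for floating-point comparisons between values


def value(b, V):
    """V(b) = max over alpha-vectors (rows of V) of b . alpha."""
    return float(np.max(V @ b))


def point_backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup of the alpha-vector set V at belief b.

    Assumed layout: T[a, s, s2] = P(s2 | s, a), O[a, s2, o] = P(o | s2, a),
    R[s, a] = immediate reward.
    """
    best_vec, best_val = None, -np.inf
    for a in range(T.shape[0]):
        g_a = R[:, a].astype(float)
        for o in range(O.shape[2]):
            # g[i, s] = sum_{s2} T[a, s, s2] * O[a, s2, o] * V[i, s2]
            g = V @ (T[a] * O[a, :, o]).T
            # keep the projected vector that is best for this belief
            g_a = g_a + gamma * g[np.argmax(g @ b)]
        if b @ g_a > best_val:
            best_vec, best_val = g_a, b @ g_a
    return best_vec


def perseus_stage(B, V, T, O, R, gamma, rng):
    """One Perseus-style stage: V_next(b) >= V(b) (up to TOL) for all b in B."""
    old = np.array([value(b, V) for b in B])
    new_vectors = []
    todo = list(range(len(B)))  # indices of beliefs not yet improved
    while todo:
        i = todo[rng.integers(len(todo))]  # random unimproved belief
        alpha = point_backup(B[i], V, T, O, R, gamma)
        if B[i] @ alpha < old[i] - TOL:
            # The backup did not help this belief: reuse its best old vector.
            alpha = V[np.argmax(V @ B[i])]
        new_vectors.append(alpha)
        V_next = np.array(new_vectors)
        # One new vector can improve many beliefs, so drop all of them.
        todo = [j for j in todo
                if j != i and value(B[j], V_next) < old[j] - TOL]
    return np.array(new_vectors)


# Tiger problem. States: 0 = tiger-left, 1 = tiger-right.
# Actions: 0 = listen, 1 = open-left, 2 = open-right.
# Observations: 0 = hear-left, 1 = hear-right (listening is 85% accurate).
gamma = 0.95
T = np.array([np.eye(2), np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
listen = np.array([[0.85, 0.15], [0.15, 0.85]])
O = np.array([listen, np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
R = np.array([[-1.0, -100.0, 10.0], [-1.0, 10.0, -100.0]])

B = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 11)]
V = np.full((1, 2), R.min() / (1.0 - gamma))  # pessimistic initial vector
rng = np.random.default_rng(0)
for _ in range(30):
    V = perseus_stage(B, V, T, O, R, gamma, rng)
print(len(V), value(np.array([0.5, 0.5]), V))
```

Because a belief whose backup fails falls back to its best old vector, the value at every point in `B` never decreases from one stage to the next, which is the property the abstract relies on.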
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research, 2000
Cited by 128 (0 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations, and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
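The abstract does not name its methods. As one concrete, widely used example of trading accuracy for speed, the QMDP heuristic solves the underlying fully observable MDP by value iteration and then scores each action at belief b by Σ_s b(s) Q(s, a). The NumPy sketch below is illustrative only: the function names, the array layout (`T[a, s, s2]`, `R[s, a]`), the stopping tolerance, and the tiger test problem are assumptions, not code from the paper.

```python
import numpy as np


def mdp_q_values(T, R, gamma, tol=1e-10):
    """Value iteration on the fully observable MDP underlying a POMDP.

    Assumed layout: T[a, s, s2] = P(s2 | s, a), R[s, a] = reward.
    Returns Q[s, a].
    """
    V = np.zeros(T.shape[1])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s2} T[a, s, s2] * V[s2]
        Q = R + gamma * (T @ V).T
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q
        V = V_new


def qmdp_value(b, Q):
    """QMDP estimate of V(b): max_a sum_s b(s) Q[s, a]."""
    return float(np.max(b @ Q))


def qmdp_action(b, Q):
    """Greedy action under the QMDP estimate."""
    return int(np.argmax(b @ Q))


# Tiger problem (illustrative). States: 0 = tiger-left, 1 = tiger-right.
# Actions: 0 = listen, 1 = open-left, 2 = open-right.
gamma = 0.95
T = np.array([np.eye(2), np.full((2, 2), 0.5), np.full((2, 2), 0.5)])
R = np.array([[-1.0, -100.0, 10.0], [-1.0, 10.0, -100.0]])

Q = mdp_q_values(T, R, gamma)
print(qmdp_action(np.array([1.0, 0.0]), Q))  # tiger known to be left: prints 2
print(qmdp_action(np.array([0.5, 0.5]), Q))  # fully uncertain: prints 0
```

Because QMDP assumes the state becomes fully observable after one step, it assigns no value to gathering information; that assumption is what makes it cheap, and it also makes its estimate an upper bound on the optimal POMDP value rather than an exact one.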
Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
Cited by 115 (6 self)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources.