Results 1  10
of
115
Planning and acting in partially observable stochastic domains
 ARTIFICIAL INTELLIGENCE
, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract

Cited by 822 (31 self)
 Add to MetaCart
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finitememory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
DecisionTheoretic Planning: Structural Assumptions and Computational Leverage
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract

Cited by 417 (4 self)
 Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDPrelated methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
The Complexity of Decentralized Control of Markov Decision Processes
 Mathematics of Operations Research
, 2000
"... We consider decentralized control of Markov decision processes and give complexity bounds on the worstcase running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partiallyobservable case that allow for decentralized control are described. ..."
Abstract

Cited by 287 (46 self)
 Add to MetaCart
We consider decentralized control of Markov decision processes and give complexity bounds on the worstcase running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partiallyobservable case that allow for decentralized control are described. For even two agents, the finitehorizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomialtime algorithms. Furthermore, assuming EXP NEXP, the problems require superexponential time to solve in the worst case.
Pointbased value iteration: An anytime algorithm for POMDPs
, 2003
"... This paper introduces the PointBased Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by ..."
Abstract

Cited by 243 (27 self)
 Add to MetaCart
This paper introduces the PointBased Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by maintaining only one value hyperplane per point, it is able to successfully solve large problems, including the robotic Tag domain, a POMDP version of the popular game of lasertag.
Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings
 In IJCAI
, 2003
"... The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). ..."
Abstract

Cited by 155 (23 self)
 Add to MetaCart
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP).
Perseus: Randomized pointbased value iteration for POMDPs
 Journal of Artificial Intelligence Research
, 2005
"... Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Pointbased approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a ra ..."
Abstract

Cited by 141 (11 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Pointbased approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized pointbased value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other pointbased methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. 1.
Dynamic Programming for Partially Observable Stochastic Games
 IN PROCEEDINGS OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2004
"... We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. ..."
Abstract

Cited by 119 (23 self)
 Add to MetaCart
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
Pointbased POMDP algorithms: Improved analysis and implementation
 in Proceedings of Uncertainty in Artificial Intelligence
"... Existing complexity bounds for pointbased POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also ..."
Abstract

Cited by 106 (3 self)
 Add to MetaCart
Existing complexity bounds for pointbased POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (pointbased) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. Empirical results show speedups of more than two orders of magnitude. 1
Heuristic Search Value Iteration for POMDPs
, 2004
"... We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is ..."
Abstract

Cited by 95 (3 self)
 Add to MetaCart
We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is
Bounded finite state controllers
 In NIPS
, 2004
"... We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of boundedsize, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller sp ..."
Abstract

Cited by 78 (12 self)
 Add to MetaCart
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of boundedsize, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima). 1