Results 1 - 10
of
102
Planning and acting in partially observable stochastic domains
- ARTIFICIAL INTELLIGENCE
, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract
-
Cited by 629 (24 self)
- Add to MetaCart
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract
-
Cited by 342 (3 self)
- Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
The Complexity of Decentralized Control of Markov Decision Processes
- Mathematics of Operations Research
, 2000
"... We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partially-observable case that allow for decentralized control are described. ..."
Abstract
-
Cited by 198 (37 self)
- Add to MetaCart
We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fullyobservable case and the partially-observable case that allow for decentralized control are described. For even two agents, the finite-horizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP NEXP, the problems require super-exponential time to solve in the worst case.
Point-based value iteration: An anytime algorithm for POMDPs
, 2003
"... This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by ..."
Abstract
-
Cited by 185 (20 self)
- Add to MetaCart
This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by maintaining only one value hyperplane per point, it is able to successfully solve large problems, including the robotic Tag domain, a POMDP version of the popular game of lasertag.
Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings
- In IJCAI
, 2003
"... The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). ..."
Abstract
-
Cited by 122 (19 self)
- Add to MetaCart
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP).
Perseus: Randomized point-based value iteration for POMDPs
- Journal of Artificial Intelligence Research
, 2005
"... Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a ra ..."
Abstract
-
Cited by 111 (8 self)
- Add to MetaCart
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. 1.
Dynamic Programming for Partially Observable Stochastic Games
- IN PROCEEDINGS OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2004
"... We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. ..."
Abstract
-
Cited by 89 (18 self)
- Add to MetaCart
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
Point-Based POMDP Algorithms: Improved Analysis And Implementation
"... Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive ..."
Abstract
-
Cited by 69 (1 self)
- Add to MetaCart
Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive
Heuristic Search Value Iteration for POMDPs
, 2004
"... We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is ..."
Abstract
-
Cited by 66 (1 self)
- Add to MetaCart
We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is
MAXPLAN: A New Approach to Probabilistic Planning
, 1998
"... Classical arti#cial intelligence planning techniques can operate in large domains but traditionally assume a deterministic universe. Operations research planning techniques can operate in probabilistic domains but break when the domains approach realistic sizes. maxplan is a new probabilistic ..."
Abstract
-
Cited by 63 (5 self)
- Add to MetaCart
Classical arti#cial intelligence planning techniques can operate in large domains but traditionally assume a deterministic universe. Operations research planning techniques can operate in probabilistic domains but break when the domains approach realistic sizes. maxplan is a new probabilistic planning technique that aims at combining the best of these twoworlds. maxplan converts a planning instance into an E-Majsat instance, and then draws on techniques from Boolean satis#ability and dynamic programming to solve the E-Majsat instance. E-Majsat is an NP PP -complete problem that is essentially a probabilistic version of Sat. maxplan performs as much as an order of magnitude better on some standard stochastic test problems than buridan---a state-of-the-art probabilistic planner---and scales better on one test problem than two algorithms based on dynamic programming. INTRODUCTION Classical arti#cial intelligence planning techniques can operate in large domains but, t...

