Results 11  20
of
166
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract

Cited by 91 (6 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finitehorizon discrete POMDP is PSPACEcomplete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
"... Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the ent ..."
Abstract

Cited by 85 (3 self)
 Add to MetaCart
(Show Context)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in realworld POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, lowdimensional manifold embedded in the highdimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving largescale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, highdimensional belief spaces using lowdimensional sets of learned features of the belief state. We then plan only in terms of the lowdimensional belief features. By planning in this lowdimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks. 1.
Exponential Family PCA for Belief Compression in POMDPs
 IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 15
, 2003
"... Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real ..."
Abstract

Cited by 82 (11 self)
 Add to MetaCart
(Show Context)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, lowdimensional manifold of plausible beliefs embedded in the highdimensional belief space. We introduce
Optimal and approximate Qvalue functions for decentralized POMDPs
 J. Artificial Intelligence Research
"... Decisiontheoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In singleagent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Qvalue functions: an optimal Qvalue functi ..."
Abstract

Cited by 62 (26 self)
 Add to MetaCart
(Show Context)
Decisiontheoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In singleagent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Qvalue functions: an optimal Qvalue function Q ∗ is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q ∗. In this paper we study whether similar Qvalue functions can be defined for decentralized POMDP models (DecPOMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Qvalue function for DecPOMDPs: one that gives a normative description as the Qvalue function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Qvalue functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Qvalue function Q ∗. Finally, unifying some previous approaches for solving DecPOMDPs, we describe a family of algorithms for extracting policies from such Qvalue functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem. 1.
An Improved GridBased Approximation Algorithm for POMDPs
, 2001
"... Although a partially observable Markov decision process (POMDP) provides an appealing model for problems of planning under uncertainty, exact algorithms for POMDPs are intractable. This motivates work on approximation algorithms, and gridbased approximation is a widelyused approach. We descri ..."
Abstract

Cited by 53 (0 self)
 Add to MetaCart
Although a partially observable Markov decision process (POMDP) provides an appealing model for problems of planning under uncertainty, exact algorithms for POMDPs are intractable. This motivates work on approximation algorithms, and gridbased approximation is a widelyused approach. We describe a novel approach to gridbased approximation that uses a variableresolution regular grid, and show that it outperforms previous gridbased approaches to approximation. 1
Planning under Uncertainty for Robotic Tasks with Mixed Observability
"... Partially observable Markov decision processes (POMDPs) provide a principled, general framework for robot motion planning in uncertain and dynamic environments. They have been applied to various robotic tasks. However, solving POMDPs exactly is computationally intractable. A major challenge is to sc ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
(Show Context)
Partially observable Markov decision processes (POMDPs) provide a principled, general framework for robot motion planning in uncertain and dynamic environments. They have been applied to various robotic tasks. However, solving POMDPs exactly is computationally intractable. A major challenge is to scale up POMDP algorithms for complex robotic tasks. Robotic systems often have mixed observability: even when a robot’s state is not fully observable, some components of the state may still be so. We use a factored model to represent separately the fully and partially observable components of a robot’s state and derive a compact lowerdimensional representation of its belief space. This factored representation can be combined with any pointbased algorithm to compute approximate POMDP solutions. Experimental results show that on standard test problems, our approach improves the performance of a leading pointbased POMDP algorithm by many times. 1
A Survey of POMDP Solution Techniques
, 2000
"... this paper, we assume all actions take one unit of discrete time at some (unspecied) time scale. If we allow actions to take variable lengths of time, we end up with a semiMarkov model; see e.g., [SPS99]. ..."
Abstract

Cited by 34 (0 self)
 Add to MetaCart
(Show Context)
this paper, we assume all actions take one unit of discrete time at some (unspecied) time scale. If we allow actions to take variable lengths of time, we end up with a semiMarkov model; see e.g., [SPS99].
Solving POMDPs: RTDPBel vs. Pointbased Algorithms
"... Pointbased algorithms and RTDPBel are approximate methods for solving POMDPs that replace the full updates of parallel value iteration by faster and more effective updates at selected beliefs. An important difference between the two methods is that the former adopt Sondik’s representation of the v ..."
Abstract

Cited by 33 (7 self)
 Add to MetaCart
Pointbased algorithms and RTDPBel are approximate methods for solving POMDPs that replace the full updates of parallel value iteration by faster and more effective updates at selected beliefs. An important difference between the two methods is that the former adopt Sondik’s representation of the value function, while the latter uses a tabular representation and a discretization function. The algorithms, however, have not been compared up to now, because they target different POMDPs: discounted POMDPs on the one hand, and Goal POMDPs on the other. In this paper, we bridge this representational gap, showing how to transform discounted POMDPs into Goal POMDPs, and use the transformation to compare RTDPBel with pointbased algorithms over the existing discounted benchmarks. The results appear to contradict the conventional wisdom in the area showing that RTDPBel is competitive, and sometimes superior to pointbased algorithms in both quality and time. 1
POMDPs for Robotic Tasks with Mixed Observability
"... Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for motion planning of autonomous robots in uncertain and dynamic environments. They have been successfully applied to various robotic tasks, but a major challenge is to scale up POMDP algorithms fo ..."
Abstract

Cited by 32 (4 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for motion planning of autonomous robots in uncertain and dynamic environments. They have been successfully applied to various robotic tasks, but a major challenge is to scale up POMDP algorithms for more complex robotic systems. Robotic systems often have mixed observability: even when a robot’s state is not fully observable, some components of the state may still be fully observable. Exploiting this, we use a factored model to represent separately the fully and partially observable components of a robot’s state and derive a compact lowerdimensional representation of its belief space. We then use this factored representation in conjunction with a pointbased algorithm to compute approximate POMDP solutions. Separating fully and partially observable state components using a factored model opens up several opportunities to improve the efficiency of pointbased POMDP algorithms. Experiments show that on standard test problems, our new algorithm is many times faster than a leading pointbased POMDP algorithm.