Computer Science Department, University of Massachusetts
Amherst, MA 01003
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.

1 Introduction

A partially observable Markov decision process (POMDP) provides an elegant mathematical model for planning and control problems for which there can be uncertainty about the effects of actions and about the current state. It is well-known that ...
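
To make the idea of an explicit policy representation concrete, the following is a minimal sketch of a finite-state controller, assuming the common formulation in which each controller node prescribes one action and each observation selects a successor node. The names `Node` and `run_controller`, and the two-node example with its action and observation labels, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    action: str                 # action taken whenever the controller is in this node
    successors: Dict[str, int]  # observation -> index of the next node


def run_controller(nodes: List[Node], start: int, observations: List[str]) -> List[str]:
    """Return the actions the controller emits along an observation sequence."""
    actions = []
    current = start
    for obs in observations:
        actions.append(nodes[current].action)     # act according to the current node
        current = nodes[current].successors[obs]  # move to the node chosen by the observation
    actions.append(nodes[current].action)         # action of the node reached last
    return actions


# Hypothetical two-node controller: keep listening until "clear" is observed, then move.
controller = [
    Node(action="listen", successors={"noise": 0, "clear": 1}),
    Node(action="move", successors={"noise": 0, "clear": 1}),
]

print(run_controller(controller, 0, ["noise", "clear", "noise"]))
```

In this sketch the controller never consults a belief state: the current node index is the only memory it carries, so a policy of this form can be executed with a table lookup per step. Search in policy space then amounts to changing the nodes, their actions, and their successor links.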