Results 1–10 of 60
Planning and acting in partially observable stochastic domains
Artificial Intelligence, 1998
Abstract

Cited by 821 (30 self)
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
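The belief-state machinery underlying the pomdp formulation can be illustrated with a short sketch. The two-state transition model `T`, observation model `O`, and all the numbers below are invented for illustration, not taken from the paper:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes update of a POMDP belief state after taking action a and observing o.

    b: belief over states, shape (S,)
    T: transition model, T[a, s, s2] = P(s2 | s, a)
    O: observation model, O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]               # state distribution after the action
    weighted = predicted * O[a][:, o]  # weight by the observation likelihood
    return weighted / weighted.sum()   # renormalize to a distribution

# Tiny two-state, one-action, two-observation example (invented numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O)
```

The updated belief is the sufficient statistic on which the offline solution methods in this line of work operate.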
Acting Optimally in Partially Observable Stochastic Domains
1994
Abstract

Cited by 274 (16 self)
In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the pomdp approach.
Learning policies for partially observable environments: Scaling up
1995
Abstract

Cited by 231 (11 self)
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of pomdp's, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small pomdp's taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
1 Introduction Mobile robots must act on the basis of thei...
Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation
In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996
Abstract

Cited by 183 (12 self)
Discrete Bayesian models have been used to model uncertainty for mobile-robot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving for the optimal control policy is intractable, in general, it goes on to explore a variety of heuristic control strategies. The control strategies are compared experimentally, both in simulation and in runs on a robot.
1 Introduction A robot that delivers items and performs errands in an office environment needs to be able to navigate robustly. It should be able to overcome errors in perception and action, at worst getting lost for some period of time, but then being able to recover by relocalizing itself and continuing with its task. The Bayesian framework is particularly appropriate for modeling the robot's belief about its location (or, more generally, the state of the world). It suppl...
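A rough illustration of the discrete Bayesian localization model the abstract describes: the corridor length, door layout, motion model, and sensor probabilities below are all invented stand-ins, not taken from the paper:

```python
import numpy as np

# Hypothetical 5-cell corridor; 1 marks a cell that has a door.
doors = np.array([1, 0, 0, 1, 0])
p_correct, p_error = 0.8, 0.2  # assumed door-sensor accuracy

def localize_step(belief, saw_door):
    """One discrete Bayes-filter step: move right one cell, then weight
    each cell by how well it explains the door observation."""
    belief = np.roll(belief, 1)  # deterministic rightward motion model
    likelihood = np.where(doors == saw_door, p_correct, p_error)
    belief = belief * likelihood
    return belief / belief.sum()

belief = np.full(5, 0.2)           # start completely uncertain (lost)
belief = localize_step(belief, 1)  # sensor reports a door
```

After one observation the belief concentrates on the two door cells; the heuristic control strategies compared in the paper act on beliefs of exactly this kind.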
Algorithms for Sequential Decision Making
1996
Abstract

Cited by 175 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes
In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, 1997
Abstract

Cited by 157 (10 self)
Most exact algorithms for general partially observable Markov decision processes (pomdps) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving pomdps.
1 INTRODUCTION Partially observable Markov decision processes (pomdps) model decision-theoretic planning problems in which an agent must make a sequence of decisions to maximize its utility given uncertainty in the effects of its actions and its current state (Cassandra, Kaelbling, & Littman 1994; White 1991). At any moment in time, the agent is in one of a finite set of possible states S and must choose one of a finite set of possible actions A. After taking action a ∈ A from state s ∈ S, the agent...
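A minimal sketch of the pruning idea behind these piecewise-linear representations. This catches only pointwise dominance; exact methods such as incremental pruning additionally use linear programs to find witness belief points, which this simplification omits:

```python
import numpy as np

def prune_pointwise(alpha_vectors):
    """Drop alpha-vectors that are pointwise dominated by another vector.

    Each alpha-vector gives the value of one policy as a linear function
    of the belief; a vector dominated at every state can never achieve
    the maximum at any belief, so removing it leaves the value function
    unchanged.
    """
    kept = []
    for i, v in enumerate(alpha_vectors):
        dominated = any(
            np.all(w >= v) and np.any(w > v)
            for j, w in enumerate(alpha_vectors) if j != i
        )
        if not dominated:
            kept.append(v)
    return kept

vectors = [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([0.0, 3.0])]
pruned = prune_pointwise(vectors)  # [1, 1] is dominated by [2, 2]
```

Keeping the set of vectors small after each dynamic-programming transformation is what makes these exact methods feasible at all.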
Perseus: Randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research, 2005
Abstract

Cited by 141 (11 self)
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
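The randomized backup stage can be sketched roughly as follows. The `backup` argument and the toy usage at the bottom are stand-ins; a real implementation computes the point-based Bellman backup from the POMDP model:

```python
import numpy as np

def value(b, V):
    """Value of belief b under a set of alpha-vectors V."""
    return max(float(b @ a) for a in V)

def perseus_stage(beliefs, V, backup, rng):
    """One Perseus improvement stage (sketch): back up randomly chosen
    belief points until every point's value has improved or held."""
    V_new, todo = [], list(beliefs)
    while todo:
        b = todo[rng.integers(len(todo))]
        alpha = backup(b, V)
        if float(b @ alpha) >= value(b, V):
            V_new.append(alpha)                               # improving vector
        else:
            V_new.append(max(V, key=lambda a: float(b @ a)))  # keep best old one
        # A single backup may improve many points; drop all already improved.
        todo = [bb for bb in todo if value(bb, V_new) < value(bb, V)]
    return V_new

# Toy usage with a stand-in backup that always returns the same vector.
beliefs = [np.array([0.5, 0.5]), np.array([0.25, 0.75])]
V0 = [np.array([0.0, 0.0])]
V1 = perseus_stage(beliefs, V0, lambda b, V: np.array([1.0, 1.0]),
                   np.random.default_rng(0))
```

The filtering step at the end of the loop is the "key observation" from the abstract: one backed-up vector can satisfy many belief points at once, so far fewer backups are needed than there are points.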
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research, 2000
Abstract

Cited by 127 (0 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price — exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
A POMDP Formulation of Preference Elicitation Problems
2002
Abstract

Cited by 106 (23 self)
Preference elicitation is a key problem facing the deployment of intelligent systems that make or recommend decisions on the behalf of users. Since not all aspects of a utility function have the same impact on object-level decision quality, determining which information to extract from a user is itself a sequential decision problem, balancing the amount of elicitation effort and time with decision quality.
Bounded finite state controllers
In NIPS, 2004
Abstract

Cited by 78 (12 self)
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).