Results 1 - 10
of
178
Planning and acting in partially observable stochastic domains
- ARTIFICIAL INTELLIGENCE
, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract
-
Cited by 629 (24 self)
- Add to MetaCart
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract
-
Cited by 342 (3 self)
- Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Learning policies for partially observable environments: Scaling up
, 1995
"... Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optim ..."
Abstract
-
Cited by 202 (10 self)
- Add to MetaCart
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of pomdp's, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small pomdp's taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains. 1 Introduction Mobile robots must act on the basis of thei...
Probabilistic Planning with Information Gathering and Contingent Execution
, 1994
"... Most AI representations and algorithms for plan generation have not included the concept of information-producing actions (also called diagnostics, or tests, in the decision making literature). We present a planning representation and algorithm that models information-producing actions and construct ..."
Abstract
-
Cited by 192 (16 self)
- Add to MetaCart
Most AI representations and algorithms for plan generation have not included the concept of information-producing actions (also called diagnostics, or tests, in the decision making literature). We present a planning representation and algorithm that models information-producing actions and constructs plans that exploit the information produced by those actions. We extend the buridan (Kushmerick et al. 1994) probabilistic planning algorithm, adapting the action representation to model the behavior of imperfect sensors, and combine it with a framework for contingent action that extends the cnlp algorithm (Peot and Smith 1992) for conditional execution. The result, c-buridan, is an implemented planner that builds plans with probabilistic information-producing actions and contingent execution.
Planning with Incomplete Information as Heuristic Search in Belief Space
, 2000
"... The formulation of planning as heuristic search with heuristics derived from problem representations has turned out to be a fruitful approach for classical planning. In this paper, we pursue a similar idea in the context planning with incomplete information. Planning with incomplete information ..."
Abstract
-
Cited by 174 (23 self)
- Add to MetaCart
The formulation of planning as heuristic search with heuristics derived from problem representations has turned out to be a fruitful approach for classical planning. In this paper, we pursue a similar idea in the context planning with incomplete information. Planning with incomplete information can be formulated as a problem of search in belief space, where belief states can be either sets of states or more generally probability distribution over states. While the formulation (as the formulation of classical planning as heuristic search) is not particularly novel, the contribution of this paper is to make it explicit, to test it over a number of domains, and to extend it to tasks like planning with sensing where the standard search algorithms do not apply. The resulting planner appears to be competitive with the most recent conformant and contingent planners (e.g., cgp, sgp, and cmbp) while at the same time is more general as it can handle probabilistic actions and se...
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract
-
Cited by 158 (7 self)
- Add to MetaCart
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes
- In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence
, 1997
"... Most exact algorithms for general partially observable Markov decision processes (pomdps) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving t ..."
Abstract
-
Cited by 144 (9 self)
- Add to MetaCart
Most exact algorithms for general partially observable Markov decision processes (pomdps) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving pomdps. 1 INTRODUCTION Partially observable Markov decision processes (pomdps) model decision theoretic planning problems in which an agent must make a sequence of decisions to maximize its utility given uncertainty in the effects of its actions and its current state (Cassandra, Kaelbling, & Littman 1994; White 1991). At any moment in time, the agent is in one of a finite set of possible states S and must choose one of a finite set of possible actions A. After taking action a 2 A from state s 2 S, the agent...
Stochastic Dynamic Programming with Factored Representations
, 1997
"... Markov decision processes(MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propo ..."
Abstract
-
Cited by 120 (9 self)
- Add to MetaCart
Markov decision processes(MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,

