In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of the complexity of finding exact solutions to POMDPs and of some possibilities for finding approximate solutions.

Consider the problem of a robot navigating in a large office building. The robot can move from hallway intersection to intersection and can make local observations of its world. Its actions are not completely reliable, however. Sometimes, when it intends to move, it stays where it is or goes too far; sometimes, when it intends to turn, it overshoots. It has similar problems with observation. Sometimes a corridor looks...
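The robot's situation, with unreliable movement and unreliable sensing, can be illustrated with a small belief-state update, the standard way a POMDP agent tracks where it might be. The sketch below is not taken from the paper: the one-dimensional corridor, the `move_probs` table, the door/wall sensor, and the 0.9 sensor accuracy are all assumptions chosen for illustration. It computes b'(s') proportional to O(o | s') * sum over s of T(s' | s, a) * b(s).

```python
def predict(belief, move_probs):
    """Push the belief through a noisy 'move forward' action.

    move_probs maps a step size (0 = stayed put, 1 = intended move,
    2 = went too far) to its probability. These values are assumed.
    """
    n = len(belief)
    predicted = [0.0] * n
    for s, p in enumerate(belief):
        for step, q in move_probs.items():
            s_next = min(s + step, n - 1)  # the corridor ends in a wall
            predicted[s_next] += p * q
    return predicted


def correct(belief, likelihood):
    """Weight each state by the observation likelihood and renormalize."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


def belief_update(belief, move_probs, doors, saw_door, p_correct=0.9):
    """One step of belief tracking: noisy move, then noisy observation.

    doors[i] says whether cell i really has a door; saw_door is the
    (possibly wrong) sensor reading; p_correct is an assumed accuracy.
    """
    predicted = predict(belief, move_probs)
    likelihood = [p_correct if is_door == saw_door else 1.0 - p_correct
                  for is_door in doors]
    return correct(predicted, likelihood)


# Hypothetical four-cell corridor with a single door in cell 1.
DOORS = [False, True, False, False]
MOVE = {0: 0.1, 1: 0.8, 2: 0.1}
uniform = [0.25] * 4
after_step = belief_update(uniform, MOVE, DOORS, saw_door=True)
```

Starting from a uniform belief, a single "door" reading concentrates most of the probability on the door cell without ever making the robot certain. This is why a POMDP policy is defined over beliefs rather than over states.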