## Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes (1997)

Venue: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence

Citations: 166 (10 self)

### BibTeX

@INPROCEEDINGS{Cassandra97incrementalpruning,

author = {Anthony Cassandra and Michael L. Littman and Nevin L. Zhang},

title = {Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes},

booktitle = {Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence},

year = {1997},

pages = {54--61},

publisher = {Morgan Kaufmann Publishers}

}

### Abstract

Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.

1 INTRODUCTION

Partially observable Markov decision processes (POMDPs) model decision-theoretic planning problems in which an agent must make a sequence of decisions to maximize its utility given uncertainty in the effects of its actions and its current state (Cassandra, Kaelbling, & Littman 1994; White 1991). At any moment in time, the agent is in one of a finite set of possible states S and must choose one of a finite set of possible actions A. After taking action a ∈ A from state s ∈ S, the agent...
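As a rough illustration of the piecewise-linear and convex representation the abstract mentions, a value function can be stored as a finite set of vectors, with the value at a belief state given by the largest dot product. The sketch below is a simplified stand-in and not the paper's algorithm: the names `value` and `prune_pointwise` are hypothetical, and it removes only pointwise-dominated vectors, whereas exact methods such as incremental pruning also rely on linear programs to discard vectors that are useless at every belief.

```python
from typing import List, Sequence

Vector = Sequence[float]


def value(belief: Vector, alphas: List[Vector]) -> float:
    """Evaluate V(b) = max over vectors alpha of the dot product alpha . b."""
    return max(sum(a * p for a, p in zip(alpha, belief)) for alpha in alphas)


def prune_pointwise(alphas: List[Vector]) -> List[Vector]:
    """Drop vectors that another vector matches or beats in every state.

    Assumption: this is only the cheap pointwise-dominance check, not the
    LP-based filtering used by exact pruning procedures.
    """
    kept: List[Vector] = []
    for i, v in enumerate(alphas):
        dominated = False
        for j, w in enumerate(alphas):
            if i == j:
                continue
            at_least = all(wk >= vk for wk, vk in zip(w, v))
            strictly = any(wk > vk for wk, vk in zip(w, v))
            # Among exact duplicates, keep only the earliest copy.
            if at_least and (strictly or j < i):
                dominated = True
                break
        if not dominated:
            kept.append(v)
    return kept


# Hypothetical two-state example: one vector per candidate plan.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2], [0.6, 0.6]]
pruned = prune_pointwise(alphas)
```

In this toy example `[0.2, 0.2]` is dropped because `[0.6, 0.6]` is at least as good in both states, and the value at any belief is unchanged by the pruning.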

### Citations

350 | The optimal control of partially observable Markov decision processes - Sondik - 1971 |
Citation context: ...ramming updates are critical to such a wide array of POMDP algorithms, identifying fast algorithms is crucial. Several algorithms for dynamic-programming updates have been proposed, such as one pass (Sondik 1971), exhaustive (Monahan 1982), linear support (Cheng 1988), and witness (Littman, Cassandra, & Kaelbling 1996). Cheng (1988) gave experimental evidence that the linear support algorithm is more efficie...

314 | The optimal control of partially observable Markov processes over a finite horizon - Smallwood, Sondik - 1973 |
Citation context: ... ways to approach this problem based on checking which information states can be reached (Washington 1996; Hansen 1994), searching for good controllers (Platzman 1981), and using dynamic programming (Smallwood & Sondik 1973; Cheng 1988; Monahan 1982; Littman, Cassandra, & Kaelbling 1996). Most exact algorithms for general POMDPs use a form of dynamic programming in which a piecewise-linear and convex representation of on...

290 | Acting optimally in partially observable stochastic domains - Cassandra, Kaelbling, et al. - 1994 |

277 | An algorithm for probabilistic planning - Kushmerick, Hanks, et al. - 1995 |

204 | A survey of partially observable Markov decision processes: Theory, models, and algorithms - Monahan - 1982 |
Citation context: ...on checking which information states can be reached (Washington 1996; Hansen 1994), searching for good controllers (Platzman 1981), and using dynamic programming (Smallwood & Sondik 1973; Cheng 1988; Monahan 1982; Littman, Cassandra, & Kaelbling 1996). Most exact algorithms for general POMDPs use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transf...

201 | Reinforcement Learning with Perceptual Aliasing - Chrisman - 1992 |

121 | Approximating optimal policies for partially observable stochastic domains - Parr, Russell - 1995 |

117 | Computing optimal policies for partially observable decision processes using compact representations - Boutilier, Poole - 1996 |
Citation context: ...dps via value iteration (Sawaki & Ichikawa 1978; Cassandra, Kaelbling, & Littman 1994), policy iteration (Sondik 1978), accelerated value iteration (White & Scherer 1989), structured representations (Boutilier & Poole 1996), and approximation (Zhang & Liu 1996). Because dynamic-programming updates are critical to such a wide array of POMDP algorithms, identifying fast algorithms is crucial. Several algorithms for dynami...

109 | Overcoming Incomplete Perception with Utile Distinction Memory - McCallum - 1993 |

74 | Algorithms for Partially Observable Markov Decision Processes - Cheng - 1988 |
Citation context: ...oblem based on checking which information states can be reached (Washington 1996; Hansen 1994), searching for good controllers (Platzman 1981), and using dynamic programming (Smallwood & Sondik 1973; Cheng 1988; Monahan 1982; Littman, Cassandra, & Kaelbling 1996). Most exact algorithms for general POMDPs use a form of dynamic programming in which a piecewise-linear and convex representation of one value func...

27 | Partially observed Markov decision processes: A survey - White - 1991 |
Citation context: ...nning problems in which an agent must make a sequence of decisions to maximize its utility given uncertainty in the effects of its actions and its current state (Cassandra, Kaelbling, & Littman 1994; White 1991). At any moment in time, the agent is in one of a finite set of possible states S and must choose one of a finite set of possible actions A. After taking action a ∈ A from state s ∈ S, the agent rece...

26 | Efficient dynamic-programming updates in partially observable Markov decision processes, Brown University - Littman, Cassandra, et al. - 1995 |

23 | Solution procedures for partially observed Markov decision processes - White, Scherer - 1989 |
Citation context: ...to another. This includes algorithms that solve POMDPs via value iteration (Sawaki & Ichikawa 1978; Cassandra, Kaelbling, & Littman 1994), policy iteration (Sondik 1978), accelerated value iteration (White & Scherer 1989), structured representations (Boutilier & Poole 1996), and approximation (Zhang & Liu 1996). Because dynamic-programming updates are critical to such a wide array of POMDP algorithms, identifying fast...

19 | Cost-effective sensing during plan execution - Hansen - 1994 |
Citation context: ... which controls how much future rewards count compared to near-term rewards). There are many ways to approach this problem based on checking which information states can be reached (Washington 1996; Hansen 1994), searching for good controllers (Platzman 1981), and using dynamic programming (Smallwood & Sondik 1973; Cheng 1988; Monahan 1982; Littman, Cassandra, & Kaelbling 1996). Most exact algorithms for ge...

13 | Incremental Markov-model planning - Washington - 1996 |
Citation context: ...he discount rate, which controls how much future rewards count compared to near-term rewards). There are many ways to approach this problem based on checking which information states can be reached (Washington 1996; Hansen 1994), searching for good controllers (Platzman 1981), and using dynamic programming (Smallwood & Sondik 1973; Cheng 1988; Monahan 1982; Littman, Cassandra, & Kaelbling 1996). Most exact algo...

11 | A feasible computational approach to infinite-horizon partially-observed Markov decision problems - Platzman - 1981 |
Citation context: ...compared to near-term rewards). There are many ways to approach this problem based on checking which information states can be reached (Washington 1996; Hansen 1994), searching for good controllers (Platzman 1981), and using dynamic programming (Smallwood & Sondik 1973; Cheng 1988; Monahan 1982; Littman, Cassandra, & Kaelbling 1996). Most exact algorithms for general POMDPs use a form of dynamic programming i...

6 | Optimal control for partially observable Markov decision processes over an infinite horizon - Sawaki, Ichikawa - 1978 |
Citation context: ...e a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. This includes algorithms that solve POMDPs via value iteration (Sawaki & Ichikawa 1978; Cassandra, Kaelbling, & Littman 1994), policy iteration (Sondik 1978), accelerated value iteration (White & Scherer 1989), structured representations (Boutilier & Poole 1996), and approximation (Zha...