## SPUDD: Stochastic planning using decision diagrams (1999)

### Download Links

- [www.cs.toronto.edu]
- [www.cs.utoronto.ca]
- [www.cs.tufts.edu]
- [www.cs.ubc.ca]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence

Citations: 179 (17 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Hoey99spudd:stochastic,
  author    = {Jesse Hoey and Robert St-Aubin and Alan Hu and Craig Boutilier},
  title     = {SPUDD: Stochastic planning using decision diagrams},
  booktitle = {Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence},
  year      = {1999},
  pages     = {279--288},
  publisher = {Morgan Kaufmann}
}
```

### Abstract

Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have recently been proposed to allow dynamic programming to be applied without the need for complete state enumeration. We propose and examine a new value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies, assuming an ADD input representation of the MDP. Dynamic programming is implemented via ADD manipulation. We demonstrate our method on a class of large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions).
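For contrast with the ADD-based approach described in the abstract, the classical value iteration it replaces enumerates every state explicitly. The sketch below is a minimal illustration over a made-up two-state, two-action MDP (the problem data is hypothetical, not from the paper); SPUDD performs this same Bellman backup symbolically over ADDs rather than state by state.

```python
# Classical (tabular) value iteration: enumerates all states explicitly.
# The MDP here is a hypothetical 2-state, 2-action example for illustration.
P = {  # P[a][s][t] = Pr(s, a, t): probability of reaching t from s under a
    "stay": [[0.9, 0.1], [0.2, 0.8]],
    "go":   [[0.1, 0.9], [0.7, 0.3]],
}
R = [0.0, 1.0]   # reward received in each state
gamma = 0.9      # discount factor

V = [0.0, 0.0]
for _ in range(200):  # iterate the Bellman backup until (nearly) converged
    V = [
        R[s] + gamma * max(
            sum(P[a][s][t] * V[t] for t in range(2)) for a in P
        )
        for s in range(2)
    ]
```

The inner `max` over actions and `sum` over successor states is exactly the computation whose cost grows exponentially with the number of domain features when states are enumerated, which is the motivation for the symbolic representation.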

### Citations

2930 | Graph-based algorithms for Boolean function manipulation
- Bryant
- 1986
Citation Context: ...dynamic abstraction method for solving MDPs using algebraic decision diagrams (ADDs) [1] to represent value functions and policies. ADDs are generalizations of ordered binary decision diagrams (BDDs) [10] that allow non-boolean labels at terminal nodes. This representational technique allows one to describe a value function (or policy) as a function of the variables describing the domain rather than i...
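The key property mentioned here, that ADDs generalize BDDs by allowing numeric terminal labels, can be sketched in a few lines. This is an illustrative toy (CUDD's actual data structures and reduction rules are more involved): a "unique table" merges structurally identical subgraphs, and redundant tests are eliminated, which is what keeps the diagram compact.

```python
# Minimal sketch of a reduced ADD: like a BDD, but terminals carry arbitrary
# numeric labels rather than just 0/1. Hypothetical illustration only.

_unique = {}  # unique table: structurally identical subgraphs share one node

def terminal(value):
    """A terminal node labelled with a real value (non-boolean label)."""
    return _unique.setdefault(("t", value), ("t", value))

def node(var, low, high):
    """An internal test on `var`; `high` taken when var is true."""
    if low is high:          # redundant test: both branches identical
        return low
    return _unique.setdefault((var, low, high), (var, low, high))

def evaluate(n, assignment):
    """Follow one path from root to terminal under a variable assignment."""
    while n[0] != "t":
        var, low, high = n
        n = high if assignment[var] else low
    return n[1]

# A tiny value function: f = 1.0 if x, else (0.9 if y else 0.0)
f = node("x", node("y", terminal(0.0), terminal(0.9)), terminal(1.0))
```

Because `node` consults the unique table, building the same function twice yields the identical shared structure, which is how a value function over many variables can be represented with far fewer nodes than states.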

2611 | Dynamic Programming
- Bellman
- 1957
Citation Context: ...sentations. We conclude in Section 6 with a discussion of future work in using ADDs for DTP. 2 Markov Decision Processes We assume that the domain of interest can be modeled as a fully-observable MDP [2, 19] with a finite set of states S and actions A. Actions induce stochastic state transitions, with Pr(s, a, t) denoting the probability with which t is reached when action a is executed at state s. We also assume ...

1198 | Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman
- 1994
Citation Context: ...e the semantic model of choice for decision theoretic planning (DTP) in the AI planning community. While classical computational methods for solving MDPs, such as value iteration and policy iteration [19], are often effective for small problems, typical AI planning problems fall prey to Bellman's curse of dimensionality: the size of the state space grows exponentially with the number of domain feature...

457 | A model for reasoning about persistence and causation
- Dean, Kanazawa
- 1989
Citation Context: ..., 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, 7]. In this paper, we describe a dynamic abstraction method for solving MDPs using algebraic decision diagrams (ADDs) [1] to represent value functions and policies. ADDs are generalizations of ordered b...

288 | Context-specific independence in Bayesian networks
- Boutilier, Friedman, et al.
- 1996
Citation Context: ...PTs, we use ADDs to capture regularities in the CPTs (i.e., to represent the functions PaX0i(X1...Xn)). This type of representation exploits context-specific independence in the distributions [9], and is related to the use of tree representations [7] and rule representations [18] of CPTs in DBNs. Figure 2(b) illustrates the ADD representation of the C...
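The context-specific independence being exploited here can be shown with a toy CPT (hypothetical variables, not the paper's Figure 2 example): when one parent's value makes the remaining parents irrelevant, a decision tree or ADD needs far fewer distinct entries than the full table.

```python
# Context-specific independence: Pr(X'=1 | A, B, C) ignores B and C whenever
# A is true. A full CPT enumerates 2^3 = 8 rows; a tree needs only 3 leaves.
# Toy distribution for illustration only.

def cpt_tree(a, b, c):
    if a:                       # given A=true, B and C are irrelevant
        return 0.9
    return 0.5 if b else 0.1    # given A=false, only B matters

# The equivalent flat CPT must enumerate every parent instantiation:
flat = {(a, b, c): cpt_tree(a, b, c)
        for a in (0, 1) for b in (0, 1) for c in (0, 1)}

distinct = len(set(flat.values()))  # only 3 distinct probabilities survive
```

An ADD representation of this CPT would merge the four A=true rows into a single 0.9 terminal, which is the regularity the excerpt says the ADD encoding captures.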

262 | Algebraic decision diagrams and their applications
- Bahar, Frohm, et al.
- 1993
Citation Context: ...RIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, 7]. In this paper, we describe a dynamic abstraction method for solving MDPs using algebraic decision diagrams (ADDs) [1] to represent value functions and policies. ADDs are generalizations of ordered binary decision diagrams (BDDs) [10] that allow non-boolean labels at terminal nodes. This representational technique al...

224 | Exploiting structure in policy construction
- Dearden, R, et al.
- 1995
Citation Context: ...te state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, ...

147 | Decision theoretic planning: Structural assumptions and computational leverage
- Boutilier, Dean, et al.
- 1999
Citation Context: ...feasible for feature-based planning problems. Considerable effort has been devoted to developing representational and computational methods for MDPs that obviate the need to enumerate the state space [5]. Aggregation methods do this by aggregating a set of states and treating the states within any aggregate state as if they were identical [3]. Within AI, abstraction techniques have been widely studied...

147 | Stochastic Dynamic Programming with Factored Representations
- Boutilier, Dearden, et al.
- 2000
Citation Context: ...state space sizes. The first example class consists of various adaptations of a process planning problem taken from [14]. The second and third example classes consist of synthetic problems taken from [7, 8]. These are designed to test best- and worst-case behavior of SPUDD. The first example class consists of process planning problems taken from [14], involving a factory agent which must paint tw...

105 | Model minimization in markov decision processes
- Dean, Givan
- 1997
Citation Context: ...te state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, ...

87 | Automatic OBDD-based generation of universal plans in non-deterministic domains
- Cimatti, Roveri, et al.
- 1998
Citation Context: ...s in both space and computational time. We first introduce the algorithm in a conceptually clear way, and then describe certain optimizations. OBDDs have been explored in previous work in AI planning [11], where universal plans (much like policies) are generated for nondeterministic domains. The motivation in that work, avoiding the combinatorial explosion associated with state space enumeration, is s...

67 | Abstraction and approximate decision theoretic planning
- Dearden, Boutilier
- 1997
Citation Context: ...te state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, ...

55 | Topological parameters for time-space tradeoff
- Dechter, El Fattah
- 2001
Citation Context: ... point we start on the next subset. We note that this space-time tradeoff bears some resemblance to the space-time tradeoffs that arise in probabilistic inference algorithms like variable elimination [15]. Although we have not implemented heuristics for variable ordering, there are some simple ordering methods that could improve space efficiency. For instance, if we order variables so that primed vari...

41 | Modeling a dynamic and uncertain world i: Symbolic and probabilistic reasoning about change
- Hanks, McDermott
- 1994
Citation Context: ... (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatically generate abstract MDPs by exploiting structured representations, such as probabilistic STRIPS rules [16] or dynamic Bayesian network (DBN) representations of actions [13, 7]. In this paper, we describe a dynamic abstraction method for solving MDPs using algebraic decision diagrams (ADDs) [1] to represen...

36 | Approximating value trees in structured dynamic programming
- Boutilier, Dearden
- 1996
Citation Context: ...mber of expected value computations and maximizations required by dynamic programming are greatly reduced. The algorithm described here derives from the structured policy iteration (SPI) algorithm of [7, 6, 4], where decision trees are used to represent value functions and policies. Given a DBN action representation (with decision trees used to represent conditional probability tables) and a decision tree ...

26 | Adaptive Aggregation for Infinite Horizon Dynamic Programming
- Bertsekas, Castañon
- 1989
Citation Context: ...or MDPs that obviate the need to enumerate the state space [5]. Aggregation methods do this by aggregating a set of states and treating the states within any aggregate state as if they were identical [3]. Within AI, abstraction techniques have been widely studied as a form of aggregation, where states are (implicitly) grouped by ignoring certain problem variables [14, 7, 12]. These methods automatical...

16 | Correlated action effects in decision theoretic regression
- Boutilier
- 1997
Citation Context: ...mber of expected value computations and maximizations required by dynamic programming are greatly reduced. The algorithm described here derives from the structured policy iteration (SPI) algorithm of [7, 6, 4], where decision trees are used to represent value functions and policies. Given a DBN action representation (with decision trees used to represent conditional probability tables) and a decision tree ...

12 | Exploiting the rule structure for decision making within the independent choice logic
- Poole
- 1995
Citation Context: ...epresentation exploits context-specific independence in the distributions [9], and is related to the use of tree representations [7] and rule representations [18] of CPTs in DBNs. Figure 2(b) illustrates the ADD representation of the CPT for two variables, C0 and APU0. While the distribution over C0 is a function of its seven parent variables, this function exhibits ...

11 | Probabilistic Reasoning in Intelligent Systems
- Pearl
- 1988
Citation Context: ... the system before action a has been executed, with X0 denoting the state after a has been executed. Directed arcs from variables in X to variables in X0 indicate direct causal influence and have the usual semantics [17, 13]. The conditional probability table (CPT) for each post-action variable X0i defines a conditional distribution PaX0i over X0i, i.e., a's effect on Xi, for each instantiation of its parents. This can be viewed...

2 | CU decision diagram package. Available from ftp://vlsi.colorado.edu/pub
- CUDD
- 1998
Citation Context: ...diagrams during the iterative procedure. In this way, only the variables relevant to the domain would be added. 5 Data and Results The procedure described above was implemented using the CUDD package [20], a library of C routines which provides support for manipulation of ADDs. Experimental results described in this section were all obtained using a dual-processor Sun SPARC Ultra 60 running at 300MHz w...