## Exploiting structure in policy construction (1995)

Venue: IJCAI-95, pp. 1104–1111

Citations: 231 (22 self)

### BibTeX

@INPROCEEDINGS{Boutilier95exploitingstructure,
  author    = {Craig Boutilier and Richard Dearden and Moisés Goldszmidt},
  title     = {Exploiting structure in policy construction},
  booktitle = {Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95)},
  pages     = {1104--1111},
  year      = {1995}
}

### Abstract

Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, policies, and value functions, and the algorithm itself can be used in conjunction with recent approximation methods.
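The key idea in the abstract — representing an action's effect compactly so that identical outcomes are handled once per partition of the state space rather than once per state — can be sketched as follows. This is a minimal illustration, not the authors' code; the action and variable names are hypothetical.

```python
# A minimal sketch (not the authors' code) of the tree-structured action
# representation SPI operates on: the probability that a post-action
# variable becomes true is stored as a decision tree over pre-action
# variables, so all states that agree on the tested variables share a leaf.

class Leaf:
    def __init__(self, prob):
        self.prob = prob  # P(X' = true) given the assignment along the path

class Node:
    def __init__(self, var, true_branch, false_branch):
        self.var = var
        self.true_branch = true_branch
        self.false_branch = false_branch

def prob_true(tree, state):
    """Walk the tree using a state (dict: variable -> bool); return the leaf probability."""
    while isinstance(tree, Node):
        tree = tree.true_branch if state[tree.var] else tree.false_branch
    return tree.prob

# Hypothetical action "deliver coffee": its outcome depends on only one
# state variable, however many others the full MDP has.
has_user_coffee_after = Node(
    "has_robot_coffee",
    Leaf(0.9),  # robot holds coffee: delivery usually succeeds
    Leaf(0.0),  # nothing to deliver
)

print(prob_true(has_user_coffee_after, {"has_robot_coffee": True}))   # 0.9
print(prob_true(has_user_coffee_after, {"has_robot_coffee": False}))  # 0.0
```

Every state with `has_robot_coffee = False` maps to the same leaf regardless of the other state variables, which is the "propositional independence" the abstract refers to.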

### Citations

1329 | Markov Decision Processes: Discrete Stochastic Dynamic Programming - Puterman - 1994 |

Citation Context: ...as network algorithms have proven practical for reasoning under uncertainty, we expect SPI to be quite useful in practice. In Section 2 we briefly describe MDPs and the MPI algorithm; we refer to (Puterman 1994) for a more detailed description of MDPs and solution techniques. In Section 3 we discuss our representation of MDPs using decision trees, and in Section 4 we describe the structured policy iteration...

563 | Learning to act using real-time dynamic programming - Barto, Bradtke, et al. - 1995 |

485 | A model for reasoning about persistence and causation - Dean, Kanazawa - 1989 |

Citation Context: ...(SPI) which uses the same computational mechanism as MPI. As in (Boutilier and Dearden 1994), we assume a compact representation of an MDP, in this case using a "two-slice" temporal Bayesian network (Dean and Kanazawa 1989; Darwiche and Goldszmidt 1994) to represent the dependence between variables before and after the occurrence of an action. In addition, we use a structured decision tree representation of the conditional probability matrices...

392 | Learning Decision Lists - Rivest - 1987 |

Citation Context: ...of propositional formulae {φ_i} partitions the state space. This induces the "explicit" policy π(s) = a_i iff s ⊨ φ_i. Structured policies can be represented in many ways (e.g., with decision lists (Rivest 1987)). We adopt a decision tree representation similar to the representation of probability matrices above. Leaves are labeled with the action to be performed given the partial assignment corresponding to...
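The context above describes a structured policy as an ordered partition of the state space by propositional formulae, inducing π(s) = a_i iff s ⊨ φ_i. A small sketch of the decision-list form, with formulas and action names that are illustrative rather than taken from the paper:

```python
# Sketch of a decision-list policy: an ordered list of (formula, action)
# pairs whose formulas partition the state space, inducing the explicit
# policy pi(s) = a_i iff s satisfies phi_i. Formulas are modeled as
# predicates over a state dict; rules and names are hypothetical.

def decision_list_policy(rules, default):
    """rules: ordered list of (predicate, action); the first satisfied rule fires."""
    def pi(state):
        for formula, action in rules:
            if formula(state):
                return action
        return default
    return pi

pi = decision_list_policy(
    [
        (lambda s: not s["has_robot_coffee"], "get_coffee"),
        (lambda s: not s["has_user_coffee"], "deliver_coffee"),
    ],
    default="wait",
)

print(pi({"has_robot_coffee": False, "has_user_coffee": False}))  # get_coffee
print(pi({"has_robot_coffee": True,  "has_user_coffee": False}))  # deliver_coffee
print(pi({"has_robot_coffee": True,  "has_user_coffee": True}))   # wait
```

Because the rules fire in order, each rule's formula only needs to distinguish states not already covered by earlier rules, which keeps individual formulas short.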

323 | Planning and Control - Dean, Wellman - 1991 |

Citation Context: ...ch multiple potentially conflicting objectives must be traded against one another to determine optimal courses of action. For this reason, there has been much interest in decision-theoretic planning (Dean and Wellman 1991). In particular, the theory of Markov decision processes (MDPs) has found considerable popularity recently both as a conceptual and computational model for DTP (Dean et al. 1993; Boutilier and Dearde...

305 | Probabilistic Horn abduction and Bayesian networks - Poole - 1993 |

Citation Context: ...e consider the variables in Tree(V_i) individually. More precisely, explanations are generated by a process we call abductive repartitioning, quite similar in spirit to probabilistic Horn abduction (Poole 1993). A given traversal of Tree(V_i) induces an ordering of relevant (post-action) variables; we "explain" variables in Tree(V_i) according to this order (Step 3 of Figure 3). For each variable X in Tr...

144 | Planning with deadlines in stochastic domains - Dean, Kaelbling, et al. - 1993 |

Citation Context: ...oretic planning (Dean and Wellman 1991). In particular, the theory of Markov decision processes (MDPs) has found considerable popularity recently both as a conceptual and computational model for DTP (Dean et al. 1993; Boutilier and Dearden 1994; Tash and Russell 1994). While MDPs provide firm semantic foundations for much of DTP, the question of their computational utility for AI remains. Many robust methods for...

141 | Dynamic Probabilistic Systems - Howard - 1971 |

139 | Input generalization in delayed reinforcement learning: An algorithm and performance comparisons - Chapman, Kaelbling - 1991 |

Citation Context: ...In subsequent iterations the initial estimate is the computed value-tree for the previous policy. We note that tree representations of policies are sometimes used in reinforcement learning as well (Chapman and Kaelbling 1991); however, the motivation there is somewhat different. In addition, the ordering of variables in the tree can have a dramatic impact on the size of the representation (see Section 5)...
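The remark above that variable ordering can dramatically affect tree size can be made concrete with a toy experiment (not the paper's example): build a reduced decision tree for the same Boolean function under two orderings and compare node counts.

```python
# Toy illustration (not from the paper) of how variable ordering affects
# tree size. We build a decision tree for
#   f = (x1 and x2) or (x3 and x4) or (x5 and x6)
# eliminating redundant tests (a variable whose two subtrees are identical
# is not tested), and compare node counts under two orderings.

def build(order, f, assignment=None):
    """Split variables in the given order; return a (var, hi, lo) tuple or a bool leaf."""
    assignment = assignment or {}
    if not order:
        return f(assignment)
    var, rest = order[0], order[1:]
    hi = build(rest, f, {**assignment, var: True})
    lo = build(rest, f, {**assignment, var: False})
    return hi if hi == lo else (var, hi, lo)  # prune redundant tests

def size(tree):
    """Number of internal (test) nodes."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 0

f = lambda a: (a["x1"] and a["x2"]) or (a["x3"] and a["x4"]) or (a["x5"] and a["x6"])

good = build(["x1", "x2", "x3", "x4", "x5", "x6"], f)
bad = build(["x1", "x3", "x5", "x2", "x4", "x6"], f)
print(size(good), size(bad))  # 14 19
```

Keeping related variables adjacent lets the tree resolve each conjunct locally; interleaving them forces later levels to re-test context. With sharing of identical subtrees (as in ordered decision diagrams) this kind of gap grows exponentially in the number of conjuncts.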

112 | Dynamic programming and influence diagrams - Tatman, Shachter - 1990 |

Citation Context: ...idered once for each partition, rather than for each state. Since many problems seem to exhibit tremendous structure, we expect SPI and similar algorithms to be extremely useful in practice (see also [13], who use influence diagrams to represent MDPs). We briefly describe MDPs and the MPI algorithm in Sect. 2, followed by our representation of MDPs in Sect. 3. We then describe the SPI algorithm in...

88 | An algorithm for probabilistic least-commitment planning - Kushmerick, Hanks, et al. - 1994 |

70 | Using abstractions for decision-theoretic planning with time constraints - Boutilier, Dearden - 1994 |

Citation Context: ...ean and Wellman 1991). In particular, the theory of Markov decision processes (MDPs) has found considerable popularity recently both as a conceptual and computational model for DTP (Dean et al. 1993; Boutilier and Dearden 1994; Tash and Russell 1994). While MDPs provide firm semantic foundations for much of DTP, the question of their computational utility for AI remains. Many robust methods for optimal policy construction...

56 | Modified policy iteration algorithms for discounted Markov decision problems - Puterman, Shin - 1978 |

Citation Context: ...uch as abstraction often require that one optimally solve a smaller problem. In this paper, we describe our investigations of a commonly used algorithm from OR called modified policy iteration (MPI) (Puterman and Shin 1978). We present a new algorithm called structured policy iteration (SPI) which uses the same computational mechanism as MPI. As in (Boutilier and Dearden 1994), we assume a compact representation of an...

46 | Structuring conditional relationships in influence diagrams - Smith, Holtzman, et al. - 1993 |

Citation Context: ...ion tree representation of the conditional probability matrices quantifying the network to exploit "propositional" independence, that is, independence given a particular variable assignment (see also [11]). Such representations allow problems to be specified in a natural and concise fashion, and they have the added advantage of allowing problem structure to be easily identified. Using this representat...

36 | Control strategies for a stochastic planner - Tash, Russell - 1994 |

Citation Context: ...rticular, the theory of Markov decision processes (MDPs) has found considerable popularity recently both as a conceptual and computational model for DTP (Dean et al. 1993; Boutilier and Dearden 1994; Tash and Russell 1994). While MDPs provide firm semantic foundations for much of DTP, the question of their computational utility for AI remains. Many robust methods for optimal policy construction have been developed in...

28 | Integrating planning and execution in stochastic domains - Dearden, Boutilier - 1994 |

Citation Context: ...e user to specify the problem. Much emphasis in DTP research has been placed on the issue of speeding up computation, and several solutions have been proposed, including local search methods (Dean et al. 1993; Dearden and Boutilier 1994; Barto, Bradtke and Singh 1995; Tash and Russell 1994) or reducing the state space via abstraction (Boutilier and Dearden 1994). Both approaches reduce the state space in a way that allows MDP solution...

27 | Action networks: A framework for reasoning about actions and change under uncertainty - Darwiche, Goldszmidt - 1994 |

Citation Context: ...e computational mechanism as MPI. As in (Boutilier and Dearden 1994), we assume a compact representation of an MDP, in this case using a "two-slice" temporal Bayesian network (Dean and Kanazawa 1989; Darwiche and Goldszmidt 1994) to represent the dependence between variables before and after the occurrence of an action. In addition, we use a structured decision tree representation of the conditional probability matrices quantifying...

23 | Representing diagnostic knowledge for probabilistic Horn abduction - Poole - 1991 |

Citation Context: ...cated by the branches (and the values in Tree(V_0)). More precisely, explanations are generated by a process we call abductive repartitioning, quite similar in spirit to probabilistic Horn abduction [8]. The algorithm repeats the following procedure for each policy partition (with action a) in Tree(π). Tree(V_i) determines an ordering of the relevant (post-action) variables. We construct Tree(V_{i+1}...


2 | Exploiting structure in optimal policy construction - Boutilier, Dearden, et al. - 1994 |