## Model Minimization in Markov Decision Processes (1997)

Venue: Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97)

Citations: 109 (7 self)

### BibTeX

```bibtex
@inproceedings{Dean97modelminimization,
  author    = {Thomas Dean and Robert Givan},
  title     = {Model Minimization in Markov Decision Processes},
  booktitle = {Proceedings of the Fourteenth National Conference on Artificial Intelligence},
  year      = {1997},
  pages     = {106--111},
  publisher = {AAAI}
}
```

### Abstract

We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of being carried to each other block. We provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The resulting partition can be used to construct a reduced MDP which is minimal in a well defined sense and can be used to solve the original MDP. Our algorithm is an adaptation of known automata minimization algorithms, and is designed to operate naturally on factored or implicit representations in which the full state space is never explicitly enumerated. We show that simple variations on this algorithm are equivalent or closely similar to several different recently published algorithms for finding optimal solutions to (partially ...
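The abstract's notion of a homogeneous partition can be made concrete with a small sketch. The following is a minimal illustration, not the paper's algorithm: it assumes a toy explicit MDP (`T`, `R`, and the function names are invented for the example) and repeatedly splits the blocks of an initial reward-based partition until, for each action, every state in a block sends the same probability mass into each other block.

```python
# Hypothetical toy MDP, assumed for illustration: T[s][a] maps each
# successor state of s under action a to its transition probability;
# R[s] is the reward, used only to pick the initial partition below.
T = {
    0: {'a': {2: 1.0}},
    1: {'a': {3: 1.0}},
    2: {'a': {2: 1.0}},
    3: {'a': {3: 1.0}},
}
R = {0: 0.0, 1: 0.0, 2: 1.0, 3: 0.0}

def block_of(partition, s):
    """Index of the block of `partition` containing state s."""
    return next(i for i, b in enumerate(partition) if s in b)

def signature(partition, s):
    """For each action, the probability of moving from s into each block."""
    sig = []
    for a, dist in sorted(T[s].items()):
        mass = {}
        for q, p in dist.items():
            b = block_of(partition, q)
            mass[b] = mass.get(b, 0.0) + p
        sig.append((a, tuple(sorted(mass.items()))))
    return tuple(sig)

def coarsest_homogeneous(initial):
    """Split blocks until states in a block agree on block-to-block probabilities."""
    partition = [set(b) for b in initial]
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                groups.setdefault(signature(partition, s), set()).add(s)
            if len(groups) > 1:
                changed = True
            refined.extend(groups.values())
        partition = refined
    return partition

# Start from the partition induced by reward: {0, 1, 3} vs {2}.
# States 1 and 3 remain together (both stay forever in zero-reward states),
# while 0 is split off because it moves into the block containing 2.
blocks = coarsest_homogeneous([{0, 1, 3}, {2}])
```

Note that this sketch enumerates the state space explicitly, whereas the paper's contribution is to perform the same refinement on factored representations in which the full state space is never enumerated.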

### Citations

7413 | Probabilistic reasoning in intelligent systems: Networks of plausible inference - Pearl - 1988 |

Citation Context: ...hod of implicit representation which is well suited to MDPs and then use this as a basis for our discussion. Factored Representations In the remainder of this paper, we make use of Bayesian networks (Pearl 1988) to encode implicit (or factored) representations; however, our methods apply to other factored representations such as probabilistic STRIPS operators (Kushmerick, Hanks, & Weld 1995). Let X = {X_1, ...

829 | Finite Markov Chains - Kemeny, Snell - 1960 |

Citation Context: ...on 1 Stochastic bisimulation homogeneity is closely related to the substitution property for finite automata developed by Hartmanis and Stearns (1966) and the notion of lumpability for Markov chains (Kemeny & Snell 1960). consisting of two blocks of states: those that satisfy the goal and those that do not. In solving an MDP, we distinguish states that differ on the basis of reward. Given the distinctions implied by...

622 | Markov Decision Processes - Puterman - 1994 |

Citation Context: ... given policy: V_π(p) = R(π(p), p) + γ Σ_{q∈Q} f_pq(π(p)) V_π(q), where γ is the discount rate, 0 ≤ γ < 1, and we assume for simplicity that the objective function is expected discounted cumulative reward (Puterman 1994). Let P = {B_1, ..., B_n} be a partition of Q. P has the property of stochastic bisimulation homogeneity with respect to M if and only if for each B_i, B_j ∈ P, for each α ∈ A, for each p, q ∈...
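The policy-value equation quoted in this context is linear in V_π, so for an explicit MDP it can be solved directly rather than by iteration. A minimal sketch, where the 3-state chain, rewards, and discount rate are invented for illustration:

```python
import numpy as np

# Invented example: under a fixed policy pi, f[p][q] is the transition
# probability from state p to q when pi(p) is executed, and
# r[p] = R(pi(p), p) is the reward collected in state p.
f = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.9  # discount rate, 0 <= gamma < 1

# The fixed point V = r + gamma * (f @ V) is a linear system,
# so solve (I - gamma * f) V = r in one step.
v = np.linalg.solve(np.eye(3) - gamma * f, r)
```

For the absorbing third state this gives V = 1 / (1 - γ), consistent with collecting reward 1 forever under discounting.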

370 | Influence diagrams - Howard, Matheson - 1984 |

Citation Context: ... we do in this paper following (Boutilier, Dearden, & Goldszmidt 1995). We enhance the 2TBN representation to include actions and reward functions; the resulting graph is called an influence diagram (Howard & Matheson 1984). Figure 2 illustrates a factored representation with three state variables, X = {A, B, C}, and describes the transition probabilities and rewards for one action. The factored form of the transition ...

273 | An algorithm for probabilistic planning - Kushmerick, Hanks, et al. - 1995 |

236 | The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces - Moore, Atkeson - 1995 |

Citation Context: ...the application of a standard RL or MDP algorithm to the reduced model. We suspect that the rest of their algorithms as well as other RL and MDP algorithms for handling multidimensional state spaces (Moore 1993; Tsitsiklis & Van Roy 1996) can be profitably analyzed in terms of model reduction. Partially Observable MDPs The simplest way of using model reduction techniques to solve partially observable MDPs (...

204 | A survey of partially observable Markov decision processes: Theory, models, and algorithms - Monahan - 1982 |

198 | Learning and sequential decision making - Barto, Sutton, et al. - 1990 |

170 | Planning under time constraints in stochastic domains - Dean, Kaelbling, et al. - 1995 |

116 | Computing optimal policies for partially observable Markov decision processes using compact representations - Boutilier, Poole - 1996 |

Citation Context: ... to the resulting reduced model. We suspect that some existing POMDP algorithms can be partially understood in such terms. In particular, we conjecture that the factored POMDP algorithm described in (Boutilier & Poole 1996) is asymptotically equivalent to minimizing the underlying MDP and then using Monahan's (1982) POMDP algorithm. Conclusion This paper is primarily concerned with introducing the method of model minim...

108 | Algebraic Structure Theory of Sequential Machines - Hartmanis, Stearns - 1966 |

91 | Planning under uncertainty: structural assumptions and computational leverage - Boutilier, Dean, et al. - 1995 |

61 | Online minimization of transition systems - Lee, Yannakakis - 1992 |

50 | Explanation-based learning and reinforcement learning: A unified view - Dietterich, Flann - 1995 |

24 | A model for reasoning about persistence and causation. Computational Intelligence 5:142–150 - Dean, Kanazawa - 1989 |

Citation Context: ... also as fluents. The state at time t is now represented as a vector X_t = ⟨X_{1,t}, ..., X_{m,t}⟩ where X_{i,t} denotes the ith state variable at time t. A two-stage temporal Bayesian network (2TBN) (Dean & Kanazawa 1989) is a directed acyclic graph consisting of two sets of variables {X_{i,t}} and {X_{i,t+1}} in which directed arcs indicating dependence are allowed from the variables in the first set to variables in t...

17 | Using abstractions for decision theoretic planning with time constraints. AAAI-94 - Boutilier, Dearden - 1994 |

Citation Context: ...ize our arguments; hence, the arguments provided in this paper are only sketches of the formal arguments provided in the longer version of this paper. State-Space Abstraction State-space abstraction (Boutilier & Dearden 1994) is a means of solving a factored MDP by generating an equivalent reduced MDP by determining with a superficial analysis which fluents' values are necessarily irrelevant to the solution. The reduced ...

7 | Minimal state graph generation. Science of Computer Programming - Bouajjani, Fernandez, et al. - 1992 |