## Solving Concurrent Markov Decision Processes (2004)


### Other Repositories/Bibliography

Citations: 20 (2 self)

### BibTeX

```bibtex
@MISC{Mausam04solvingconcurrent,
  author = {Mausam and Daniel S. Weld},
  title  = {Solving Concurrent Markov Decision Processes},
  year   = {2004}
}
```


### Abstract

Typically, Markov decision problems (MDPs) assume that a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs that allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules and thus guarantees solution optimality. Our second technique is a fast, sampling-based algorithm, which produces close-to-optimal solutions extremely quickly. Experiments show that our approaches outperform existing algorithms, producing up to two orders of magnitude speedup.
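The core idea of the abstract, Bellman backups where the "action" is a set of non-mutex actions executed in one epoch, can be illustrated with a minimal value-iteration sketch. The toy states, transition model, cost model, and mutex relation below are hypothetical placeholders for illustration, not the paper's benchmark domains or exact formulation:

```python
from itertools import combinations

# Hypothetical toy concurrent MDP: states, actions, and a mutex relation
# restricting which actions may run in the same decision epoch.
STATES = ["s0", "s1", "goal"]
ACTIONS = ["a", "b", "c"]
MUTEX = {("a", "b")}  # a and b may not execute concurrently

def transition(state, combo):
    """Placeholder transition model: an action combination moves the
    system one step toward the goal with some success probability."""
    if state == "goal":
        return {state: 1.0}
    nxt = "goal" if state == "s1" else "s1"
    p = min(0.9, 0.4 * len(combo))  # bigger combos succeed more often
    return {nxt: p, state: 1.0 - p}

def applicable_combos(actions, mutex):
    """All non-empty action subsets containing no mutex pair."""
    for r in range(1, len(actions) + 1):
        for combo in combinations(actions, r):
            if not any((x, y) in mutex or (y, x) in mutex
                       for x in combo for y in combo if x != y):
                yield combo

def value_iteration(gamma=0.95, eps=1e-6):
    """Bellman backups where each backup minimizes over applicable
    action *combinations* rather than single actions."""
    J = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == "goal":
                continue
            best = min(
                len(c) + gamma * sum(p * J[t]
                                     for t, p in transition(s, c).items())
                for c in applicable_combos(ACTIONS, MUTEX)
            )  # cost of an epoch = number of actions executed in it
            delta = max(delta, abs(best - J[s]))
            J[s] = best
        if delta < eps:
            return J

J = value_iteration()
print({s: round(v, 2) for s, v in J.items()})
```

Note how the combinatorial blow-up is visible even here: with n actions there are up to 2^n - 1 combinations per backup, which is exactly what the paper's pruning and sampling techniques target.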

### Citations

955 | Fast planning through planning graph analysis - Blum, Furst - 1997

Citation Context: ...wing multiple parallel actions, each of unit duration, requires several changes. Clearly, certain actions can’t be executed in parallel; so we adopt the classical planning notion of mutual exclusion (Blum & Furst 1997) and apply it to a factored action representation: probabilistic STRIPS (Boutilier, Dean, & Hanks 1999). Two actions are mutex (may not be executed concurrently) if in any state 1) they have inconsis...

526 | Learning to act using real-time dynamic programming - Barto, Bradtke, et al. - 1995

456 | Dynamic Programming and Optimal Control, Athena Scientific - Bertsekas - 1995

Citation Context: ...teration. In contrast, our second rule, combo-elimination, prunes irrelevant combinations altogether. Combo Elimination: We adapt the action elimination theorem from traditional MDPs (Bertsekas 1995) to prove a similar theorem for concurrent MDPs. Theorem 3: Let A be an action combination which is applicable in state s. Let ⌊Q*(s, A)⌋ denote a lower bound of Q*(s, A). If ⌊Q*(s, A)⌋ > ⌈J*(...
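The combo-elimination rule quoted above, prune combination A in state s once a lower bound on Q*(s, A) exceeds an upper bound on J*(s), can be sketched for a cost-minimization MDP as below. The bound functions and numeric values are hypothetical caller-supplied inputs, not the paper's actual heuristics:

```python
def prune_combinations(state, combos, q_lower, j_upper):
    """Combo elimination for a cost-minimization concurrent MDP:
    discard any action combination A whose optimistic cost estimate
    (lower bound on Q*(s, A)) is already worse than a pessimistic
    estimate of the optimal value (upper bound on J*(s)).  Such an A
    can never be optimal in s, so it is pruned soundly."""
    ceiling = j_upper(state)
    return [A for A in combos if q_lower(state, A) <= ceiling]

# Toy usage with hand-picked bounds:
combos = [("a",), ("b",), ("a", "c")]
q_lb = {("a",): 3.0, ("b",): 7.5, ("a", "c"): 4.0}
kept = prune_combinations(
    "s0", combos,
    q_lower=lambda s, A: q_lb[A],
    j_upper=lambda s: 5.0,   # assume any optimal policy costs <= 5 from s0
)
print(kept)  # [('a',), ('a', 'c')]
```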

147 | Decision theoretic planning: Structural assumptions and computational leverage - Boutilier, Dean, et al. - 1999

143 | Multiagent Planning with Factored MDPs - Guestrin, Koller, et al. - 2001

143 | LAO*: A heuristic search algorithm that finds solutions with loops - Hansen, Zilberstein - 2001

102 | Planning under continuous time and resource uncertainty: A challenge for AI - Bresina, Dearden, et al. - 2002

Citation Context: ...individually, many of the classical assumptions. However, in order to apply automated planning to many real-world domains we must eliminate larger groups of the assumptions in concert. For example, (Bresina et al. 2002) notes that optimal control for a NASA Mars rover requires reasoning about uncertain, concurrent, durative actions and a mixture of discrete and metric fluents. While today’s planners can handle larg...

101 | Labeled RTDP: Improving the convergence of real-time dynamic programming - Bonet, Geffner - 2003

80 | Solving very large weakly coupled Markov decision processes - Meuleau, Hauskrecht, et al. - 1998

Citation Context: ...s inferior when sampling only a few combinations, it quickly approaches the optimal on increasing the number of samples. In all other experiments we sample 40 combinations per state. 7. Related Work: (Meuleau et al. 1998) and (Singh & Cohn 1998) deal with a special type of MDP (called a factorial MDP) that can be represented as a set of smaller weakly coupled MDPs; the separate MDPs are completely independent exce...

62 | How to dynamically merge Markov decision processes - Singh, Cohn - 1998

Citation Context: ...nly a few combinations, it quickly approaches the optimal on increasing the number of samples. In all other experiments we sample 40 combinations per state. 7. Related Work: (Meuleau et al. 1998) and (Singh & Cohn 1998) deal with a special type of MDP (called a factorial MDP) that can be represented as a set of smaller weakly coupled MDPs; the separate MDPs are completely independent except for some common resou...

43 | Taming numbers and durations in the model checking integrated planning system - Edelkamp

29 | Incremental contingency planning - Dearden, Meuleau, et al. - 2003

Citation Context: ...dds branches to a straight-line plan. While their work is more general than ours, their solution is heuristic and it is unclear how closely their policies approximate optimality (Bresina et al. 2002; Dearden et al. 2003). It would be exciting to combine their methods with ours, perhaps by using their heuristic to guide S-RTDP. Recently, Younes and Simmons (2004) have developed a generic test and debug approach which...

23 | Decision-theoretic planning with concurrent temporally extended actions - Rohanimanesh, Mahadevan - 2001

Citation Context: ...ntations in the context of multiagent planning. MDP is a hard problem in itself. In contrast, our algorithm can handle strongly coupled MDPs and does not require any sub-task decomposition as input. (Rohanimanesh & Mahadevan 2001) investigate a special class of semi-MDPs in which the action space can be partitioned by (possibly concurrent) Markov options. They propose an algorithm based on value-iteration, but their focus is...

21 | Policy generation for continuous-time stochastic domains with concurrency - Younes, Simmons

6 | Altalt-p: Online parallelization of plans with heuristic state search - Nigenda, Kambhampati - 2003

Citation Context: ...p builds greedy parallelisations within the state space heuristic regression search coupled with pushing up the current actions if they can be parallelised with some earlier nodes of the search tree (Nigenda & Kambhampati 2003). Unfortunately, its heuristics draw heavily from planning graph constructions that have not been as effective...

Footnote 6: In their model, two options, oa and ob, may not be executed concurrently if there exis...