## Planning with Durative Actions in Stochastic Domains

Citations: 9 (1 self)

### BibTeX

@MISC{Weld_planningwith,
  author = {Daniel S. Weld},
  title = {Planning with Durative Actions in Stochastic Domains},
  year = {}
}

### Abstract

Probabilistic planning problems are typically modeled as Markov Decision Processes (MDPs). MDPs, while an otherwise expressive model, allow only sequential, non-durative actions. This poses severe restrictions when modeling and solving real-world planning problems. We extend the MDP model to incorporate 1) simultaneous action execution, 2) durative actions, and 3) stochastic durations. We develop several algorithms to combat the computational explosion introduced by these features. The key theoretical ideas used in building these algorithms are: modeling a complex problem as an MDP in an extended state/action space, pruning irrelevant actions, sampling relevant actions, using informed heuristics to guide the search, hybridizing different planners to achieve the benefits of both, and approximating the problem and replanning. Our empirical evaluation illuminates the different merits of the various algorithms, viz., optimality, empirical closeness to optimality, theoretical error bounds, and speed.
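The algorithms the abstract surveys all bottom out in Bellman backups over a (possibly extended) state/action space. A minimal, self-contained sketch of one such backup for a stochastic shortest path MDP; all function and state names here are hypothetical illustrations, not from the paper:

```python
def bellman_backup(state, applicable, transition, cost, value):
    """One backup: V(s) = min_a [ C(s, a) + sum_s' Pr(s'|s, a) * V(s') ]."""
    return min(
        cost(state, a) + sum(p * value[s2] for s2, p in transition(state, a))
        for a in applicable(state)
    )

# Toy example: from s0, a single action reaches the goal g with probability
# 0.5 (else stays in s0); unit cost per step, V(g) = 0, current guess V(s0) = 2.
value = {"s0": 2.0, "g": 0.0}
backed_up = bellman_backup(
    "s0",
    applicable=lambda s: ["a"],
    transition=lambda s, a: [("g", 0.5), ("s0", 0.5)],
    cost=lambda s, a: 1.0,
    value=value,
)
# backed_up = 1.0 + 0.5 * 0.0 + 0.5 * 2.0 = 2.0
```

Concurrency enlarges the action set to combinations of actions, which is where the pruning and sampling ideas mentioned above come into play.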

### Citations

1021 | Fast Planning through Plangraph Analysis
- Blum, Furst
- 1995
Citation Context: ...We first discuss how to compute the sets of actions that can be executed in parallel since some actions may conflict with each other. We adopt the classical planning notion of mutual exclusion (Blum & Furst, 1997) and apply it to the factored action representation of probabilistic STRIPS. Two distinct actions are mutex (may not be executed concurrently) if in any state one of the following occurs: 1. they hav...
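The mutex test described in this context can be sketched as a pairwise check over STRIPS-style action descriptions. A simplified version (it ignores conflicting preconditions, which the full definition also checks; the encoding and names are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """STRIPS-style action: sets of ground literals (hypothetical encoding)."""
    name: str
    precond: frozenset = frozenset()
    add: frozenset = frozenset()
    delete: frozenset = frozenset()

def mutex(a: Action, b: Action) -> bool:
    """Two actions may not run concurrently if either one's delete effects
    clobber the other's preconditions or add effects."""
    return bool(a.delete & (b.precond | b.add) or b.delete & (a.precond | a.add))

load = Action("load", precond=frozenset({"at_truck"}), add=frozenset({"in_truck"}))
drive = Action("drive", precond=frozenset({"at_truck"}),
               add=frozenset({"at_dest"}), delete=frozenset({"at_truck"}))
# drive deletes at_truck, a precondition of load, so the pair is mutex.
```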

559 | Learning to act using real-time dynamic programming
- Barto, Bradtke, et al.
- 1995
Citation Context: ...restricts search to the part of state space reachable from the initial state s0. Two algorithms exploiting this reachability analysis are LAO* (Hansen & Zilberstein, 2001) and our focus: RTDP (Barto, Bradtke, & Singh, 1995). RTDP, conceptually, is a lazy version of value iteration in which the states get updated in proportion to the frequency with which they are visited by the repeated executions of the greedy policy. ...
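The "lazy value iteration" view of RTDP in this snippet can be made concrete: each trial simulates the greedy policy from s0 and backs up only the states it visits. A sketch under a stochastic shortest path formulation (all names hypothetical):

```python
import random

def rtdp_trial(s0, goal, applicable, transition, cost, value, max_steps=100):
    """One RTDP trial: follow the greedy policy from s0, doing a Bellman
    backup at each visited state, until the goal (or a step cap) is hit."""
    s = s0
    for _ in range(max_steps):
        if s == goal:
            break
        def q(a):
            return cost(s, a) + sum(p * value[t] for t, p in transition(s, a))
        best = min(applicable(s), key=q)
        value[s] = q(best)                         # backup at the visited state
        outcomes = transition(s, best)
        s = random.choices([t for t, _ in outcomes],
                           weights=[p for _, p in outcomes])[0]
    return value

# Deterministic toy chain s0 -> s1 -> g with unit costs:
V = {"s0": 0.0, "s1": 0.0, "g": 0.0}
chain = {"s0": [("s1", 1.0)], "s1": [("g", 1.0)]}
rtdp_trial("s0", "g",
           applicable=lambda s: ["go"],
           transition=lambda s, a: chain[s],
           cost=lambda s, a: 1.0,
           value=V)
# After one trial on this chain, V["s0"] == 1.0 and V["s1"] == 1.0.
```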

499 | Dynamic programming and optimal control, Athena Scientific
- Bertsekas
- 2007
Citation Context: ...the methods proposed in the paper are applicable to all the variants of these models, for clarity of explanation we assume a particular formulation, known as the stochastic shortest path problem (Bertsekas, 1995). We define a Markov decision process (M) as a tuple 〈S, A, Ap, Pr, C, G, s0〉 in which • S is a finite set of discrete states. We use factored MDPs, i.e., S is compactly represented in terms of a set...
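The tuple 〈S, A, Ap, Pr, C, G, s0〉 defined in this context maps naturally onto a small container type. A sketch; the rover-flavored instance at the end is purely illustrative, not a domain from the paper:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, List, Tuple

State = Hashable
Act = Hashable

@dataclass(frozen=True)
class SSPMDP:
    """Stochastic shortest path MDP <S, A, Ap, Pr, C, G, s0>."""
    states: FrozenSet[State]                                       # S
    actions: FrozenSet[Act]                                        # A
    applicable: Callable[[State], List[Act]]                       # Ap(s)
    transition: Callable[[State, Act], List[Tuple[State, float]]]  # Pr(s'|s,a)
    cost: Callable[[State, Act], float]                            # C(s,a)
    goals: FrozenSet[State]                                        # G
    s0: State                                                      # initial state

rover = SSPMDP(
    states=frozenset({"s0", "g"}),
    actions=frozenset({"sample_rock"}),
    applicable=lambda s: ["sample_rock"],
    transition=lambda s, a: [("g", 0.9), ("s0", 0.1)],
    cost=lambda s, a: 1.0,
    goals=frozenset({"g"}),
    s0="s0",
)
```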

483 | PDDL 2.1: An extension to PDDL for expressing temporal planning domains
- Fox, Long
- 2003
Citation Context: ...0 then all timepoints 0, 1, 2, ..., 10 are pivots. If in a certain execution it finishes at time 4 then 4 (and 0) is a happening (for this execution). ✷ Definition 2: An action is a PDDL2.1 action (Fox & Long, 2003) if the following hold: • The effects are realized instantaneously, either (at start) or (at end), i.e., at the beginning or at the completion of the action (respectively). • The preconditions may...

273 | An Algorithm for Probabilistic Planning - Kushmerick, Hanks, et al. - 1995

191 | SPUDD: Stochastic planning using decision diagrams
- Hoey, St-Aubin, et al.
- 1999
Citation Context: ...temporal planners like SGPlan, SAPA, etc. (Chen, Wah, & Hsu, 2006; Do & Kambhampati, 2003) are able to model actions that take time, and probabilistic planners like GPT, LAO*, SPUDD, etc. (Bonet & Geffner, 2005; Hansen & Zilberstein, 2001; Hoey, St-Aubin, Hu, & Boutilier, 1999) can deal with actions with probabilistic outcomes. However, in order to apply automated planning to many real-world domains we must eliminate larger groups of the assumptions in concert. For ex...

154 |
Decision theoretic planning: Structural assumptions and computational leverage
- Boutilier, Dean, et al.
- 1999
Citation Context: ...for the future. In this work, we consider a special class of compact CoMDP – one that is defined naturally via a domain description very similar to the probabilistic STRIPS representation for MDPs (Boutilier, Dean, & Hanks, 1999). Given a domain encoded in probabilistic STRIPS we can compute a safe set of co-executable actions. Under this safe semantics, the probabilistic dynamics gets defined in a consistent way as we descr...

152 | LAO*: A heuristic search algorithm that finds solutions with loops
- Hansen, Zilberstein
Citation Context: ...temporal planners like SGPlan, SAPA, etc. (Chen, Wah, & Hsu, 2006; Do & Kambhampati, 2003) are able to model actions that take time, and probabilistic planners like GPT, LAO*, SPUDD, etc. (Bonet & Geffner, 2005; Hansen & Zilberstein, 2001; Hoey, St-Aubin, Hu, & Boutilier, 1999) can deal with actions with probabilistic outcomes, etc. However, in order to apply automated planning to many real-world domains we must eliminate larger group...

126 | Planning with Sharable Resource Constraints
- Laborie, Ghallab
- 1995
Citation Context: ...Also, since they consider only limited contingencies, their solutions are not guaranteed to reach the goal. IxTeT is a temporal planner that uses constraint-based reasoning within partial order planning (Laborie & Ghallab, 1995). It embeds temporal properties of actions as constraints and does not optimize make-span. CIRCA is an example of a system that plans with uncertain durations where each action is associated with an ...

125 | The 3rd international planning competition: Results and analysis
- Long, Fox
- 2003
Citation Context: ...model problems with durative actions. We tested our algorithms on problems in three domains. The first domain was a probabilistic variant of the NASA Rover domain from the 2002 AIPS Planning Competition (Long & Fox, 2003), in which there are multiple objects to be photographed and various rocks to be tested with resulting data communicated back to the base station. Cameras need to be focused, and arms need to be posi...

119 | Temporal planning with continuous change
- Penberthy, Weld
- 1994
Citation Context: ...Temporal planners may be classified as having one of two architectures: constraint-posting approaches in which the times of action execution are gradually constrained during planning (e.g., Zeno and LPG (Penberthy & Weld, 1994; Gerevini & Serina, 2002)) and extended state-space methods (e.g., TP4 and SAPA (Haslum & Geffner, 2001; Do & Kambhampati, 2001)). Theorem 5 holds for both architectures but has strong computational ...

118 | A reinforcement learning approach to job-shop scheduling
- Zhang, Dietterich
- 1995
Citation Context: ...these are always backed-up in a Bellman update. [Footnote 1: A similar action sampling approach was also used in the context of space shuttle scheduling to reduce the number of actions considered during value function computation (Zhang & Dietterich, 1995).] Other combinations are constructed by an incremental probabilistic process, which builds a combination by first randomly choosing an initial action (weighted by its individual ...

110 | Labeled RTDP: Improving the convergence of real-time dynamic programming
- Bonet, Geffner
Citation Context: ...Furthermore, RTDP is not guaranteed to terminate. Labeled RTDP (LRTDP) fixes these problems with a clever labeling scheme that focuses attention on states where the value function has not yet converged (Bonet & Geffner, 2003). Labeled RTDP is guaranteed to terminate, and is guaranteed to converge to the ɛ-approximation of the optimal cost function (for states reachable using the optimal policy) if the initial cost functi...

107 | Planning under continuous time and resource uncertainty: A challenge for AI
- Bresina, Dearden, et al.
- 2002
Citation Context: ...in concert. For example, NASA researchers note that optimal control for a NASA Mars rover requires reasoning about uncertain, concurrent, durative actions and a mixture of discrete and metric fluents (Bresina, Dearden, Meuleau, Smith, & Washington, 2002). While today's planners can handle large problems with deterministic concurrent durative actions, and MDPs provide a clear framework for non-concurrent durative actions in the face of uncertainty, f...

104 | LPG: A planner based on local search for planning graphs
- Gerevini, Serina
Citation Context: ...classified as having one of two architectures: constraint-posting approaches in which the times of action execution are gradually constrained during planning (e.g., Zeno and LPG (Penberthy & Weld, 1994; Gerevini & Serina, 2002)) and extended state-space methods (e.g., TP4 and SAPA (Haslum & Geffner, 2001; Do & Kambhampati, 2001)). Theorem 5 holds for both architectures but has strong computational implications for state-sp...

99 | Heuristic Planning with Time and Resources
- Haslum, Geffner
- 2001
Citation Context: ...in which the times of action execution are gradually constrained during planning (e.g., Zeno and LPG (Penberthy & Weld, 1994; Gerevini & Serina, 2002)) and extended state-space methods (e.g., TP4 and SAPA (Haslum & Geffner, 2001; Do & Kambhampati, 2001)). Theorem 5 holds for both architectures but has strong computational implications for state-space planners because limiting attention to a subset of decision epochs can spee...

99 | World modeling for the dynamic construction of real-time control plans, Artificial Intelligence
- Musliner, Durfee, et al.
- 1995
Citation Context: ...temporal properties of actions as constraints and does not optimize make-span. CIRCA is an example of a system that plans with uncertain durations where each action is associated with an unweighted set of durations (Musliner, Murphy, & Shin, 1991). [Section 7.5: Deterministic, Concurrent, Durative Actions] Planning with deterministic actions is a comparatively simpler problem and much of the work in planning under uncertainty is based on the previous, d...

83 | Sapa: A Domain-Independent Heuristic Metric Temporal Planner
- Do, Kambhampati
- 2001
Citation Context: ...the times of action execution are gradually constrained during planning (e.g., Zeno and LPG (Penberthy & Weld, 1994; Gerevini & Serina, 2002)) and extended state-space methods (e.g., TP4 and SAPA (Haslum & Geffner, 2001; Do & Kambhampati, 2001)). Theorem 5 holds for both architectures but has strong computational implications for state-space planners because limiting attention to a subset of decision epochs can speed these planners. (The t...

82 | Solving very large weakly coupled Markov Decision Processes
- Meuleau, Hauskrecht, et al.
- 1998
Citation Context: ...can be represented as a set of smaller weakly coupled MDPs: the separate MDPs are completely independent except for some common resource constraints, and the reward and cost models are purely additive (Meuleau, Hauskrecht, Kim, Peshkin, Kaelbling, Dean, & Boutilier, 1998; Singh & Cohn, 1998). They describe solutions in which these sub-MDPs are independently solved and the sub-policies are merged to create a global policy. Thus, concurrency of actions of different sub...

70 | Planning with resources and concurrency: A forward chaining approach
- Bacchus, Ady
- 2001
Citation Context: ...Hence our search space reduces considerably. [Section 5.1.2: Interwoven Epoch Search Space] We adapt the search space representation of Haslum and Geffner (2001), which is similar to that in other research (Bacchus & Ady, 2001; Do & Kambhampati, 2001). Our original state space S in Section 2 is augmented by including the set of actions currently executing and the times passed since they were started. Formally, let the new ...
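The augmented state described here, a world state plus the actions currently executing and their elapsed times, can be sketched as a small immutable record (all names hypothetical):

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class InterwovenState:
    """World state plus the actions still executing and time since each started."""
    world: FrozenSet[str]                  # facts true in the current world state
    executing: FrozenSet[Tuple[str, int]]  # (action name, elapsed time) pairs

s = InterwovenState(
    world=frozenset({"at_base", "arm_free"}),
    executing=frozenset({("drive", 2), ("recharge", 0)}),
)
# frozen=True plus frozensets make the extended state hashable, so it can
# key a value table exactly like an ordinary MDP state.
```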

70 | Temporal planning using subgoal partitioning and resolution in SGPlan
- Chen, Wah, et al.
- 2006
Citation Context: ...progress achieved by planning researchers has yielded new algorithms which relax, individually, many of the classical assumptions. For example, successful temporal planners like SGPlan, SAPA, etc. (Chen, Wah, & Hsu, 2006; Do & Kambhampati, 2003) are able to model actions that take time, and probabilistic planners like GPT, LAO*, SPUDD, etc. (Bonet & Geffner, 2005; Hansen & Zilberstein, 2001; Hoey, St-Aubin, Hu, & Bou...

70 | Max-norm projections for factored MDPs
- Guestrin, Koller, et al.
- 2001
Citation Context: ...formulation and expressing the value function as a linear combination of basis functions. By assuming that these basis functions depend only on a few agents, they are able to reduce the size of the LP (Guestrin, Koller, & Parr, 2001). [Section 7.4: Stochastic, Non-concurrent, Durative Actions] Many researchers have studied planning with stochastic, durative actions in absence of concurrency. For example, Foss and Onder (2005) use simple te...

65 | How to dynamically merge markov decision processes
- Singh, Cohn
- 1998
Citation Context: ...the separate MDPs are completely independent except for some common resource constraints, and the reward and cost models are purely additive (Meuleau, Hauskrecht, Kim, Peshkin, Kaelbling, Dean, & Boutilier, 1998; Singh & Cohn, 1998). They describe solutions in which these sub-MDPs are independently solved and the sub-policies are merged to create a global policy. Thus, concurrency of actions of different sub-MDPs is a by-produc...

63 | Branching and pruning: An optimal temporal POCL planner based on constraint programming - Vidal, Geffner

59 | OBDD-based universal planning for synchronized agents in non-deterministic domains - Jensen, Veloso

57 | Exact solutions to time-dependent MDPs - Boyan, Littman - 2001

44 | Taming numbers and durations in the model checking integrated planning system
- Edelkamp
Citation Context: ...(Haslum & Geffner, 2001; Do & Kambhampati, 2003; Bacchus & Ady, 2001). Other planners, like MIPS and AltAlt^p, have also investigated fast generation of parallel plans in deterministic settings (Edelkamp, 2003; Nigenda & Kambhampati, 2003) and Jensen and Veloso (2000) extend it to problems with disjunctive uncertainty. [Section 8: Future Work] Having presented a comprehensive set of techniques to handle probabilisti...

42 | Dynamic programming for structured continuous Markov decision problems
- Feng, Dearden, et al.
- 2004
Citation Context: ...(Fox & Long, 2003), all old, deterministic state-space planners are incomplete. For the same reasons, our algorithms are incomplete for problems in PPDDL2.1. [Footnote 12: This idea has been exploited in order to plan with continuous resources (Feng, Dearden, Meuleau, & Washington, 2004).] Recently, Cushing et al. have introduced Tempo, a state-space planner, which uses lifting over time to achieve completeness (Cushing, Kambhampati, Mausam, & W...

36 | Decisiontheoretic military operations planning
- Aberdeen, Thiebaux, et al.
- 2004
Citation Context: ...with concurrent, durative actions with deterministic durations in a specific military operations domain. They apply various domain-dependent heuristics to speed the search in an extended state space (Aberdeen, Thiebaux, & Zhang, 2004). [Section 7.3: Concurrency and Stochastic, Non-durative Actions] Meuleau et al. and Singh & Cohn deal with a special type of MDP (called a factorial MDP) that can be represented as a set of smaller weakly coup...

33 | mGPT: A probabilistic planner based on heuristic search
- Bonet, Geffner
Citation Context: ...successful temporal planners like SGPlan, SAPA, etc. (Chen, Wah, & Hsu, 2006; Do & Kambhampati, 2003) are able to model actions that take time, and probabilistic planners like GPT, LAO*, SPUDD, etc. (Bonet & Geffner, 2005; Hansen & Zilberstein, 2001; Hoey, St-Aubin, Hu, & Boutilier, 1999) can deal with actions with probabilistic outcomes, etc. However, in order to apply automated planning to many real-world domains we...

32 | Incremental Contingency Planning
- Dearden, Meuleau, et al.
- 2003
Citation Context: ...developed techniques for generating non-concurrent plans with uncertain continuous durations using a greedy algorithm which incrementally adds branches to a straight-line plan (Bresina et al., 2002; Dearden, Meuleau, Ramakrishnan, Smith, & Washington, 2003). While they handle continuous variables and uncertain continuous effects, their solution is heuristic and the quality of their policies is unknown. Also, since they consider only limited contingenci...

29 | Sapa: A scalable multi-objective metric temporal planner
- Do, Kambhampati
Citation Context: ...progress achieved by planning researchers has yielded new algorithms which relax, individually, many of the classical assumptions. For example, successful temporal planners like SGPlan, SAPA, etc. (Chen, Wah, & Hsu, 2006; Do & Kambhampati, 2003) are able to model actions that take time, and probabilistic planners like GPT, LAO*, SPUDD, etc. (Bonet & Geffner, 2005; Hansen & Zilberstein, 2001; Hoey, St-Aubin, Hu, & Boutilier, 1999) can deal w...

26 | Prottle: A probabilistic temporal planner
- Little, Aberdeen, et al.
- 2005
Citation Context: ...strong computational implications for state-space planners because limiting attention to a subset of decision epochs can speed these planners. (The theorem also shows that planners like SAPA and Prottle (Little, Aberdeen, & Thiebaux, 2005) are incomplete.) Fortunately, an assumption restricts the set of decision epochs considerably. Definition 3: An action is a TGP-style action if all of the following hold: • The effects are realized...

23 | Decision-theoretic planning with concurrent temporally extended actions - Rohanimanesh, Mahadevan - 2001

22 | Concurrent probabilistic temporal planning - Mausam, Weld - 2005

21 | Solving concurrent Markov decision processes
- Mausam, Weld
- 2004
Citation Context: ...∆DURarch exhibits a good balance of planning time and solution quality. [Section 7: Related Work] This paper extends our prior work, originally reported in several conference publications (Mausam & Weld, 2004, 2005, 2006a, 2006b). Temporal planners may be classified as using constraint-posting or extended state-space methods (discussed earlier in Section 4). While the constraint approach is promising, few...

21 | Policy generation for continuous-time stochastic domains with concurrency
- Younes, Simmons
Citation Context: ...Markov Decision Processes: We define a new model, Concurrent MDP (CoMDP), which allows multiple actions to be executed in parallel. This model is different from semi-MDPs and generalized state semi-MDPs (Younes & Simmons, 2004b) in that it does not incorporate action durations explicitly. CoMDPs focus on adding concurrency in an MDP framework. The input to a CoMDP is slightly different from that of an MDP – 〈S, A, Ap, Pr...

17 |
Concurrent probabilistic planning in the graphplan framework
- Little, Thiebaux
- 2006
Citation Context: ...in Meuleau et al. have no such guarantees. All of the work in Factorial MDPs assumes that a weak coupling exists and has been identified, but factoring an MDP is a hard problem in itself. Paragraph (Little & Thiebaux, 2006) formulates planning with concurrency as a regression search over the probabilistic planning graph. It uses techniques like nogood learning and mutex reasoning to speed policy construction. Guest...

17 | Solving generalized semi-Markov decision processes using continuous phase-type distributions
- Younes, Simmons
Citation Context: ...Markov Decision Processes: We define a new model, Concurrent MDP (CoMDP), which allows multiple actions to be executed in parallel. This model is different from semi-MDPs and generalized state semi-MDPs (Younes & Simmons, 2004b) in that it does not incorporate action durations explicitly. CoMDPs focus on adding concurrency in an MDP framework. The input to a CoMDP is slightly different from that of an MDP – 〈S, A, Ap, Pr...

14 | Challenges for temporal planning with uncertain durations - Mausam, Weld - 2006

13 | When is temporal planning really temporal
- Cushing, Kambhampati, et al.
- 2007
Citation Context: ...incomplete for problems in PPDDL2.1. Recently, Cushing et al. have introduced Tempo, a state-space planner, which uses lifting over time to achieve completeness (Cushing, Kambhampati, Mausam, & Weld, 2007). In pursuit of finding a complete, state-space, probabilistic planner for complex action models, a natural step is to consider a Tempo-like representation in a probabilistic setting. While working o...

12 | A hybridized planner for stochastic domains - Mausam, P, et al. - 2007

12 | Temporal Graphplan with mutual exclusion reasoning
- Smith, Weld
- 1999
Citation Context: ...A detailed proof is discussed in the Appendix. ✷ In the case of deterministic durations, the set of happenings is the same as the set of pivots; hence the following corollary holds. [Footnote 4: While the original TGP (Smith & Weld, 1999) considered only deterministic actions of fixed duration, we use the phrase "TGP-style" in a more general way, without these restrictions.] ...

7 | Altalt-p: Online parallelization of plans with heuristic state search
- Nigenda, Kambhampati
- 2003
Citation Context: ...(Haslum & Geffner, 2001; Do & Kambhampati, 2003; Bacchus & Ady, 2001). Other planners, like MIPS and AltAlt^p, have also investigated fast generation of parallel plans in deterministic settings (Edelkamp, 2003; Nigenda & Kambhampati, 2003) and Jensen and Veloso (2000) extend it to problems with disjunctive uncertainty. [Section 8: Future Work] Having presented a comprehensive set of techniques to handle probabilistic outcomes, concurrent and du...

6 | Generating temporally contingent plans - Foss, Onder - 2005

1 | Concurrent probabilistic temporal planning with policygradients
- Aberdeen, Buffet
- 2007
Citation Context: ...optimizes a different objective function (probability of reaching a goal), outputs a finite-length conditional plan as opposed to a cyclic plan or policy, and is not guaranteed to reach the goal. FPG (Aberdeen & Buffet, 2007) learns a separate neural network for each action individually based on the current state. In the execution phase the decision, i.e., whether an action needs to be executed or not, is taken independe...

1 | Stochastic planning with concurrent, durative actions
- Mausam
- 2007
Citation Context: ...state space during planning. The domains contained as many as 18 actions, and some actions had as many as 13 possible durations. For more details on the domains please refer to the longer version (Mausam, 2007). [Figure 15: Planning time (in sec) for problems 21–30 in the Rover and Machine-Shop domains, comparing the ΔDUR variants Pruned, Sampled, Hybrid, and Exp-Dur.]