## Planning with Noisy Probabilistic Relational Rules

Citations: 13 (5 self)

### BibTeX

@MISC{Lang_planningwith,
  author = {Tobias Lang and Marc Toussaint},
  title = {Planning with Noisy Probabilistic Relational Rules},
  year = {}
}

### Abstract

Noisy probabilistic relational rules are a promising world model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable, and they can be learned effectively from action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. Our algorithms exploit the compactness of rules for efficient and flexible decision-theoretic planning. Our first approach combines these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm based on look-ahead trees. Our second approach converts these rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference and beliefs over world states. We evaluate the effectiveness of our approaches for planning in a simulated complex 3D robot manipulation scenario with an articulated manipulator and realistic physics, and in domains of the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail.

### Citations

3827 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context ...function over states, which yields high reward for desirable states (for simplicity, here we assume rewards do not depend on actions). This is the approach taken in reinforcement learning formalisms (Sutton & Barto, 1998). Classical planning goals can easily be formalized with such a reward function. We cast the scenario of planning in a stochastic relational domain in a relational Markov decision process (RMDP) fram...

914 | An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...the stochastic effects of action sequences in factored grounded relational world states. Therefore, we will briefly review Bayesian networks and their dynamic extension here. A Bayesian network (BN) (Jensen, 1996) is a compact representation of the joint probability distribution over a set of random variables X by means of a directed acyclic graph G. The nodes in G represent the random variables, while the ed...
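
The BN factorization described in this excerpt — a joint distribution decomposed into a product of conditionals given each node's parents in the DAG — can be sketched minimally as follows (the two-variable network, its names, and its probabilities are illustrative assumptions, not from the paper):

```python
# Minimal Bayesian network sketch over a hypothetical two-node DAG A -> B.
# The joint factorizes as P(A, B) = P(A) * P(B | A).
P_A = {True: 0.3, False: 0.7}                   # P(A)
P_B_given_A = {True: {True: 0.9, False: 0.1},   # P(B | A = True)
               False: {True: 0.2, False: 0.8}}  # P(B | A = False)

def joint(a: bool, b: bool) -> float:
    """P(A=a, B=b) = P(A=a) * P(B=b | A=a)."""
    return P_A[a] * P_B_given_A[a][b]

# The factorized joint sums to 1 over all assignments.
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
```

The same product structure extends to any DAG: one conditional probability table per node, multiplied over all nodes.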

604 | The FF planning system: Fast plan generation through heuristic search
- Hoffmann, Nebel
- 2001
Citation Context ...ant of FF-Replan considers each probabilistic outcome of an action as a separate deterministic action, ignoring the respective probabilities. It then runs the deterministic Fast-Forward (FF) planner (Hoffmann & Nebel, 2001) on the determinized problem. FF uses a relaxation of the planning problem: it ignores the delete effects of actions and applies clever heuristics to prune the search space. FF-Replan outputs a seque...
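
The all-outcomes determinization described in this excerpt can be sketched as follows (the action name, outcomes, and effect encoding are illustrative assumptions):

```python
# All-outcomes determinization sketch: each probabilistic outcome of an
# action becomes its own deterministic action; probabilities are discarded.
def determinize(actions):
    """actions: dict of name -> list of (probability, effect) outcomes.
    Returns a dict of deterministic action name -> effect."""
    det = {}
    for name, outcomes in actions.items():
        for k, (prob, effect) in enumerate(outcomes):
            det[f"{name}_o{k}"] = effect  # `prob` is intentionally ignored
    return det

# Hypothetical pickup action: succeeds with prob. 0.8, drops the object with 0.2.
prob_actions = {"pickup": [(0.8, {"holding": True}),
                           (0.2, {"on_floor": True})]}
det_actions = determinize(prob_actions)
```

A deterministic planner such as FF can then be run on the determinized action set, as the context describes for FF-Replan.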

569 | Dynamic bayesian networks: representation, inference and learning
- Murphy
- 2002
Citation Context ...few parents or their conditional probability functions have significant local structure. This will play a crucial role in our development of the graphical models for PRADA. A DBN (Murphy, 2002) extends the BN formalism to model a dynamic system evolving over time. Usually, the focus is on discrete-time stochastic processes. The underlying system itself (in our case, a world state) is repre...

424 | Decision-theoretic planning: Structural assumptions and computational leverage
- Boutilier, Dean, et al.
- 1999
Citation Context ...g domains. 3.3 Decision-Theoretic Planning The problem of decision-theoretic planning is to find actions a ∈ A in a given state s which are expected to maximize future rewards for states and actions (Boutilier et al., 1999). In classical planning, this reward is usually defined in terms of a clear-cut goal which is either fulfilled or not fulfilled in a state. This can be expressed by means of a logical formula φ. Typi...
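
The reward-maximization objective mentioned in this excerpt is usually formalized as an expected discounted return. A minimal sketch (the discount factor and reward sequence below are illustrative, not values from the paper):

```python
# Discounted return sketch: V = sum over t of gamma^t * r_t for a
# sampled reward sequence, as in standard decision-theoretic planning.
def discounted_return(rewards, gamma=0.95):
    """rewards: list of per-step rewards; gamma: discount in [0, 1)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A goal reached at step 2 with reward 10 contributes gamma^2 * 10.
v = discounted_return([0.0, 0.0, 10.0])
```

A clear-cut classical goal corresponds to a reward that is nonzero only in states satisfying the goal formula φ.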

258 | An algorithm for probabilistic planning
- Kushmerick, Hanks, et al.
- 1995
Citation Context ... transform the world state into one where the goal is fulfilled with a probability exceeding a given threshold θ, in the spirit of conformant planning or probabilistic planning with no observability (Kushmerick, Hanks, & Weld, 1995). Due to their outcome-sampling, SST and UCT cannot return such a plan in a straightforward way. Instead, they provide a policy for many successor states based on their estimates of the action-values...

218 | Bandit based Monte-Carlo planning
- Kocsis, Szepesvári
- 2006
Citation Context ... the grounded domain using learned NID rules, enabling fast planning in complex environments with varying goals. First, we apply the existing Upper Confidence bounds applied to Trees (UCT) algorithm (Kocsis & Szepesvári, 2006) with NID rules. In contrast to full-grown look-ahead trees, UCT samples actions selectively, thereby cutting suboptimal parts of the tree early. Second, we introduce the Probabilistic Relational Act...
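
The selective action sampling that this excerpt attributes to UCT is driven by the UCB1 rule at each tree node: pick the action maximizing its value estimate plus an exploration bonus. A minimal sketch (the exploration constant and the statistics below are illustrative assumptions):

```python
import math

# UCB1 action selection sketch, the per-node rule UCT builds on:
# choose argmax_a Q(a) + c * sqrt(ln N / n_a), where N is the total
# visit count of the node and n_a the count for action a.
def ucb1_select(q, n, c=1.0):
    """q: dict action -> mean value estimate; n: dict action -> visit count."""
    total = sum(n.values())
    def score(a):
        if n[a] == 0:
            return float("inf")  # unvisited actions are tried first
        return q[a] + c * math.sqrt(math.log(total) / n[a])
    return max(q, key=score)

# A rarely tried action can win despite a lower value estimate.
best = ucb1_select({"grab": 0.5, "puton": 0.4}, {"grab": 10, "puton": 2})
```

This is how UCT prunes suboptimal branches early: well-explored, low-value actions stop being selected, so their subtrees are never fully grown.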

174 | A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- Kearns, Mansour, et al.
- 2002
Citation Context ...more general reward-maximization context which avoids explicitly dealing with uncertainty is to build look-ahead trees by sampling successor states. Two algorithms which follow this idea, namely SST (Kearns, Mansour, & Ng, 2002) and UCT (Kocsis & Szepesvári, 2006), are investigated in this paper. Another approach by Buffet and Aberdeen (2009) directly optimizes a parameterized policy using gradient descent. They factor the ...

136 | Symbolic dynamic programming for first-order MDPs
- Boutilier, Reiter, et al.
- 2001
Citation Context ...sical planning goals can easily be formalized with such a reward function. We cast the scenario of planning in a stochastic relational domain in a relational Markov decision process (RMDP) framework (Boutilier et al., 2001). We follow the notation of van Otterlo (2009) and define an RMDP as a 4-tuple (S, A, T, R). In contrast to enumerated state spaces, here the state space S has a relational structure defined by logic...

129 | Probabilistic Inference and Influence Diagrams
- Shachter
- 1988
Citation Context ...imate inference techniques to propagate the effects of actions. In particular, we make three contributions with PRADA: (i) Following the idea of framing planning as a probabilistic inference problem (Shachter, 1988; Toussaint, Storkey, & Harmeling, 2010), we convert NID rules into a dynamic Bayesian network (DBN) representation. (ii) We derive an approximate inference method to cope with the state complexity of...

114 | Approximate policy iteration with a policy language bias - Fern, Yoon, et al. - 2004

109 | Recent advances in AI planning
- Weld
- 1999
Citation Context ... rules. When grounding an RMDP which specifies rewards only for a set of goal states, one might in principle apply any of the traditional A.I. planning methods used for propositional representations (Weld, 1999; Boutilier, Dean, & Hanks, 1999). Traditionally, planning is often cast as a search problem through a state and action space, restricting oneself to the portion of the state space that is considered ...

101 | Conscious Thought as Simulation of Behaviour and Perception
- Hesslow
- 2002
Citation Context ...ligent mammals is driven by internal simulation or emulation: it has been found that motor structures in the cortex are activated during planning, while the execution of motor commands is suppressed (Hesslow, 2002; Grush, 2004). Probabilistic relational world model representations have received significant attention over the last years. They enable generalization over object identities to unencountered situatio...

96 | Sound and efficient inference with probabilistic and deterministic dependencies
- Poon, Domingos
- 2006
Citation Context ...nteresting open question whether there are ways to alternate between propagating deterministic information and running LBP on the remaining parts of the network, e.g., whether methods such as MC-SAT (Poon & Domingos, 2007) can be successfully applied in decision-making contexts as ours. In the next subsection, we propose a different approximate inference scheme using a factored frontier (FF). The FF algorithm describe...

82 | Probabilistic planning in the Graphplan framework
- Blum, Langford
- 1999
Citation Context ...ional model of the transition dynamics of probabilistic actions. Besides allowing for negative literals in the preconditions, NID rules extend probabilistic STRIPS operators (Kushmerick et al., 1995; Blum & Langford, 1999) by two special constructs, namely deictic references and noise outcomes, which are crucial for learning compact rule-sets. An alternative language to specify probabilistic relational planning proble...

80 | Combining online and offline knowledge in UCT
- Gelly, Silver
- 2007
Citation Context ...al number of episodes controls the accuracy of UCT’s estimates and has to be balanced with its overall running time. UCT has achieved remarkable results in challenging domains such as the game of Go (Gelly & Silver, 2007). To the best of our knowledge, we are the first to apply UCT for planning in stochastic relational domains, using NID rules as a generative model. We adapt UCT to cope with noise outcomes in the sam...

77 | The computational complexity of probabilistic planning
- Littman, Goldsmith, et al.
- 1998
Citation Context ... result in a world state s′ with s′ |= φ. In stochastic domains, however, the outcomes of actions are uncertain. Probabilistic planning is inherently harder than its deterministic counterpart (Littman, Goldsmith, & Mundhenk, 1997). In particular, achieving a goal state with certainty is typically unrealistic. Instead, one may define a lower bound θ on the probability for achieving a goal state. A second source of uncertainty ...

75 | A method for using belief networks as influence diagrams
- Cooper
- 1988
Citation Context ...dicates in A. The reward gained in a state is represented by U and may depend only on a subset of the state variables. It is possible to express arbitrary reward expectations P(U | S) with binary U (Cooper, 1988). How can we define the transition dynamics using NID rules in this naive model? Assume we are given a set of fully abstract NID rules. We compute all groundings of these rules w.r.t. the objects of ...

58 | Relational reinforcement learning - Džeroski, Raedt, et al.

52 | FF-Replan: A baseline for probabilistic planning
- Yoon, Fern, et al.
- 2007
Citation Context ...astic worlds. A common approach for probabilistic planning, however, is to determinize the planning problem and apply deterministic planners (Kuter, Nau, Reisner, & Goldman, 2008). Indeed, FF-Replan (Yoon, Fern, & Givan, 2007) and its extension using hindsight optimization (Yoon, Fern, Givan, & Kambhampati, 2008) have shown impressive performance on many probabilistic planning competition domains. The common variant of FF...

48 | The Factored Frontier Algorithm for Approximate Inference in DBNs
- Murphy, Weiss
- 2001
Citation Context ... exploiting the factorization of NID rules. We focus on the mathematical derivations. An illustrative example will be provided in Sec. 5.4. We follow the idea of the factored frontier (FF) algorithm (Murphy & Weiss, 2001) and approximate the belief with a product of marginals: we define

$$P(s^t \mid a^{0:t-1}) \approx \prod_i P(s^t_i \mid a^{0:t-1}), \qquad (21)$$

$$\alpha(s^t_i) := P(s^t_i \mid a^{0:t-1}) \quad \text{and} \qquad (22)$$

$$\alpha(s^t) := P(s^t \mid a^{0:t-1}) \approx \prod_{i=1}^{N} \alpha(s^t_i) \ldots$$
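
The product-of-marginals approximation in this excerpt can be sketched directly: the belief over a full state assignment is approximated by multiplying per-variable marginals. The state variable names and marginal values below are illustrative assumptions:

```python
# Factored-frontier-style belief sketch: approximate the joint belief
# over binary state variables by the product of their marginals,
# as in Eq. (21) of the excerpt above.
def joint_belief(marginals, assignment):
    """marginals: dict var -> P(var = True); assignment: dict var -> bool.
    Returns the approximate joint probability of the assignment."""
    p = 1.0
    for var, value in assignment.items():
        m = marginals[var]
        p *= m if value else (1.0 - m)
    return p

# Hypothetical marginals alpha(s_i) for two grounded literals.
alpha = {"on(a,b)": 0.9, "clear(a)": 0.8}
p = joint_belief(alpha, {"on(a,b)": True, "clear(a)": False})
```

The approximation trades away correlations between state variables for a belief representation whose size is linear in the number of variables.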

45 | Probabilistic inference for solving discrete and continuous state Markov decision processes
- Toussaint, Storkey
- 2006
Citation Context ... inadequate for the type of stochastic relational rules that we use, for the same reasons why the naive DBN model which we will discuss in Sec. 5.1 is inappropriate. Planning by inference approaches (Toussaint & Storkey, 2006) spread information also backwards through DBNs and calculate posteriors over actions (resulting in policies over complete state spaces). How to use backward propagation or even full planning by infe...

44 | Learning symbolic models of stochastic domains
- Pasula, Zettlemoyer, et al.
Citation Context ...e art in reasoning, planning, learning, perception, locomotion, and manipulation is so far removed from human-level abilities, that we cannot yet contemplate working in an actual domain of interest” (Pasula, Zettlemoyer, & Kaelbling, 2007). Performing common object manipulations is indeed a challenging task in the real world: we can choose from a very large number of distinct actions with uncertain outcomes and the number of possible ...

40 | Graph kernels and Gaussian processes for relational reinforcement learning
- Gärtner, Driessens, et al.
- 2003
Citation Context ...om experience. Q-function estimators include relational regression trees (Džeroski et al., 2001) and instance-based regression using distance metrics between relational states such as graph kernels (Driessens, Ramon, & Gärtner, 2006). Model-free approaches enable planning for the specific problem type used in the training examples, e.g. on(X, Y), and thus may be inappropriate in situations where the goals of the agent change qu...

36 | Bellman goes relational - Kersting, Otterlo, et al. - 2004

27 | Probabilistic planning via determinization in hindsight
- Yoon, Fern, et al.
- 2008
Citation Context ...determinize the planning problem and apply deterministic planners (Kuter, Nau, Reisner, & Goldman, 2008). Indeed, FF-Replan (Yoon, Fern, & Givan, 2007) and its extension using hindsight optimization (Yoon, Fern, Givan, & Kambhampati, 2008) have shown impressive performance on many probabilistic planning competition domains. The common variant of FF-Replan considers each probabilistic outcome of an action as a separate deterministic ac...

24 | Envelope-based planning in relational MDPs
- Gardiol, Kaelbling
- 2003
Citation Context ...th complementary methods that reduce the state and action space complexity in relational domains. For instance, one can focus on envelopes of states which are high-utility subsets of the state space (Gardiol & Kaelbling, 2003), one can ground the representation only with respect to relevant objects (Lang & Toussaint, 2009b), or one can exploit the equivalence of actions (Gardiol & Kaelbling, 2007), which is particularly u...

24 | Probabilistic planning vs. replanning - Little, Thiébaux - 2007

21 | The factored policy-gradient planner - Buffet, Aberdeen

20 | Practical solution techniques for first-order MDPs - Sanner, Boutilier - 2009

15 | Online learning and exploiting relational models in reinforcement learning
- Croonenborghs, Ramon, et al.
- 2007
Citation Context ...rst learn a relational world model from the state transition experiences and then use this model for planning, for example in the form of relational probability trees for individual state attributes (Croonenborghs, Ramon, Blockeel, & Bruynooghe, 2007) or SVMs using graph kernels (Halbritter & Geibel, 2007). The stochastic relational NID rules of Pasula et al. (2007) are a particularly appealing action model representation, as it has been shown em...

14 | A logic-based approach to dynamic programming - Hölldobler, Skvortsova - 2004

14 | Non-parametric policy gradients: a unified treatment of propositional and relational domains - Kersting, Driessens - 2008

12 | Goal-directed decision making in prefrontal cortex: a computational framework
- Botvinick, An
- 2008
Citation Context ...contrast to model-free approaches which compute policies directly from experience with respect to fixed goals (also called habit-based decision making), we follow a purposive decision-making approach (Botvinick & An, 2009) and use learned models to plan for the goal and current state at hand. In particular, we simulate the probabilistic effects of action sequences. This approach has interesting parallels in recent neu...

11 | Approximate solution techniques for factored first-order MDPs - Sanner, Boutilier - 2007

10 | A heuristic search algorithm for solving first-order MDPs - Karabaev, Skvortsova - 2005

10 | Approximate inference for planning in stochastic relational worlds
- Lang, Toussaint
- 2009
Citation Context ...te policies for starting each action and sample trajectories to cope with probabilistic effects. Instead of sampling state transitions, we propose the planning algorithm PRADA in this paper (based on Lang & Toussaint, 2009a) which accounts for uncertainty in a principled way using approximate inference. Domshlak and Hoffmann (2007) propose an interesting planning approach which comes closest to our work. They introduce...

9 | Using classical planners to solve nondeterministic planning problems
- Kuter, Nau, et al.
- 2008
Citation Context ...mains and thus can’t be applied straightforwardly in stochastic worlds. A common approach for probabilistic planning, however, is to determinize the planning problem and apply deterministic planners (Kuter, Nau, Reisner, & Goldman, 2008). Indeed, FF-Replan (Yoon, Fern, & Givan, 2007) and its extension using hindsight optimization (Yoon, Fern, Givan, & Kambhampati, 2008) have shown impressive performance on many probabilistic plannin...

8 | Probabilistic planning via heuristic forward search and weighted model counting - Domshlak, Hoffmann - 2007

8 | Action-space partitioning for planning
- Gardiol, Kaelbling
- 2007
Citation Context ...bsets of the state space (Gardiol & Kaelbling, 2003), one can ground the representation only with respect to relevant objects (Lang & Toussaint, 2009b), or one can exploit the equivalence of actions (Gardiol & Kaelbling, 2007), which is particularly useful in combination with ignoring certain predicates and functions of the relational logic language (Gardiol & Kaelbling, 2008). 3. Background In this sec...

5 | Efficient learning of relational models for sequential decision making
- Walsh
- 2010
Citation Context ... for autonomous agents. The problem of learning rule-sets is in general NP-hard, but efficiency guarantees on the sample complexity can be given for many learning subtasks with suitable restrictions (Walsh, 2010). Pasula et al. (2007) have proposed a supervised batch learning algorithm for complete NID rules. This algorithm learns the structure of rules as well as their parameters from experience triples (s,...


4 | Relevance grounding for planning in relational domains
- Lang, Toussaint

2 | Adaptive envelope MDPs for relational equivalence-based planning
- Gardiol, Kaelbling
- 2008
Citation Context ...or one can exploit the equivalence of actions (Gardiol & Kaelbling, 2007), which is particularly useful in combination with ignoring certain predicates and functions of the relational logic language (Gardiol & Kaelbling, 2008). 3. Background In this section, we set up the theoretical background for the planning algorithms we will present in subsequent sections. First, we describe relational representati...

2 | Generalized first-order decision diagrams for first-order Markov decision processes - Joshi, Kersting, et al. - 2009

2 | The Logic of Adaptive Behavior - Otterlo - 2009

2 | Stochastic enforced hill-climbing
- Wu, Kalyanam, et al.
- 2008
Citation Context ...roblem to divide it into smaller manageable problems. HMDPP’s strategy is similar to the all-outcomes-determinization of FF-Replan, but accounts for the probability associated with each outcome. SEH (Wu, Kalyanam, & Givan, 2008) extends a heuristic function of FF-Replan to cope with local optima in plans by using stochastic enforced hill-climbing. A common approach to reasoning in a more general reward-maximization context ...

1 | Exploiting first-order regression in inductive policy selection - Gretton, Thiébaux - 2004

1 | The emulation theory of representation: motor control, imagery, and perception
- Grush
- 2004
Citation Context ...is driven by internal simulation or emulation: it has been found that motor structures in the cortex are activated during planning, while the execution of motor commands is suppressed (Hesslow, 2002; Grush, 2004). Probabilistic relational world model representations have received significant attention over the last years. They enable generalization over object identities to unencountered situations and object...

1 | Learning models of relational MDPs using graph kernels
- Halbritter, Geibel
- 2007
Citation Context ... use this model for planning, for example in the form of relational probability trees for individual state attributes (Croonenborghs, Ramon, Blockeel, & Bruynooghe, 2007) or SVMs using graph kernels (Halbritter & Geibel, 2007). The stochastic relational NID rules of Pasula et al. (2007) are a particularly appealing action model representation, as it has been shown empirically that they can learn the dynamics of complex en...