## Exploration in Relational Worlds

Citations: 3 (0 self)

### BibTeX

```bibtex
@misc{Lang_explorationin,
  author = {Tobias Lang and Marc Toussaint and Kristian Kersting},
  title  = {Exploration in Relational Worlds},
  year   = {}
}
```

### Abstract

One of the key problems in model-based reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large relational domains, in which there is a varying number of objects and relations between them. We provide a solution to exploring large relational Markov decision processes by developing relational extensions of the concepts of the Explicit Explore or Exploit (E³) algorithm. A key insight is that the inherent generalization of learnt knowledge in the relational representation also has profound implications for the exploration strategy: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be an instance of a well-known context in which exploitation is promising. Our experimental evaluation shows the effectiveness and benefit of relational exploration over several propositional benchmark approaches on noisy 3D simulated robot manipulation problems.
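The abstract's core idea — exploit in relationally "known" contexts, explore otherwise — can be sketched as a simple decision rule. All names here (`Rule`, `covering_context`, the visit threshold) are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

KNOWN_THRESHOLD = 5  # hypothetical: visits before a context counts as known


@dataclass
class Rule:
    """An abstract relational context covering many ground states."""
    name: str
    covers: Callable[[str], bool]  # predicate over ground states


def covering_context(state: str, rules: List[Rule]) -> Optional[Rule]:
    """Return the first abstract rule whose context covers this state."""
    for rule in rules:
        if rule.covers(state):
            return rule
    return None


def choose_mode(state: str, rules: List[Rule], counts: Dict[str, int]) -> str:
    """E^3-style switch: exploit in known contexts, explore otherwise."""
    ctx = covering_context(state, rules)
    if ctx is not None and counts.get(ctx.name, 0) >= KNOWN_THRESHOLD:
        return "exploit"  # well-known context: plan for reward
    return "explore"      # novel context: plan to gather experience
```

Note the relational twist the abstract emphasizes: a ground state never seen before still maps to "exploit" when its abstract context is well-known.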

### Citations

561 | Active learning with statistical models
- Cohn, Ghahramani, et al.
- 1996
Citation Context: ...n a density estimation view. This is also inspired by the work on active learning which typically selects points that, according to some density model of previously seen points, are novel (see, e.g., [4], where the density model is an implicit mixture of Gaussians). In the following we first discuss different approaches to model a distribution of known states and actions in a relational setting. These...

249 | Near-optimal reinforcement learning in polynomial time
- Kearns, Singh
Citation Context: ...ing its current world model. This exploration/exploitation tradeoff has received a lot of attention in propositional and continuous domains. Several powerful techniques have been developed, such as E³ [14], R-max [3] and Bayesian reinforcement learning [19]. Another key problem in reinforcement learning is learning and acting in large relational domains, in which there is a varying number of objects an...

248 | R-max – a general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, Tennenholtz
Citation Context: ...ent world model. This exploration/exploitation tradeoff has received a lot of attention in propositional and continuous domains. Several powerful techniques have been developed, such as E³ [14], R-max [3] and Bayesian reinforcement learning [19]. Another key problem in reinforcement learning is learning and acting in large relational domains, in which there is a varying number of objects and relations...

225 | An Introduction to Statistical Relational Learning
- Getoor, Taskar
- 2007
Citation Context: ...ent learning is learning and acting in large relational domains, in which there is a varying number of objects and relations among them. Nowadays, relational approaches become more and more important [9]: information about one object can help the agent to reach conclusions about other, related objects. Such relational domains are hard – or even impossible – to represent meaningfully using an enumerat...

143 | The role of exploration in learning control
- Thrun
- 1992
Citation Context: ...is translates directly to a density estimate $P(s) \propto c_E(s)$, with $c_E(s) = \sum_{(s_e, a_e, s'_e) \in E} I(s_e = s)$ (2), where $c_E(s)$ counts the number of occasions state $s$ has been visited in $E$ (in the spirit of [22]) and $I(\cdot)$ is the indicator function, which is 1 if the argument evaluates to true and 0 otherwise. This density implies that all states with low $P(s)$ are considered novel and should be explored, as i...
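The count-based density in Eq. (2) is easy to sketch: $c_E(s)$ counts how often state $s$ appears as the start state of a transition in the experience set $E$, and low-count states are flagged as novel. The state encoding and novelty threshold below are illustrative assumptions.

```python
from collections import Counter


def count_density(experiences):
    """c_E(s): number of transitions (s_e, a_e, s'_e) in E with s_e = s."""
    return Counter(s for (s, _a, _s_next) in experiences)


def is_novel(state, counts, threshold=1):
    """A state visited fewer than `threshold` times is considered novel."""
    return counts[state] < threshold  # Counter returns 0 for unseen states
```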

96 | An analytic solution to discrete bayesian reinforcement learning
- Poupart, Vlassis, et al.
- 2006
Citation Context: ...tation tradeoff has received a lot of attention in propositional and continuous domains. Several powerful techniques have been developed, such as E³ [14], R-max [3] and Bayesian reinforcement learning [19]. Another key problem in reinforcement learning is learning and acting in large relational domains, in which there is a varying number of objects and relations among them. Nowadays, relational approac...

71 | Efficient reinforcement learning in factored MDPs
- Kearns, Koller
- 1999
Citation Context: ...al exploration strategies. Before concluding, we present the results of our extensive experimental evaluation. 2 Related Work Several exploration approaches such as E³ [14], R-max [3] and extensions [13, 10] have been developed for propositional and continuous domains, i.e., assuming the environment to be representable as an enumerated or vector space. In recent years, there has been a growing interest i...

58 | Relational reinforcement learning
- Džeroski, De Raedt, et al.
Citation Context: ...t developed relational exploration strategies. Essentially, a number of relational regression algorithms have been developed for use in these relational RL systems, such as relational regression trees [8] or graph kernels and Gaussian processes [7]. Kersting and Driessens [15] have proposed a relational policy gradient approach. These approaches use some form of ε-greedy strategy to handle exploration...

51 | Learning symbolic models of stochastic domains
- Pasula, Zettlemoyer, Kaelbling
Citation Context: ...identities. The principal ideas of relational exploration we develop in this paper work with any type of relational model. In this paper, however, we employ noisy indeterministic deictic (NID) rules [18] to illustrate and empirically evaluate our ideas. A NID rule $r$ is given as

$$a_r(\mathcal{X}) : \phi_r(\mathcal{X}) \;\rightarrow\; \begin{cases} p_{r,1} : \Omega_{r,1}(\mathcal{X}) \\ \quad\vdots \\ p_{r,m_r} : \Omega_{r,m_r}(\mathcal{X}) \\ p_{r,0} : \Omega_{r,0} \end{cases} \qquad (1)$$

where $\mathcal{X}$ is a set of logic variables in the...
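A NID rule of the form in Eq. (1) — an action with a context that must hold, plus a probability distribution over outcomes including a noise outcome $\Omega_{r,0}$ — can be represented as a small data structure. This is a sketch under assumed encodings (states as sets of ground literals), not the representation used in [18] or the paper.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass
class NIDRule:
    """Sketch of a noisy indeterministic deictic (NID) rule, Eq. (1)."""
    action: str                              # a_r(X)
    context: FrozenSet[str]                  # phi_r(X): required literals
    outcomes: List[Tuple[float, Set[str]]]   # [(p_i, effect literals)];
                                             # last entry plays Omega_{r,0}

    def covers(self, state: Set[str]) -> bool:
        """The rule applies when all context literals hold in the state."""
        return self.context <= state

    def sample_outcome(self, rng=random) -> Set[str]:
        """Draw one outcome according to the probabilities p_{r,i}."""
        weights = [p for p, _ in self.outcomes]
        effects = [eff for _, eff in self.outcomes]
        return rng.choices(effects, weights=weights)[0]
```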

50 | Curriculum learning
- Bengio, Louradour, et al.
Citation Context: .... Beyond that, again in contrast to the propositional explorers, the relational explorers are able to transfer the learned knowledge from simple to difficult tasks in the sense of curriculum learning [1], answering the second part of question Q3. To see that, one has to compare the results of round 4 (where the second task of piling two objects in worlds of boxes is given the first time) with the res...

43 | Graph kernels and gaussian processes for relational reinforcement learning
- Driessens, Ramon, et al.
- 2006
Citation Context: ...s. Essentially, a number of relational regression algorithms have been developed for use in these relational RL systems, such as relational regression trees [8] or graph kernels and Gaussian processes [7]. Kersting and Driessens [15] have proposed a relational policy gradient approach. These approaches use some form of ε-greedy strategy to handle exploration; no special attention has been paid to the...

30 | Integrating guidance into relational reinforcement learning
- Driessens, Dzeroski
- 2004
Citation Context: ...approaches use some form of ε-greedy strategy to handle exploration; no special attention has been paid to the exploration-exploitation problem as done in the current paper. Driessens and Džeroski [6] have proposed the use of “reasonable policies” to provide guidance, i.e., to increase the chance to discover sparse rewards in large relational state spaces. This is orthogonal to exploration. Ramon...

19 | Algorithm-directed exploration for model-based reinforcement learning in factored MDPs
- Guestrin, Patrascu, et al.
- 2002
Citation Context: ...al exploration strategies. Before concluding, we present the results of our extensive experimental evaluation. 2 Related Work Several exploration approaches such as E³ [14], R-max [3] and extensions [13, 10] have been developed for propositional and continuous domains, i.e., assuming the environment to be representable as an enumerated or vector space. In recent years, there has been a growing interest i...

16 | Online learning and exploiting relational models in reinforcement learning
- Croonenborghs, Ramon, et al.
- 2007
Citation Context: ...ional Q-learner to transfer knowledge from one task to another. They, however, do not learn a model of the domain and, again, relational exploration strategies were not developed. Croonenborghs et al. [5] learn a relational world model online and additionally use lookahead trees to give the agent more informed Q-values by looking some steps into the future when selecting an action. Exploration is base...

14 | Non-parametric policy gradients: a unified treatment of propositional and relational domains
- Kersting, Driessens
- 2008
Citation Context: ...relational regression algorithms have been developed for use in these relational RL systems, such as relational regression trees [8] or graph kernels and Gaussian processes [7]. Kersting and Driessens [15] have proposed a relational policy gradient approach. These approaches use some form of ε-greedy strategy to handle exploration; no special attention has been paid to the exploration-exploitation pro...

12 | Transfer Learning in Reinforcement Learning Problems through Partial Policy Recycling
- Ramon, Driessens, et al.
- 2007
Citation Context: ...proposed the use of “reasonable policies” to provide guidance, i.e., to increase the chance to discover sparse rewards in large relational state spaces. This is orthogonal to exploration. Ramon et al. [20] presented an incremental relational regression tree algorithm that is capable of dealing with concept drift and showed that it enables a relational Q-learner to transfer knowledge from one task to ano...

11 | Self-taught decision theoretic planning with first order decision diagrams
- Joshi, Kersting, et al.
- 2010
Citation Context: ...ramming approaches for solving relational MDPs, see e.g. [2, 21]. In contrast to the current paper, however, they assume a given model of the world. Recently, Lang and Toussaint [17] and Joshi et al. [12] have shown that successful planning typically involves only a small subset of relevant objects respectively states and how to make use of this fact to speed up symbolic dynamic programming significan...

11 | Approximate inference for planning in stochastic relational worlds
- Lang, Toussaint
Citation Context: ...ploit, discussed in detail below). By developing a similar family of strategies for the relational case and integrating it into the state-of-the-art model-based relational reinforcement learner PRADA [16], we provide a practical solution to the exploration problem in relational worlds. Based on actively generated training trajectories, the exploration strategy and the relational planner together produ...

6 | Practical solution techniques for first-order MDPs. AIJ 173:748–788
- Sanner, Boutilier
- 2009
Citation Context: ...ain models and use them online to adapt our relational exploration strategies. There is also an increasing number of (approximate) dynamic programming approaches for solving relational MDPs, see e.g. [2, 21]. In contrast to the current paper, however, they assume a given model of the world. Recently, Lang and Toussaint [17] and Joshi et al. [12] have shown that successful planning typically involves only...

5 | Efficient learning of relational models for sequential decision making
- Walsh
- 2010
Citation Context: ...d trees to give the agent more informed Q-values by looking some steps into the future when selecting an action. Exploration is based on sampling random actions instead of informed exploration. Walsh [23] provides the first principled investigation into the exploration-exploitation tradeoff in relational domains and establishes sample complexity bounds for specific relational MDP learning subproblems...

4 | Learning models of relational MDPs using graph kernels
- Halbritter, Geibel
- 2007
Citation Context: ...ensity estimation (in the sense of 1-class SVMs) which, when applied in our context, would readily imply alternative notions of novelty and thereby exploration strategies. To give an example, [7] and [11] present relational reinforcement learning approaches which use relational graph kernels to estimate the similarity of relational states. Applying such a method to model $P(s)$ from $E$ would imply that...
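The kernel-based alternative mentioned here — estimating how "known" a state is from its summed similarity to previously visited states — can be sketched as follows. A relational graph kernel would replace the Jaccard overlap used below, which is only an illustrative stand-in over states encoded as sets of ground literals.

```python
def jaccard(s1, s2):
    """Toy similarity between states represented as sets of ground literals."""
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)


def kernel_density(state, experiences, kernel=jaccard):
    """Unnormalized density of a state: summed similarity to the start
    states of previously observed transitions (s_e, a_e, s'_e)."""
    return sum(kernel(state, s) for (s, _a, _s_next) in experiences)
```

States with low summed similarity to all past experience would then be the ones flagged as novel and worth exploring.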

4 | Relevance grounding for planning in relational domains
- Lang, Toussaint
Citation Context: ...roximate) dynamic programming approaches for solving relational MDPs, see e.g. [2, 21]. In contrast to the current paper, however, they assume a given model of the world. Recently, Lang and Toussaint [17] and Joshi et al. [12] have shown that successful planning typically involves only a small subset of relevant objects respectively states and how to make use of this fact to speed up symbolic dynamic...