Results 1–3 of 3
Stochastic Enforced Hill-Climbing, 2008
Cited by 3 (1 self)
Abstract Enforced hill climbing is an effective deterministic hill-climbing technique that deals with local optima using breadth-first search (a process called "basin flooding"). We propose and evaluate a stochastic generalization of enforced hill climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal, with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a heuristic-based Markov decision process (MDP) model of the basin in order to find a good escape policy exiting the local optimum. We note that building this model involves integrating the heuristic into the MDP problem, because the local goal is to improve the heuristic. We evaluate our proposal in twenty-four recent probabilistic planning-competition benchmark domains and twelve probabilistically interesting problems from recent literature. For evaluation, we show that stochastic enforced hill climbing (SEH) produces better policies than greedy heuristic following for value/cost functions derived in two very different ways: one type derived by using deterministic heuristics on a deterministic relaxation, and a second type derived by automatic learning of Bellman-error features from domain-specific experience. Using the first type of heuristic, SEH is shown to generally outperform all planners from the first three international probabilistic planning competitions.
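The deterministic variant that this paper generalizes can be sketched compactly. The following is a minimal illustration based only on the abstract's description (greedy heuristic descent, with breadth-first "basin flooding" at local optima); function names and the exact termination handling are assumptions, not the paper's pseudocode.

```python
from collections import deque

def enforced_hill_climbing(start, successors, heuristic, is_goal):
    """Sketch of deterministic enforced hill climbing.

    Greedily follows the heuristic; when no immediate improvement
    exists (a local optimum or plateau), runs breadth-first search
    until a state with a strictly better heuristic value is found,
    then resumes from that state ("basin flooding")."""
    state = start
    path = [state]
    while not is_goal(state):
        h = heuristic(state)
        # BFS outward from the current state for any improving state.
        frontier = deque([state])
        visited = {state}
        parent = {state: None}
        found = None
        while frontier and found is None:
            s = frontier.popleft()
            for nxt in successors(s):
                if nxt in visited:
                    continue
                visited.add(nxt)
                parent[nxt] = s
                if is_goal(nxt) or heuristic(nxt) < h:
                    found = nxt  # escaped the basin
                    break
                frontier.append(nxt)
        if found is None:
            return None  # dead end: no improving state is reachable
        # Splice the BFS path segment onto the solution path.
        seg = []
        s = found
        while s is not state:
            seg.append(s)
            s = parent[s]
        path.extend(reversed(seg))
        state = found
    return path
```

On a simple chain where the heuristic decreases monotonically, the BFS degenerates to one-step greedy moves; the BFS only grows beyond depth one when the heuristic plateaus or has a local optimum.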
An Algorithmic Survey of Parametric Value Function Approximation
Cited by 2 (1 self)
Reinforcement learning is a machine-learning answer to the optimal control problem. It consists in learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of reinforcement learning is computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific minimization method, generally stochastic gradient descent or a recursive least-squares approach.
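As a concrete instance of the bootstrapping category with a stochastic-gradient-style update, the classic TD(0) rule with a linear parametric value function V(s) = θᵀφ(s) can be sketched as follows. This is a generic textbook example chosen to illustrate the survey's setting; the function names and parameters are illustrative assumptions, not taken from the survey itself.

```python
import numpy as np

def td0_linear(transitions, features, theta, alpha=0.1, gamma=0.95):
    """One sweep of TD(0) with a linear value approximation.

    transitions: iterable of (state, reward, next_state, done) tuples.
    features:    maps a state to a feature vector phi(s).
    theta:       parameter vector; V(s) = features(s) @ theta.
    """
    theta = theta.copy()
    for s, r, s_next, done in transitions:
        v = features(s) @ theta
        v_next = 0.0 if done else features(s_next) @ theta
        td_error = r + gamma * v_next - v        # temporal-difference (Bellman) error
        theta += alpha * td_error * features(s)  # stochastic-gradient-style update
    return theta
```

The bootstrapping character shows up in `v_next`: the update target uses the current estimate of the next state's value rather than an exact return, which is precisely what distinguishes this family from residual and projected fixed-point formulations.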
Learning to Predict Action Outcomes in Continuous, Relational Environments
Abstract—We present a method for predicting action outcomes in unstructured environments with variable numbers of participants and hidden relationships between them. For example, when pouring flour from a cup into a mixing bowl, important relations must exist between the cup and the bowl. The action Pour(x, y) might depend on the precondition Above(x, y). How well the predicate Above actually predicts action success often depends on complicated world dynamics and perhaps other objects in the scene. While such predicates are commonly handcrafted, we present in this paper a method for learning physically grounded predicates directly from the continuous data. In this manner, an agent’s own developmental experience can drive its world representations. Here, we learn such representations as ensembles (or forests) of probability trees using the Spatiotemporal Multidimensional Relational Framework (SMRF). By reasoning ...