Results 1–10 of 11
Approximate Policy Iteration with a Policy Language Bias
Journal of Artificial Intelligence Research, 2003
Abstract

Cited by 141 (18 self)
We explore approximate policy iteration (API), replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve.
Learning domain-specific control knowledge from random walks
In Proceedings of the fourteenth international …, 2004
Abstract

Cited by 39 (4 self)
We describe and evaluate a system for learning domain-specific control knowledge. In particular, given a planning domain, the goal is to output a control policy that performs well on “long random walk” problem distributions. The system is based on viewing planning domains as very large Markov decision processes and then applying a recent variant of approximate policy iteration that is bootstrapped with a new technique based on random walks. We evaluate the system on the AIPS-2000 planning domains (among others) and show that often the learned policies perform well on problems drawn from the long-random-walk distribution. In addition, we show that these policies often perform well on the original problem distributions from the domains involved. Our evaluation also uncovers limitations of our current system that point to future challenges.
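The random-walk problem generation the abstract mentions can be sketched as follows: walk a fixed number of random steps from a domain's initial state and take the state reached as the goal of a training problem. This is a minimal illustration under assumed interfaces (`applicable_actions`, `apply_action`), not the authors' code.

```python
import random

def random_walk_problem(initial_state, applicable_actions, apply_action,
                        walk_length, rng=random):
    """Generate one training problem: walk `walk_length` random steps from
    `initial_state`; the state reached becomes the goal.
    All parameters are hypothetical interfaces, not from the paper."""
    state = initial_state
    for _ in range(walk_length):
        actions = applicable_actions(state)
        if not actions:  # dead end: end the walk early
            break
        state = apply_action(state, rng.choice(actions))
    return initial_state, state  # (problem start, random-walk goal)
```

Longer walks tend to yield harder training problems, which is what makes the distribution useful for bootstrapping a learner.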
To Max or not to Max: Online Learning for Speeding Up Optimal Planning
, 2010
Abstract

Cited by 17 (7 self)
It is well known that there cannot be a single “best” heuristic for optimal planning in general. One way of overcoming this is by combining admissible heuristics (e.g. by using their maximum), which requires computing numerous heuristic estimates at each state. However, there is a tradeoff between the time spent on computing these heuristic estimates for each state, and the time saved by reducing the number of expanded states. We present a novel method that reduces the cost of combining admissible heuristics for optimal search, while maintaining its benefits. Based on an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for that decision rule, and employ the learned model to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms each of the individual heuristics that were used, as well as their regular maximum.
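The selective-computation idea can be illustrated with a small sketch: always compute a cheap admissible heuristic, and consult a (learned) decision rule before paying for the expensive one. The names here are illustrative assumptions, not the paper's API.

```python
def selective_max(state, h_cheap, h_expensive, should_compute_expensive):
    """Heuristic estimate for `state`, computing the expensive heuristic
    only when the (learned) decision rule predicts it is worthwhile.
    The max of admissible heuristics remains admissible."""
    v = h_cheap(state)
    if should_compute_expensive(state):
        v = max(v, h_expensive(state))
    return v
```

When the decision rule is accurate, most states pay only for the cheap heuristic while still benefiting from the expensive one where it matters.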
Bootstrap Learning of Heuristic Functions
Abstract

Cited by 11 (2 self)
We investigate the use of machine learning to create effective heuristics for search algorithms such as IDA* or heuristic-search planners. Our method aims to generate a strong heuristic from a given weak heuristic h0 through bootstrapping. The “easy” problem instances that can be solved using h0 provide training examples for a learning algorithm that produces a heuristic h1 that is expected to be stronger than h0. If h0 is too weak to solve any of the given instances, we use a random-walk technique to create a sequence of successively more difficult instances, starting with ones that are solvable by h0. The bootstrap process is then repeated using hi in lieu of hi−1 until a sufficiently strong heuristic is produced. We test our method on the 15- and 24-sliding-tile puzzles, the 17- and 24-pancake puzzles, and the 15- and 20-blocks worlds. In every case our method produces a heuristic that allows IDA* to solve randomly generated problem instances extremely quickly, with solutions very close to optimal.
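In outline, the bootstrap loop described above could look like the sketch below: solve whatever the current heuristic can, label each state on a solution path with its remaining cost, train a stronger heuristic on those labels, and repeat. `solve` and `train` are hypothetical interfaces assumed for illustration, not the authors' implementation.

```python
def bootstrap_heuristic(h0, instances, solve, train):
    """Iteratively strengthen a heuristic via bootstrapping.
    `solve(h, inst)` returns a solution path (list of states) or None,
    e.g. a budgeted IDA* run; `train(examples)` fits a new heuristic
    from (state, cost-to-go) pairs. All four parameters are assumptions."""
    h = h0
    unsolved = list(instances)
    while unsolved:
        examples, still_unsolved = [], []
        for inst in unsolved:
            path = solve(h, inst)
            if path is None:
                still_unsolved.append(inst)
            else:
                # label each state on the path with its remaining cost
                examples += [(s, len(path) - 1 - i) for i, s in enumerate(path)]
        if not examples:  # nothing solved: would need easier instances
            break         # (the paper generates them via random walks)
        h = train(examples)  # h_{i+1} learned from instances solvable under h_i
        unsolved = still_unsolved
    return h
```

Each round, the newly solvable instances supply training data for the next, stronger heuristic, which is what lets the process climb from easy problems to hard ones.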
Online Speedup Learning for Optimal Planning
Abstract

Cited by 4 (1 self)
Domain-independent planning is one of the foundational areas in the field of Artificial Intelligence. A description of a planning task consists of an initial world state, a goal, and a set of actions for modifying the world state. The objective is to find a sequence of actions, that is, a plan, that transforms the initial world state into a goal state. In optimal planning, we are interested in finding not just a plan, but one of the cheapest plans. A prominent approach to optimal planning these days is heuristic state-space search, guided by admissible heuristic functions. Numerous admissible heuristics have been developed, each with its own strengths and weaknesses, and it is well known that there is no single “best” heuristic for optimal planning in general. Thus, which heuristic to choose for a given planning task is a difficult question. This difficulty can be avoided by combining several heuristics, but that requires computing numerous heuristic estimates at each state, and the tradeoff between the time spent doing so and the time saved by the combined advantages of the different heuristics might be high. We present a novel method that reduces the cost of combining admissible heuristics for optimal planning, while maintaining its benefits. Using an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for learning a classifier with that decision rule as the target concept, and employ the learned classifier to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms the standard method for combining several heuristics via their pointwise maximum.
Automatic move pruning in general single-player games
In Proceedings of the 4th Symposium on Combinatorial Search (SoCS), 2011
Abstract

Cited by 4 (2 self)
Move pruning is a low-overhead technique for reducing the size of a depth-first search tree. The existing algorithm for automatically discovering move-pruning information is restricted to games where all moves can be applied to every state. This paper demonstrates an algorithm which handles a general class of single-player games. It gives experimental results for our technique, demonstrating both the applicability to a range of games and the reduction in search tree size. We also provide some conditions under which move pruning is safe, and when it may interfere with other search reduction techniques.
Learning and Applying Competitive Strategies
Abstract
Learning reusable sequences can support the development of expertise in many domains, either by improving decision-making quality or by decreasing execution speed. This paper introduces and evaluates a method to learn action sequences for generalized states from prior problem experience. From experienced sequences, the method induces the context that underlies a sequence of actions. Empirical results indicate that the sequences and contexts learned for a class of problems are actually those deemed important by experts for that particular class, and can be used to select appropriate action sequences when solving problems there. Repeated problem solving can provide salient, reusable data to a learner. This paper focuses on programs that acquire expertise in a particular domain. The thesis of our …
Speedup Learning for Repair-based Search by Identifying Redundant Steps
, 2003
Abstract
Repair-based search algorithms start with an initial solution and attempt to improve it by iteratively applying repair operators. Such algorithms can often handle large-scale problems that may be difficult for systematic search algorithms. Nevertheless, the computational cost of solving such problems is still very high. We observed that many of the repair steps applied by such algorithms are redundant in the sense that they do not eventually contribute to finding a solution. Such redundant steps are particularly harmful in repair-based search, where each step carries a high cost due to the very high branching factor typically associated with it.