Learning domain-specific control knowledge from random walks
 In Proceedings of the fourteenth international
, 2004
Cited by 32 (4 self)
We describe and evaluate a system for learning domain-specific control knowledge. In particular, given a planning domain, the goal is to output a control policy that performs well on “long random walk” problem distributions. The system is based on viewing planning domains as very large Markov decision processes and then applying a recent variant of approximate policy iteration that is bootstrapped with a new technique based on random walks. We evaluate the system on the AIPS-2000 planning domains (among others) and show that often the learned policies perform well on problems drawn from the long-random-walk distribution. In addition, we show that these policies often perform well on the original problem distributions from the domains involved. Our evaluation also uncovers limitations of our current system that point to future challenges.
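As a rough illustration of a random-walk problem distribution, the sketch below assumes that a problem is built by starting from an initial state, applying a fixed number of randomly chosen applicable actions, and taking the final state as the goal. The abstract does not spell out this construction, so the function `random_walk_problem`, the toy domain `line_successors`, and all parameters are hypothetical.

```python
import random

def random_walk_problem(initial_state, successors, walk_length, rng):
    """Return (initial_state, goal_state); the goal is where a random walk ends.

    Assumption: a 'random walk problem' is generated by taking walk_length
    uniformly random applicable steps from the initial state.
    """
    state = initial_state
    for _ in range(walk_length):
        options = successors(state)
        if not options:  # dead end: stop the walk early
            break
        state = rng.choice(options)
    return initial_state, state

def line_successors(pos):
    """Hypothetical toy domain: a token on the integer positions 0..9."""
    return [p for p in (pos - 1, pos + 1) if 0 <= p <= 9]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
start, goal = random_walk_problem(0, line_successors, 20, rng)
print(start, goal)
```

Longer walks tend to produce goals that are farther from the start, which is one way to read the "long" in the distribution's name.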
To Max or not to Max: Online Learning for Speeding Up Optimal Planning
, 2010
Cited by 12 (7 self)
It is well known that there cannot be a single “best” heuristic for optimal planning in general. One way of overcoming this is by combining admissible heuristics (e.g. by using their maximum), which requires computing numerous heuristic estimates at each state. However, there is a tradeoff between the time spent on computing these heuristic estimates for each state, and the time saved by reducing the number of expanded states. We present a novel method that reduces the cost of combining admissible heuristics for optimal search, while maintaining its benefits. Based on an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for that decision rule, and employ the learned model to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms each of the individual heuristics that were used, as well as their regular maximum.
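The baseline mentioned in the abstract, the pointwise maximum of admissible heuristics, can be sketched as follows. This is only the standard maximum combination, not the paper's learned decision rule; the grid domain, the heuristics `h_x` and `h_y`, and the `astar` helper are hypothetical choices made for illustration.

```python
import heapq

N = 8                    # hypothetical grid size
GOAL = (N - 1, N - 1)

def neighbors(s):
    """4-connected moves on an N x N grid without obstacles."""
    x, y = s
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < N and 0 <= ny < N:
            yield (nx, ny)

def h_x(s):
    return abs(GOAL[0] - s[0])      # admissible: ignores the y distance

def h_y(s):
    return abs(GOAL[1] - s[1])      # admissible: ignores the x distance

def h_max(s):
    return max(h_x(s), h_y(s))      # the maximum of admissible heuristics is admissible

def astar(start, h):
    """A* with unit action costs; returns (solution cost, expanded states)."""
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while open_heap:
        _, g, s = heapq.heappop(open_heap)
        if g > best_g.get(s, float("inf")):
            continue                 # stale queue entry
        if s == GOAL:
            return g, expanded
        expanded += 1
        for t in neighbors(s):
            if g + 1 < best_g.get(t, float("inf")):
                best_g[t] = g + 1
                heapq.heappush(open_heap, (g + 1 + h(t), g + 1, t))
    return None, expanded

for name, h in (("h_x", h_x), ("h_y", h_y), ("h_max", h_max)):
    print(name, astar((0, 0), h))
```

Note that `h_max` evaluates both component heuristics at every state; that per-state cost is the overhead the abstract proposes to reduce by choosing which heuristic to compute.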
Bootstrap Learning of Heuristic Functions
Cited by 4 (1 self)
We investigate the use of machine learning to create effective heuristics for search algorithms such as IDA* or heuristic-search planners. Our method aims to generate a strong heuristic from a given weak heuristic h_0 through bootstrapping. The “easy” problem instances that can be solved using h_0 provide training examples for a learning algorithm that produces a heuristic h_1 that is expected to be stronger than h_0. If h_0 is too weak to solve any of the given instances, we use a random walk technique to create a sequence of successively more difficult instances starting with ones that are solvable by h_0. The bootstrap process is then repeated using h_i in lieu of h_{i−1} until a sufficiently strong heuristic is produced. We test our method on the 15- and 24-sliding-tile puzzles, the 17- and 24-pancake puzzles, and the 15- and 20-blocks world. In every case our method produces a heuristic that allows IDA* to solve randomly generated problem instances extremely quickly with solutions very close to optimal.
Automatic move pruning in general single-player games
 In Proceedings of the 4th Symposium on Combinatorial Search (SoCS
, 2011
Cited by 2 (1 self)
Move pruning is a low-overhead technique for reducing the size of a depth-first search tree. The existing algorithm for automatically discovering move pruning information is restricted to games where all moves can be applied to every state. This paper demonstrates an algorithm which handles a general class of single-player games. It gives experimental results for our technique, demonstrating both the applicability to a range of games, and the reduction in search tree size. We also provide some conditions under which move pruning is safe, and when it may interfere with other search reduction techniques.
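To make the idea of move pruning concrete, here is a minimal sketch for the restricted case the abstract describes, where every move applies in every state. It is not the paper's algorithm: the toy domain (a token on an unbounded grid), the names `MOVES`, `find_pruned_pairs`, and `dfs`, and the rule "prune a two-move sequence whose net effect matches an earlier sequence of length at most two" are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy game: a token on an unbounded grid; every move is always legal.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def net_effect(seq):
    """Total displacement of a move sequence (independent of the start state)."""
    return (sum(MOVES[m][0] for m in seq), sum(MOVES[m][1] for m in seq))

def find_pruned_pairs():
    """Mark a pair (m1, m2) as redundant if an earlier sequence has the same effect.

    Sequences are ordered by length, then lexicographically in MOVES order.
    """
    names = list(MOVES)
    seen = {net_effect(())}
    seen.update(net_effect((m,)) for m in names)
    pruned = set()
    for pair in product(names, repeat=2):
        effect = net_effect(pair)
        if effect in seen:
            pruned.add(pair)
        else:
            seen.add(effect)
    return pruned

def dfs(state, depth, last, pruned, visited):
    """Depth-limited DFS; skips move m after move `last` if (last, m) is pruned."""
    visited.add(state)
    nodes = 1
    if depth == 0:
        return nodes
    for m, (dx, dy) in MOVES.items():
        if last is not None and (last, m) in pruned:
            continue
        nodes += dfs((state[0] + dx, state[1] + dy), depth - 1, m, pruned, visited)
    return nodes

pruned_pairs = find_pruned_pairs()
full_states, pruned_states = set(), set()
full_nodes = dfs((0, 0), 4, None, set(), full_states)
pruned_nodes = dfs((0, 0), 4, None, pruned_pairs, pruned_states)
print(full_nodes, pruned_nodes, full_states == pruned_states)
```

In this toy domain the pruned search visits the same set of states with far fewer tree nodes. Games where a move may be inapplicable in some states need the more general treatment the abstract refers to.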
Online Speedup Learning for Optimal Planning
Cited by 1 (0 self)
Domain-independent planning is one of the foundational areas in the field of Artificial Intelligence. A description of a planning task consists of an initial world state, a goal, and a set of actions for modifying the world state. The objective is to find a sequence of actions, that is, a plan, that transforms the initial world state into a goal state. In optimal planning, we are interested in finding not just a plan, but one of the cheapest plans. A prominent approach to optimal planning these days is heuristic state-space search, guided by admissible heuristic functions. Numerous admissible heuristics have been developed, each with its own strengths and weaknesses, and it is well known that there is no single “best” heuristic for optimal planning in general. Thus, which heuristic to choose for a given planning task is a difficult question. This difficulty can be avoided by combining several heuristics, but that requires computing numerous heuristic estimates at each state, and the tradeoff between the time spent doing so and the time saved by the combined advantages of the different heuristics might be high. We present a novel method that reduces the cost of combining admissible heuristics for optimal planning, while maintaining its benefits. Using an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for learning a classifier with that decision rule as the target concept, and employ the learned classifier to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms the standard method for combining several heuristics via their pointwise maximum.
Learning Domain-Specific Control Knowledge from Random Walks
 In ICAPS
, 2004
We describe and evaluate a system for learning domain-specific control knowledge. In particular, given a planning domain, the goal is to output a control policy that performs well on "long random walk" problem distributions. The system is based on viewing planning domains as very large Markov decision processes and then applying a recent variant of approximate policy iteration that is bootstrapped with a new technique based on random walks. We evaluate the system on the AIPS-2000 planning domains (among others) and show that often the learned policies perform well on problems drawn from the long-random-walk distribution. In addition, we show that these policies often perform well on the original problem distributions from the domains involved. Our evaluation also uncovers limitations of our current system that point to future challenges.
Proceedings, The Fourth International Symposium on Combinatorial Search (SoCS-2011) Automatic Move Pruning in General Single-Player Games
Move pruning is a low-overhead technique for reducing the size of a depth-first search tree. The existing algorithm for automatically discovering move pruning information is restricted to games where all moves can be applied to every state. This paper demonstrates an algorithm which handles a general class of single-player games. It gives experimental results for our technique, demonstrating both the applicability to a range of games, and the reduction in search tree size. We also provide some conditions under which move pruning is safe, and when it may interfere with other search reduction techniques.
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) To Max or Not to Max: Online Learning for Speeding Up Optimal Planning
It is well known that there cannot be a single “best” heuristic for optimal planning in general. One way of overcoming this is by combining admissible heuristics (e.g. by using their maximum), which requires computing numerous heuristic estimates at each state. However, there is a tradeoff between the time spent on computing these heuristic estimates for each state, and the time saved by reducing the number of expanded states. We present a novel method that reduces the cost of combining admissible heuristics for optimal search, while maintaining its benefits. Based on an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for that decision rule, and employ the learned model to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms each of the individual heuristics that were used, as well as their regular maximum.