Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming (2003)

by B. Bonet, H. Geffner
Venue: In ICAPS, 2003

Citing papers (results 1 - 10 of 132):

Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback

by Blai Bonet, Hector Geffner - Proc. 18th International Joint Conf. on Artificial Intelligence, 2003
"... Recent algorithms like RTDP and LAO* combine the strength of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function for producing optimal policies without evaluating the entire space. In this paper, we introdu ..."
Abstract (Cited by 54, 7 self):
Recent algorithms like RTDP and LAO* combine the strength of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function for producing optimal policies without evaluating the entire space. In this paper, we introduce and analyze three new HS/DP algorithms.
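
For readers unfamiliar with the base procedure these papers build on, here is a minimal sketch of a single RTDP trial in Python. The `mdp` interface (goal, actions, cost, transitions) is a hypothetical stand-in, not the authors' code; Labeled RTDP adds a labeling step on top of trials like this.

```python
import random

def rtdp_trial(s0, mdp, h, V, max_depth=1000):
    # One RTDP trial: simulate the greedy policy from s0, doing Bellman
    # updates on the states visited. Unvisited states take their value
    # from the admissible heuristic h, so only states reachable from s0
    # under greedy exploration are ever touched.
    def value(s):
        return V.get(s, h(s))

    def q(s, a):
        # Q(s, a) = c(s, a) + sum_{s'} P(s' | s, a) * V(s')
        return mdp.cost(s, a) + sum(p * value(t) for t, p in mdp.transitions(s, a))

    s = s0
    for _ in range(max_depth):
        if mdp.goal(s):
            break
        a = min(mdp.actions(s), key=lambda a: q(s, a))  # greedy action
        V[s] = q(s, a)                                  # Bellman update at s
        succs = mdp.transitions(s, a)                   # [(next_state, prob), ...]
        s = random.choices([t for t, _ in succs],
                           weights=[p for _, p in succs])[0]
```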

Exploiting First-Order Regression in Inductive Policy Selection

by Charles Gretton, Sylvie Thiébaux - Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI’04), 2004
"... We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function usi ..."
Abstract (Cited by 47, 2 self):
We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision-theoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning and in particular classical first-order regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver’s attention on concepts that are specifically relevant to the optimal value function for the domain considered.

Citation Context

...ocesses (MDPs) are now widely accepted as the preferred model for decision-theoretic planning, state of the art MDP algorithms operate on either state-based or propositionally factored representations [9, 1, 10, 5], thereby failing to exploit the relational structure of planning domains. Due to the size of these representations, such approaches do not scale very well as the number of objects increases. Furtherm...

The Joy of Forgetting: Faster Anytime Search via Restarting

by Silvia Richter, Jordan T. Thayer, Wheeler Ruml
"... {jtd7, ruml} at cs.unh.edu Anytime search algorithms solve optimisation problems by quickly finding a usually suboptimal solution and then finding improved solutions when given additional time. To deliver a solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate ..."
Abstract (Cited by 47, 15 self):
Anytime search algorithms solve optimisation problems by quickly finding a usually suboptimal solution and then finding improved solutions when given additional time. To deliver a solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate h. In this paper, we first show that this low-h bias can cause poor performance if the heuristic is inaccurate. Building on this observation, we then present a new anytime approach that restarts the search from the initial state every time a new solution is found. We demonstrate the utility of our method via experiments in PDDL planning as well as other domains. We show that it is particularly useful for hard optimisation problems like planning where heuristics may be quite inaccurate and inadmissible, and where the greedy solution makes early mistakes.
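
As a rough illustration of the restarting idea (not the authors' implementation), the sketch below wraps a plain weighted A* in a restart loop with a decreasing weight schedule; `goal`, `succ`, and `h` are assumed problem callbacks.

```python
import heapq
from itertools import count

def weighted_astar(start, goal, succ, h, w, bound=float("inf")):
    # Weighted A*: f(n) = g(n) + w * h(n). succ(s) yields (s', cost) pairs.
    # Nodes whose admissible f-value already reaches the incumbent bound
    # are pruned, so later, low-weight iterations get cheaper.
    tie = count()
    frontier = [(w * h(start), next(tie), start)]
    g = {start: 0.0}
    while frontier:
        _, _, s = heapq.heappop(frontier)
        if goal(s):
            return g[s]
        for t, c in succ(s):
            gt = g[s] + c
            if gt < g.get(t, float("inf")) and gt + h(t) < bound:
                g[t] = gt
                heapq.heappush(frontier, (gt + w * h(t), next(tie), t))
    return None

def restarting_wastar(start, goal, succ, h, weights=(5.0, 3.0, 2.0, 1.5, 1.0)):
    # Restart from scratch with each new weight, forgetting the old open
    # list but keeping the incumbent cost as a pruning bound.
    best = float("inf")
    for w in weights:
        cost = weighted_astar(start, goal, succ, h, w, bound=best)
        if cost is not None:
            best = min(best, cost)  # improved anytime solution
    return best
```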

Citation Context

...t. The procedures do have significant differences: RTDP makes irrevocable moves in the style of a local search, while RWA* keeps an open list. RTDP requires a labeling procedure to detect optimality (Bonet and Geffner 2003), while RWA* can terminate after a trial with weight 1 (when being used with an admissible heuristic; otherwise it can be extended easily to prove optimality by exhausting those states in Open that h...
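
Since the labeling procedure is what the comparison above turns on, here is a heavily simplified sketch of the idea behind the solved-check in Labeled RTDP (the real CheckSolved procedure of Bonet and Geffner 2003 is more careful; the `mdp` interface matches the earlier RTDP sketch and is an assumption):

```python
def check_solved(s, mdp, h, V, solved, eps=1e-4):
    # A state is labeled solved once every state reachable from it under
    # the greedy policy has Bellman residual below eps. Solved states are
    # skipped by later trials, which is what yields the convergence
    # guarantee of Labeled RTDP.
    def value(t):
        return V.get(t, h(t))

    def q(u, a):
        return mdp.cost(u, a) + sum(p * value(t) for t, p in mdp.transitions(u, a))

    consistent = True
    stack, seen = [s], set()
    while stack:
        u = stack.pop()
        if u in seen or solved.get(u, False) or mdp.goal(u):
            continue
        seen.add(u)
        a = min(mdp.actions(u), key=lambda a: q(u, a))  # greedy action
        if abs(value(u) - q(u, a)) > eps:
            consistent = False                          # residual too large
        else:
            stack.extend(t for t, _ in mdp.transitions(u, a))
    if consistent:
        for u in seen:
            solved[u] = True   # label the whole greedy envelope
    return consistent
```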

Decision-Theoretic Military Operations Planning

by Douglas Aberdeen, et al., 2004
"... Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different cos ..."
Abstract (Cited by 42, 7 self):
Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different costs. The military domain is particularly suited to automated methods because hundreds of tasks, specified by many planning staff, need to be quickly and robustly coordinated.

mGPT: A probabilistic planner based on heuristic search

by Blai Bonet - Journal of Artificial Intelligence Research, 2005
"... We describe the version of the GPT planner to be used in the planning competition. This version, called mGPT, solves mdps specified in the ppddl language by extracting and using different classes of lower bounds, along with various heuristic-search algorithms. The lower bounds are extracted from det ..."
Abstract (Cited by 39, 0 self):
We describe the version of the GPT planner to be used in the planning competition. This version, called mGPT, solves MDPs specified in the PPDDL language by extracting and using different classes of lower bounds, along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations of the MDP where alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms, on the other hand, use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state with the greedy policy.
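
The relaxation the abstract describes can be pictured with a small sketch (the tuple format below is an assumption for illustration, not mGPT's actual data structures):

```python
def all_outcomes_determinization(prob_actions):
    # Each probabilistic effect of an action becomes its own deterministic
    # action: probabilities are dropped, costs are kept. Solving the
    # resulting deterministic problem yields an admissible lower bound
    # for the original MDP.
    det_actions = []
    for name, cost, effects in prob_actions:       # effects: [(prob, effect), ...]
        for i, (_prob, effect) in enumerate(effects):
            det_actions.append((f"{name}#outcome{i}", cost, effect))
    return det_actions
```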

Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and its application to MDPs

by Blai Bonet - In Proceedings of ICAPS’06, 2006
"... Dynamic Programming provides a convenient and unified framework for studying many state models used in AI but no algorithms for handling large spaces. Heuristic-search methods, on the other hand, can handle large spaces but lack a common foundation. In this work, we combine the benefits of a general ..."
Abstract (Cited by 34, 0 self):
Dynamic Programming provides a convenient and unified framework for studying many state models used in AI but no algorithms for handling large spaces. Heuristic-search methods, on the other hand, can handle large spaces but lack a common foundation. In this work, we combine the benefits of a general dynamic programming formulation with the power of heuristic-search techniques to develop an algorithmic framework, which we call Learning in Depth-First Search (LDFS), that aims to be both general and effective. The basic LDFS algorithm searches for solutions by combining iterative, bounded depth-first searches with learning in the sense of Korf’s LRTA* and Barto et al.’s RTDP. In each iteration, if there is a solution with cost not exceeding the lower bound, the solution is found; else the process restarts with the lower bound and the value function updated. LDFS reduces to IDA* with Transposition Tables over deterministic models, but also solves non-deterministic, probabilistic, and game tree models, over which a slight variation reduces to the state-of-the-art MTD algorithm. Over Max AND/OR graphs, on the other hand, LDFS is a new algorithm which appears to be quite competitive with AO*.
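
A heavily simplified sketch of the LDFS recursion for acyclic Max AND/OR models may help fix ideas; it omits the solved-labeling, cycle handling, and model-specific details of the paper, and the `mdp` interface is an assumption:

```python
def ldfs(s, mdp, V, h):
    # Bounded depth-first search plus learning: succeed if the current
    # value at s is consistent with some greedy solution below it;
    # otherwise perform a Bellman update (the "learning" step) and fail,
    # so the driver loop retries with tighter values.
    def value(t):
        return V.get(t, h(t))

    def q(a):  # worst-case backup for Max AND/OR models
        return mdp.cost(s, a) + max(value(t) for t in mdp.successors(s, a))

    if mdp.goal(s):
        return True
    for a in mdp.actions(s):
        if q(a) <= value(s) and all(ldfs(t, mdp, V, h)
                                    for t in mdp.successors(s, a)):
            return True
    V[s] = min((q(a) for a in mdp.actions(s)),
               default=float("inf"))          # learning: Bellman update
    return False

def solve(s0, mdp, h):
    V = {}
    while not ldfs(s0, mdp, V, h):  # iterate until a consistent solution
        pass
    return V
```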

Adaptive Multi-Robot Wide-Area Exploration and Mapping

by Kian Hsiang Low, John M. Dolan, Pradeep Khosla
"... The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sam ..."
Abstract (Cited by 34, 22 self):
The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sampling using non-myopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multi-robot Adaptive Sampling Problem (MASP). A key feature of MASP is that it covers the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and log- ...

Citation Context

...n-trivial issue arises with generalizing RTDP to handle the non-Markov structure of aMASP: the state space of MDP is often assumed to be tractable. Based on this assumption, RTDP has been enhanced in [2, 3] with additional procedures to improve convergence, which require time complexity linear in the state size. More importantly, improvements of RTDP [2, 3, 10, 19] emphasize the use of informed heuristi...

A survey of point-based POMDP solvers

by Guy Shani, Joelle Pineau, Robert Kaplow - Autonomous Agents and Multi-Agent Systems, 2012
"... The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly ..."
Abstract (Cited by 33, 5 self):
The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm—mainly the selection of the belief space subset, and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
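
The core operation the survey revolves around, a point-based backup at a single belief, can be sketched in a few lines of NumPy. The `pomdp` container with T, O, R arrays is an assumed layout for illustration, not any particular solver's API.

```python
import numpy as np

def point_based_backup(b, Gamma, pomdp, gamma=0.95):
    # Build the best alpha-vector at belief b from the current set Gamma.
    # T[a] is |S|x|S| with T[a][s, s'] = P(s'|s,a); O[a] is |S|x|Z| with
    # O[a][s', z] = P(z|s',a); R[a] is the |S|-dimensional reward vector.
    best_val, best_alpha = -np.inf, None
    for a in range(pomdp.num_actions):
        alpha_a = pomdp.R[a].astype(float)
        for z in range(pomdp.num_obs):
            # g_{a,z,alpha}(s) = gamma * sum_{s'} T(s,a,s') O(s',a,z) alpha(s')
            g = [gamma * pomdp.T[a] @ (pomdp.O[a][:, z] * alpha) for alpha in Gamma]
            alpha_a += max(g, key=lambda v: v @ b)   # best vector for this z
        if alpha_a @ b > best_val:
            best_val, best_alpha = alpha_a @ b, alpha_a
    return best_alpha
```

Only beliefs in the chosen finite subset are ever backed up, which is how these solvers avoid the exponential growth of the value function mentioned above.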

Prottle: A probabilistic temporal planner

by Iain Little, Douglas Aberdeen, Sylvie Thiébaux - In AAAI’05, 2005
"... Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern temporal and probabilistic planners with domains that exhibit an interaction between time and uncertainty ..."
Abstract (Cited by 29, 5 self):
Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern temporal and probabilistic planners with domains that exhibit an interaction between time and uncertainty. We present a general framework for probabilistic temporal planning in which effects, the time at which they occur, and action durations are all probabilistic. This framework includes a search space that is designed for solving probabilistic temporal planning problems via heuristic search, an algorithm that has been tailored to work with it, and an effective heuristic based on an extension of the planning graph data structure. Prottle is a planner that implements this framework, and can solve problems expressed in an extension of PDDL.

Citation Context

...achieves this level of expressiveness while still maintaining a close alignment with existing work in probabilistic and temporal planning (Smith & Weld 1999; Blum & Langford 1999; Bacchus & Ady 2001; Bonet & Geffner 2003). We start with a brief description of the framework’s probabilistic durative actions, define the search space for our probabilistic temporal planning problem, present a trial-based search algorithm t...

Solving Concurrent Markov Decision Processes

by Mausam, Daniel S. Weld, 2004
"... Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, a ..."
Abstract (Cited by 25, 3 self):
Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules, and thus guarantees solution optimality. Our second technique is a fast, sampling-based algorithm, which produces close-to-optimal solutions extremely quickly. Experiments show that our approaches outperform the existing algorithms, producing up to two orders of magnitude speedup.
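
To see where the combinatorial explosion comes from, consider a naive Bellman backup for a concurrent MDP, sketched below with assumed helper names (`conflict_free`, `joint_transitions` are illustrative, not the authors' API); pruning rules like the paper's exist precisely to avoid enumerating all of these subsets.

```python
from itertools import combinations

def concurrent_backup(s, mdp, value, max_parallel=3):
    # The "action" in a concurrent MDP is any non-conflicting subset of
    # the applicable actions, so a naive backup ranges over a number of
    # combinations that is exponential in the number of actions.
    applicable = list(mdp.actions(s))
    best = float("inf")
    for k in range(1, max_parallel + 1):
        for combo in combinations(applicable, k):
            if not mdp.conflict_free(combo):
                continue
            q = mdp.cost(s, combo) + sum(
                p * value(t) for t, p in mdp.joint_transitions(s, combo))
            best = min(best, q)
    return best
```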

Citation Context

...ntial blowup on all of these techniques. In this paper we investigate techniques to counter this combinatorial explosion. Specifically, we extend the technique of real-time dynamic programming (RTDP) [1, 4] to handle concurrency, making the following contributions: • We empirically illustrate the exponential blowup suffered by the existing MDP algorithms. • We describe two pruning strategies (combo-elim...
