Results 1 - 10
of
132
Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback
- Proc. 18th International Joint Conf. on Artificial Intelligence
, 2003
"... Recent algorithms like RTDP and LAO* combine the strength of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function for producing optimal policies without evaluating the entire space. In this paper, we introdu ..."
Abstract
-
Cited by 54 (7 self)
- Add to MetaCart
Recent algorithms like RTDP and LAO* combine the strength of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function for producing optimal policies without evaluating the entire space. In this paper, we introduce and analyze three new HS/DP algorithms.
Exploiting First-Order Regression in Inductive Policy Selection
- Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI’04
, 2004
"... We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function usi ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
(Show Context)
We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decisiontheoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning and in particular classical first-order regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver’s attention on concepts that are specifically relevant to the optimal value function for the domain considered. 1
The Joy of Forgetting: Faster Anytime Search via Restarting
"... {jtd7, ruml} at cs.unh.edu Anytime search algorithms solve optimisation problems by quickly finding a usually suboptimal solution and then finding improved solutions when given additional time. To deliver a solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate ..."
Abstract
-
Cited by 47 (15 self)
- Add to MetaCart
(Show Context)
{jtd7, ruml} at cs.unh.edu Anytime search algorithms solve optimisation problems by quickly finding a usually suboptimal solution and then finding improved solutions when given additional time. To deliver a solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate h. In this paper, we first show that this low-h bias can cause poor performance if the heuristic is inaccurate. Building on this observation, we then present a new anytime approach that restarts the search from the initial state every time a new solution is found. We demonstrate the utility of our method via experiments in PDDL planning as well as other domains. We show that it is particularly useful for hard optimisation problems like planning where heuristics may be quite inaccurate and inadmissible, and where the greedy solution makes early mistakes.
Decision-Theoretic Military Operations Planning
, 2004
"... Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different cos ..."
Abstract
-
Cited by 42 (7 self)
- Add to MetaCart
Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic approach. The planner must choose between multiple tasks that achieve similar outcomes but have different costs. The military domain is particularly suited to automated methods because hundreds of tasks, specified by many planning staff, need to be quickly and robustly coordinated. The authors
mGPT: A probabilistic planner based on heuristic search
- Journal of Artificial Intelligence Research
, 2005
"... We describe the version of the GPT planner to be used in the planning competition. This version, called mGPT, solves mdps specified in the ppddl language by extracting and using different classes of lower bounds, along with various heuristic-search algorithms. The lower bounds are extracted from det ..."
Abstract
-
Cited by 39 (0 self)
- Add to MetaCart
We describe the version of the GPT planner to be used in the planning competition. This version, called mGPT, solves mdps specified in the ppddl language by extracting and using different classes of lower bounds, along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations of the mdp where alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms, on the other hand, use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state with the greedy policy.
Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and its application to MDPs
- In Proceedings of ICAPS’06
, 2006
"... Dynamic Programming provides a convenient and unified framework for studying many state models used in AI but no algorithms for handling large spaces. Heuristic-search methods, on the other hand, can handle large spaces but lack a common foundation. In this work, we combine the benefits of a general ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
Dynamic Programming provides a convenient and unified framework for studying many state models used in AI but no algorithms for handling large spaces. Heuristic-search methods, on the other hand, can handle large spaces but lack a common foundation. In this work, we combine the benefits of a general dynamic programming formulation with the power of heuristic-search techniques for developing an algorithmic framework, that we call Learning in Depth-First Search, that aims to be both general and effective. The basic LDFS algorithm searches for solutions by combining iterative, bounded depth-first searches, with learning in the sense of Korf’s LRTA * and Barto’s et al. RTDP. In each iteration, if there is a solution with cost not exceeding a lower bound, then the solution is found, else the process restarts with the lower bound and the value function updated. LDFS reduces to IDA * with Transposition Tables over deterministic models, but solves also non-deterministic, probabilistic, and game tree models, over which a slight variation reduces to the stateof-the-art MTD algorithm. Over Max AND/OR graphs, on the other hand, LDFS is a new algorithm which appears to be quite competitive with AO*.
Adaptive Multi-Robot Wide-Area Exploration and Mapping
"... The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sam ..."
Abstract
-
Cited by 34 (22 self)
- Add to MetaCart
(Show Context)
The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sampling using non-myopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multi-robot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and log-
A survey of point-based POMDP solvers
- AUTON AGENT MULTI-AGENT SYST
, 2012
"... The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly ..."
Abstract
-
Cited by 33 (5 self)
- Add to MetaCart
The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm—mainly the selection of the belief space subset, and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
Prottle: A probabilistic temporal planner
- In AAAI’05
, 2005
"... Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern temporal and probabilistic planners with domains that exhibit an interaction between time and uncertainty ..."
Abstract
-
Cited by 29 (5 self)
- Add to MetaCart
(Show Context)
Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern temporal and probabilistic planners with domains that exhibit an interaction between time and uncertainty. We present a general framework for probabilistic temporal planning in which effects, the time at which they occur, and action durations are all probabilistic. This framework includes a search space that is designed for solving probabilistic temporal planning problems via heuristic search, an algorithm that has been tailored to work with it, and an effective heuristic based on an extension of the planning graph data structure. Prottle is a planner that implements this framework, and can solve problems expressed in an extension of PDDL.
Solving Concurrent Markov Decision Processes
, 2004
"... Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, a ..."
Abstract
-
Cited by 25 (3 self)
- Add to MetaCart
(Show Context)
Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules, and thus guarantees solution optimality. Our second technique is a fast, samplingbased algorithm, which produces close-to-optimal solutions extremely quickly. Experiments show that our approaches outperform the existing algorithms producing up to two orders of magnitude speedup.