Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
Journal of Artificial Intelligence Research, 1999
Cited by 417 (4 self)
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Journal of Artificial Intelligence Research, 2000
Cited by 367 (6 self)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics (as a subroutine hierarchy) and a declarative semantics (as a representation of the value function of a hierarchical policy). MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consisten...
The Computational Complexity of Propositional STRIPS Planning
Artificial Intelligence, 1994
Cited by 299 (3 self)
I present several computational complexity results for propositional STRIPS planning, i.e., STRIPS planning restricted to ground formulas. Different planning problems can be defined by restricting the type of formulas, placing limits on the number of pre- and postconditions, by restricting negation in pre- and postconditions, and by requiring optimal plans. For these types of restrictions, I show when planning is tractable (polynomial) and intractable (NP-hard). In general, it is PSPACE-complete to determine if a given planning instance has any solutions. Extremely severe restrictions on both the operators and the formulas are required to guarantee polynomial time or even NP-completeness. For example, when only ground literals are permitted, determining plan existence is PSPACE-complete even if operators are limited to two preconditions and two postconditions. When definite Horn ground formulas are permitted, determining plan existence is PSPACE-complete even if operators are limited t...
Automatically Generating Abstractions for Planning
Artificial Intelligence, 1994
Cited by 178 (3 self)
This article presents a completely automated approach to generating abstractions for planning. The abstractions are generated using a tractable, domain-independent algorithm whose only input is the definition of a problem to be solved and whose output is an abstraction hierarchy that is tailored to the particular problem. The algorithm generates abstraction hierarchies by dropping literals from the original problem definition. It forms abstractions that satisfy the ordered monotonicity property, which guarantees that the structure of an abstract solution is not changed in the process of refining it. The algorithm for generating abstractions is implemented in a system called ALPINE, which generates abstractions for a hierarchical version of the PRODIGY problem solver. The abstractions generated by ALPINE are tested in multiple domains on large problem sets and are shown to produce shorter solutions with significantly less search than planning without using abstraction.
Finding Optimal Solutions to Rubik's Cube Using Pattern Databases
1997
Cited by 132 (6 self)
We have found the first optimal solutions to random instances of Rubik's Cube. The median optimal solution length appears to be 18 moves. The algorithm used is iterative-deepening-A* (IDA*), with a lower-bound heuristic function based on large memory-based lookup tables, or "pattern databases" (Culberson and Schaeffer 1996). These tables store the exact number of moves required to solve various subgoals of the problem, in this case subsets of the individual movable cubies. We characterize the effectiveness of an admissible heuristic function by its expected value, and hypothesize that the overall performance of the program obeys a relation in which the product of the time and space used equals the size of the state space. Thus, the speed of the program increases linearly with the amount of memory available. As computer memories become larger and cheaper, we believe that this approach will become increasingly cost-effective.
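The combination described in this abstract, IDA* guided by a pattern-database lower bound, can be sketched on a much smaller problem. The following is a minimal illustration on a 2x3 sliding-tile puzzle rather than Rubik's Cube; the puzzle, the choice of pattern tiles {1, 2, 3}, and all names (`neighbors`, `project`, `build_pdb`, `ida_star`) are assumptions made for this example and are not taken from the paper.

```python
from collections import deque

ROWS, COLS = 2, 3
GOAL = (1, 2, 3, 4, 5, 0)  # 0 is the blank (hypothetical toy puzzle)
PATTERN = {1, 2, 3}        # tiles tracked by the pattern database (assumed)


def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    b = state.index(0)
    r, c = divmod(b, COLS)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            t = nr * COLS + nc
            s = list(state)
            s[b], s[t] = s[t], s[b]
            yield tuple(s)


def project(state, pattern):
    """Abstract a state: keep the blank and pattern tiles, mask the rest as -1."""
    return tuple(x if x == 0 or x in pattern else -1 for x in state)


def build_pdb(pattern):
    """Breadth-first search from the abstract goal; every move costs 1.

    Moves are reversible, so distance from the goal equals distance to it.
    """
    start = project(GOAL, pattern)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n in neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist


def ida_star(start, h):
    """Return an optimal list of states from start to GOAL (start assumed solvable)."""
    path = [start]

    def search(g, bound):
        s = path[-1]
        f = g + h(s)
        if f > bound:
            return False, f          # prune; f is a candidate for the next bound
        if s == GOAL:
            return True, f
        next_bound = float("inf")
        for n in neighbors(s):
            if n in path:            # avoid cycles along the current path
                continue
            path.append(n)
            found, t = search(g + 1, bound)
            if found:
                return True, t
            next_bound = min(next_bound, t)
            path.pop()
        return False, next_bound

    bound = h(start)
    while True:
        found, t = search(0, bound)
        if found:
            return list(path)
        if t == float("inf"):
            return None
        bound = t                    # deepen to the smallest f that exceeded the bound


pdb = build_pdb(PATTERN)


def h(state):
    """Lower bound: exact cost of solving the masked subproblem."""
    return pdb[project(state, PATTERN)]


start = (4, 1, 2, 0, 5, 3)
solution = ida_star(start, h)
print(len(solution) - 1, "moves")
```

The table is built once and then consulted with a single dictionary lookup per node, which is the time-for-memory trade the abstract alludes to: a larger pattern gives a tighter bound but a larger table.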
Hierarchical solution of Markov decision processes using macro-actions
In Proc. of Uncertainty in Artificial Intelligence (UAI), 1998
Cited by 125 (10 self)
... actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macro-actions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macro-actions only, and that significantly reduces the size of the state space. This is achieved by treating macro-actions as local policies that act in certain regions [...] MDP to those at the boundaries of regions. The abstract MDP approximates the original and can be solved more efficiently. We discuss several ways in which macro-actions can be generated to ensure good solution quality. Finally, we consider ways in which macro-actions can be reused to solve multiple, related MDPs; and we show that this can justify the computational overhead of macro-action generation.
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
In Proceedings of the Eighteenth International Conference on Machine Learning, 2001
Cited by 116 (18 self)
This paper presents a method by which a reinforcement learning agent can automatically discover certain types of subgoals online. By creating useful new subgoals while learning, the agent is able to accelerate learning on the current task and to transfer its expertise to other, related tasks through the reuse of its ability to attain subgoals. The agent discovers subgoals based on commonalities across multiple paths to a solution. We cast the task of finding these commonalities as a multiple-instance learning problem and use the concept of diverse density to find solutions. We illustrate this approach using several gridworld tasks.
Decomposition Techniques for Planning in Stochastic Domains
In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), 1995
Cited by 109 (7 self)
This paper is concerned with modeling planning problems involving uncertainty as discrete-time, finite-state stochastic automata. Solving planning problems is reduced to computing policies for Markov decision processes. Classical methods for solving Markov decision processes cannot cope with the size of the state spaces for typical problems encountered in practice. As an alternative, we investigate methods that decompose global planning problems into a number of local problems, solve the local problems separately, and then combine the local solutions to generate a global solution. We present algorithms that decompose planning problems into smaller problems given an arbitrary partition of the state space. The local problems are interpreted as Markov decision processes and solutions to the local problems are interpreted as policies restricted to the subsets of the state space defined by the partition. One algorithm relies on constructing and solving an abstract version of the original de...
Derivational Analogy in PRODIGY: Automating Case Acquisition, Storage, and Utilization
Machine Learning, 1993
Cited by 106 (15 self)
Expertise consists of rapid selection and application of compiled experience. Robust reasoning, however, requires adaptation to new contingencies and intelligent modification of past experience. And novel or creative reasoning, by its real nature, necessitates general problem-solving abilities unconstrained by past behavior. This article presents a comprehensive computational model of analogical (case-based) reasoning that transitions smoothly between case replay, case adaptation, and general problem solving, exploiting and modifying past experience when available and resorting to general problem-solving methods when required. Learning occurs by accumulation of new cases, especially in situations that required extensive problem solving, and by tuning the indexing structure of the memory model to retrieve progressively more appropriate cases. The derivational replay mechanism is discussed in some detail, and extensive results of the first full implementation are presented. These results show up to a large performance improvement in a simple transportation domain for structurally similar problems, and smaller improvements when less strict similarity metrics are used for problems that share partial structure in a process-job planning domain and in an extended version of the STRIPS robot domain.
Disjoint pattern database heuristics
Artificial Intelligence, 2002
Cited by 104 (24 self)
We explore a method for computing admissible heuristic evaluation functions for search problems. It utilizes pattern databases (Culberson & Schaeffer, 1998), which are precomputed tables of the exact cost of solving various subproblems of an existing problem. Unlike standard pattern database heuristics, however, we partition our problems into disjoint subproblems, so that the costs of solving the different subproblems can be added together without overestimating the cost of solving the original problem. Previously (Korf & Felner, 2002) we showed how to statically partition the sliding-tile puzzles into disjoint groups of tiles to compute an admissible heuristic, using the same partition for each state and problem instance. Here we extend the method and show that it applies to other domains as well. We also present another method for additive heuristics which we call dynamically partitioned pattern databases. Here we partition the problem into disjoint subproblems for each state of the search dynamically. We discuss the pros and cons of each of these methods and apply both methods to three different problem domains: the sliding-tile puzzles, the 4-peg Towers of Hanoi problem, and finding an optimal vertex cover of a graph. We find that in some problem domains, static partitioning is most effective, while in others dynamic partitioning is a better choice. In each of these problem domains, either statically partitioned or dynamically partitioned pattern database heuristics are the best known heuristics for the problem.
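The static partitioning idea in this abstract can be illustrated with a small sketch. It assumes a 2x3 sliding-tile puzzle and a hypothetical partition of the tiles into the groups {1, 2, 3} and {4, 5}; each table counts only moves of its own group's tiles, which is what lets the two values be summed without overestimating. All names (`moves`, `key`, `build_additive_pdb`, `h_additive`) are invented for the example, and ignoring the blank position in the lookup key is a simplification chosen here to keep the tables small.

```python
from collections import deque

ROWS, COLS = 2, 3
GOAL = (1, 2, 3, 4, 5, 0)      # 0 is the blank (hypothetical toy puzzle)
GROUPS = [{1, 2, 3}, {4, 5}]   # assumed disjoint partition of the tiles


def moves(state):
    """Yield (next_state, moved_tile) for each tile that can slide into the blank."""
    b = state.index(0)
    r, c = divmod(b, COLS)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            t = nr * COLS + nc
            s = list(state)
            s[b], s[t] = s[t], s[b]
            yield tuple(s), state[t]


def project(state, group):
    """Keep the blank and the group's tiles; mask every other tile as -1."""
    return tuple(x if x == 0 or x in group else -1 for x in state)


def key(state, group):
    """Database key: positions of the group's tiles only (blank ignored)."""
    return tuple(state.index(t) for t in sorted(group))


def build_additive_pdb(group):
    """0-1 BFS from the abstract goal, charging 1 only when a group tile moves."""
    start = project(GOAL, group)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n, tile in moves(s):
            cost = 1 if tile in group else 0   # masked tiles move for free
            d = dist[s] + cost
            if d < dist.get(n, float("inf")):
                dist[n] = d
                if cost:
                    queue.append(n)
                else:
                    queue.appendleft(n)
    # Collapse over blank positions, keeping the cheapest cost per key.
    pdb = {}
    for s, d in dist.items():
        k = key(s, group)
        pdb[k] = min(d, pdb.get(k, float("inf")))
    return pdb


PDBS = [build_additive_pdb(g) for g in GROUPS]


def h_additive(state):
    """Sum of per-group costs; each real move shifts a tile of exactly one group."""
    return sum(pdb[key(state, g)] for g, pdb in zip(GROUPS, PDBS))


print(h_additive((4, 1, 2, 0, 5, 3)))
```

Because every move in the real puzzle is charged to at most one table, the sum never exceeds the true solution cost. Taking the maximum of two ordinary pattern databases would also be admissible but is generally a weaker bound than the sum.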