Results 1  10
of
126
Planning and acting in partially observable stochastic domains
 ARTIFICIAL INTELLIGENCE
, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract

Cited by 822 (31 self)
 Add to MetaCart
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finitememory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
Between MDPs and SemiMDPs: A Framework for Temporal Abstraction in Reinforcement Learning
 Artificial Intelligence
, 1999
"... Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We ..."
Abstract

Cited by 426 (29 self)
 Add to MetaCart
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include optionsclosedloop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Qlearning.
DecisionTheoretic Planning: Structural Assumptions and Computational Leverage
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract

Cited by 417 (4 self)
 Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDPrelated methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract

Cited by 175 (8 self)
 Add to MetaCart
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
LAO*: A heuristic search algorithm that finds solutions with loops
, 2001
"... Classic heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree, or an acyclic graph (AO*). In this paper, we describe a novel generalization of heuristic search, called LAO*, that can find solutions with loops. We show that LAO* can be used to solve Markov de ..."
Abstract

Cited by 142 (14 self)
 Add to MetaCart
Classic heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree, or an acyclic graph (AO*). In this paper, we describe a novel generalization of heuristic search, called LAO*, that can find solutions with loops. We show that LAO* can be used to solve Markov decision problems and that it shares the advantage heuristic search has over dynamic programming for other classes of problems. Given a start state, it can find an optimal solution without evaluating the entire state space. 2001 Elsevier Science B.V. All rights reserved. Keywords: Heuristic search; Dynamic programming; Markov decision problems 1.
On the complexity of solving Markov decision problems
 IN PROC. OF THE ELEVENTH INTERNATIONAL CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 1995
"... Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argu ..."
Abstract

Cited by 131 (10 self)
 Add to MetaCart
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.
Hierarchical solution of Markov decision processes using macroactions
 In Proc. of Uncertainty in Artificial Intelligence (UAI
, 1998
"... actions, or macroactions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macroactions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macroactions only, and that significantly ..."
Abstract

Cited by 125 (10 self)
 Add to MetaCart
actions, or macroactions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macroactions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macroactions only, and that significantly reduces the size of the state space. This is achieved by treating macroactions as local policies that act in certain regions MDP to those at the boundaries of regions. The abstract MDP approximates the original and can be solved more efficiently. We discuss several ways in which macroactions can be generated to ensure good solution quality. Finally, we consider ways in which macroactions can be reused to solve multiple, related MDPs; and we show that this can justify the computational overhead of macroaction generation. 1
Hierarchical Control and Learning for Markov Decision Processes
, 1998
"... This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept o ..."
Abstract

Cited by 108 (2 self)
 Add to MetaCart
This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a policy defined over a region of an MDP to an action in a semiMarkov decision problem (SMDP). Several algorithms are presented for performing this transformation efficiently. This dissertation introduces the HAM method for generating hierarchical, temporally abstract actions. This method permits the partial specification of abstract actions in a way that corresponds to an abstract plan or strategy. Abstr...
Model Minimization in Markov Decision Processes
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence
, 1997
"... We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of being ..."
Abstract

Cited by 105 (7 self)
 Add to MetaCart
We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of being carried to each other block. We provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The resulting partition can be used to construct a reduced MDP which is minimal in a well defined sense and can be used to solve the original MDP. Our algorithm is an adaptation of known automata minimization algorithms, and is designed to operate naturally on factored or implicit representations in which the full state space is never explicitly enumerated. We show that simple variations on this algorithm are equivalent or closely similar to several different recently published algorithms for finding optimal solutions to (partially ...
Bridging the gap between planning and scheduling
 Knowledge Engineering Review
"... Planning research in Artificial Intelligence (AI) has often focused on problems where there are cascading levels of action choice and complex interactions between actions. In contrast, Scheduling research has focused on much larger problems where there is little action choice, but the resulting orde ..."
Abstract

Cited by 94 (9 self)
 Add to MetaCart
Planning research in Artificial Intelligence (AI) has often focused on problems where there are cascading levels of action choice and complex interactions between actions. In contrast, Scheduling research has focused on much larger problems where there is little action choice, but the resulting ordering problem is hard. In this paper, we give an overview of AI planning and scheduling techniques, focusing on their similarities, differences, and limitations. We also argue that many difficult practical problems lie somewhere between planning and scheduling, and that neither area has the right set of tools for solving these vexing problems. 1 The Ambitious Spacecraft Imagine a hypothetical spacecraft enroute to a distant planet. Between propulsion cycles, there are time windows when the craft can be turned for communication and scientific observations. At any given time, the spacecraft has a large set of possible scientific observations that it can perform, each having some value or priority. For each observation, the spacecraft will need to be turned towards the target and the required measurement or exposure taken. Unfortunately, turning to a target is a slow operation that may take up to 30 minutes, depending on the magnitude of the turn. As a result, the choice of experiments and the order in which they are performed has a significant impact on the duration of turns and, therefore, on how much can be accomplished. All this is further complicated by several things: