Results 1–10 of 21
Planning and acting in partially observable stochastic domains
ARTIFICIAL INTELLIGENCE, 1998
Cited by 832 (30 self)
Abstract
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and some possibilities for finding approximate solutions.
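The belief-state machinery that pomdp solution methods build on can be sketched in a few lines. The transition and observation matrices below are illustrative toy numbers, not anything from the paper:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update for a discrete POMDP.

    b: current belief over states, shape (S,)
    a: action index
    o: observation index
    T: transition model, T[a, s, s2] = P(s2 | s, a)
    O: observation model, O[a, s2, o] = P(o | s2, a)
    """
    predicted = T[a].T @ b                # predict: P(s2 | b, a)
    unnormalized = O[a][:, o] * predicted  # correct with the observation
    return unnormalized / unnormalized.sum()

# Toy 2-state, 1-action, 2-observation example (all numbers made up)
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.7, 0.3],
               [0.4, 0.6]]])
b = np.array([0.5, 0.5])
b_next = belief_update(b, a=0, o=0, T=T, O=O)
print(b_next)  # a valid probability distribution over the two states
```

A pomdp policy is then a mapping from such belief states (rather than from unobservable world states) to actions, which is what makes exact solution expensive.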
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999
Cited by 423 (4 self)
Abstract
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
An Algorithm for Probabilistic Planning
1995
Cited by 258 (18 self)
Abstract
We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present buridan, an implemented least-commitment planner that solves problems of this form. We prove that the algorithm is both sound and complete. We then explore buridan's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning. We gratefully acknowledge the comment...
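The threshold criterion can be illustrated with a small Monte Carlo plan evaluator. This is a generic sampling sketch, not one of buridan's four evaluation algorithms; the domain, action names, and probabilities are invented:

```python
import random

def estimate_success(plan, initial_dist, step, goal, trials=10000, seed=0):
    """Estimate P(plan achieves goal) by sampling executions.

    plan: sequence of actions to execute in order
    initial_dist: list of (state, probability) pairs
    step(state, action): samples a successor (stochastic action effects)
    goal(state): True if the goal propositions hold in state
    """
    rng = random.Random(seed)
    states, weights = zip(*initial_dist)
    successes = 0
    for _ in range(trials):
        s = rng.choices(states, weights=weights)[0]  # sample an initial state
        for a in plan:
            s = step(s, a)
        successes += goal(s)
    return successes / trials

# Toy domain: 'paint' succeeds with probability 0.8 (illustrative numbers)
effect_rng = random.Random(1)
def step(s, a):
    return 'painted' if a == 'paint' and effect_rng.random() < 0.8 else s

p = estimate_success(['paint'], [('dry', 1.0)], step, lambda s: s == 'painted')
print(p)  # close to 0.8; accept the plan only if p exceeds the threshold
```

A planner in this style alternates between generating candidate plans and evaluating them, stopping once a candidate's estimated success probability clears the user-supplied threshold.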
Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models
Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems, 1998
Cited by 98 (7 self)
Abstract
Autonomous mobile robots need very reliable navigation capabilities in order to operate unattended for long periods of time. We present a technique for achieving this goal that uses partially observable Markov decision process models (POMDPs) to explicitly model navigation uncertainty, including actuator and sensor uncertainty and approximate knowledge of the environment. This allows the robot to maintain a probability distribution over its current pose. Thus, while the robot rarely knows exactly where it is, it always has some belief as to what its true pose is, and is never completely lost. We present a navigation architecture based on POMDPs that provides a uniform framework with an established theoretical foundation for pose estimation, path planning, robot control during navigation, and learning. Our experiments show that this architecture indeed leads to robust corridor navigation for an actual indoor mobile robot.
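The "never completely lost" property comes from tracking a full distribution over poses rather than a single estimate. A minimal corridor-flavored sketch, with an invented motion model and door sensor (not Xavier's actual models):

```python
# Markov localization in a 1-D corridor of N discrete cells.
# Illustrative model: forward motion may under- or overshoot,
# and a noisy sensor reports whether the robot is facing a door.
N = 10
doors = {2, 5, 8}
belief = [1.0 / N] * N  # uniform: the robot starts completely uncertain

def move(belief):
    """Motion update: intended +1 cell, with P(0)=0.1, P(+1)=0.8, P(+2)=0.1."""
    new = [0.0] * N
    for i, p in enumerate(belief):
        new[i % N] += 0.1 * p
        new[(i + 1) % N] += 0.8 * p
        new[(i + 2) % N] += 0.1 * p
    return new

def sense(belief, saw_door):
    """Measurement update: the door sensor is right 90% of the time."""
    likelihood = [0.9 if (c in doors) == saw_door else 0.1 for c in range(N)]
    posterior = [l * p for l, p in zip(likelihood, belief)]
    z = sum(posterior)
    return [p / z for p in posterior]

belief = sense(move(belief), saw_door=True)
print(max(range(N), key=lambda c: belief[c]))  # most likely cell is a door cell
```

Even after a single reading the belief concentrates on the door cells, and further move/sense cycles disambiguate among them; the distribution degrades gracefully under sensor noise instead of failing outright.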
Utility Models for Goal-Directed Decision-Theoretic Planners
Computational Intelligence, 1993
Cited by 95 (10 self)
Abstract
AI planning agents are goal-directed: success is measured in terms of whether or not an input goal is satisfied, and the agent's computational processes are driven by those goals. A decision-theoretic agent, on the other hand, has no explicit goals; success is measured in terms of its preferences or a utility function that respects those preferences. The two approaches have complementary strengths and weaknesses. Symbolic planning provides a computational theory of plan generation, but under unrealistic assumptions: perfect information about and control over the world and a restrictive model of actions and goals. Decision theory provides a normative model of choice under uncertainty, but offers no guidance as to how the planning options are to be generated. This paper unifies the two approaches to planning by describing utility models that support rational decision making while retaining the goal information needed to support plan generation. We develop an extended model of goals tha...
An Algorithm for Probabilistic Least-Commitment Planning
1994
Cited by 83 (2 self)
Abstract
We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of goal propositions, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. This paper describes a probabilistic semantics for planning under uncertainty, and presents a fully implemented algorithm that generates plans that succeed with probability no less than a user-supplied probability threshold. The algorithm is sound (if it terminates then the generated plan is sufficiently likely to achieve the goal) and complete (the algorithm will generate a solution if one exists).
The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms: The proofs
1995
Cited by 49 (4 self)
Abstract
We analyze the complexity of online reinforcement-learning algorithms applied to goal-directed exploration tasks. Previous work had concluded that, even in deterministic state spaces, initially uninformed reinforcement learning was at least exponential for such problems, or that it was of polynomial worst-case time complexity only if the learning methods were augmented. We prove that, to the contrary, the algorithms are tractable with only a simple change in the reward structure ("penalizing the agent for action executions") or in the initialization of the values that they maintain. In particular, we provide tight complexity bounds for both Watkins' Q-learning and Heger's Q-hat-learning and show how their complexity depends on properties of the state spaces. We also demonstrate how one can decrease the complexity even further by either learning action models or utilizing prior knowledge of the topology of the state spaces. Our results provide guidance for empirical reinforcement-learning researchers on how to distinguish hard reinforcement-learning problems from easy ones and how to represent them in a way that allows them to be solved efficiently.
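The action-penalty idea is easy to see in code: with a reward of -1 per action, zero-initialized Q-values are optimistic, so an uninformed agent is systematically drawn toward unexplored states. A toy sketch on a deterministic 5-state chain (invented setting, not the paper's exact construction):

```python
import random

# Tabular Q-learning with the action-penalty reward structure (-1 per step)
# on a deterministic chain: states 0..4, goal = 4, actions move left/right.
N, GOAL, ACTIONS = 5, 4, (-1, +1)
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}  # optimistic zero init
rng = random.Random(0)

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, -1.0  # every action execution is penalized

for _ in range(200):  # episodes, each starting from state 0
    s = 0
    while s != GOAL:
        if rng.random() > 0.2:  # epsilon-greedy, epsilon = 0.2
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        else:
            a = rng.choice(ACTIONS)
        s2, r = step(s, a)
        target = r + (0.0 if s2 == GOAL else max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += 1.0 * (target - Q[(s, a)])  # learning rate 1 suffices: deterministic
        s = s2

# Greedy state values equal the negative shortest-path lengths to the goal
values = [max(Q[(s, a)] for a in ACTIONS) for s in range(N)]
print(values)
```

Because the learned values converge to the negative distances to the goal, the greedy policy follows shortest paths, illustrating the paper's point that the reward structure (or, equivalently, the value initialization) governs tractability.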
Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains
1992
Cited by 44 (5 self)
Abstract
This paper analyzes the complexity of online reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm was augmented. We show that, to the contrary, the algorithms are tractable with only a simple change in the task representation or initialization. We provide tight bounds on the worst-case complexity, and show how the complexity is even smaller if the reinforcement learning algorithms have initial knowledge of the topology of the state space or the domain has certain special properties. We also present a novel bidirectional Q-learning algorithm to find optimal paths from all states to a goal state and show that it is no more complex than the other algorithms.
Control Strategies for a Stochastic Planner
In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994
Cited by 36 (1 self)
Abstract
We present new algorithms for local planning over Markov decision processes. The base-level algorithm possesses several interesting features for control of computation, based on selecting computations according to their expected benefit to decision quality. The algorithms are shown to expand the agent's knowledge where the world warrants it, with appropriate responsiveness to time pressure and randomness. We then develop an introspective algorithm, using an internal representation of what computational work has already been done. This strategy extends the agent's knowledge base where warranted by the agent's world model and the agent's knowledge of the work already put into various parts of this model. It also enables the agent to act so as to take advantage of the computational savings inherent in staying in known parts of the state space. The control flexibility provided by this strategy, by incorporating natural problem-solving methods, directs computational effort towards where it'...
Planning with Execution and Incomplete Information
1996
Cited by 34 (4 self)
Abstract
We are motivated by the problem of building agents that interact in complex real-world domains, such as UNIX and the Internet. Such agents must be able to exploit complete information when possible, yet cope with incomplete information when necessary. They need to distinguish actions that return information from those that change the world, and know when each type of action is appropriate. They must also be able to plan to obtain information needed for further planning. They should be able to represent and exploit the richness of their domains, including universally quantified causal (e.g., UNIX chmod *) and observational (e.g., ls) effects, which are ubiquitous in real-world domains such as the Internet. The xii planner solves the problems listed above by extending classical planner representations and algorithms to deal with incomplete information. xii represents and reasons about local closed world information, information preconditions and postconditions and universally quantified ...