Planning and acting in partially observable stochastic domains
ARTIFICIAL INTELLIGENCE, 1998 - Cited by 629 (24 self)
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, of the complexity of finding exact solutions to POMDPs, and of some possibilities for finding approximate solutions.
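The abstract above does not spell out how an agent tracks its situation in a partially observable domain. As a rough illustration only, the standard Bayesian belief update used with POMDPs can be sketched as follows. The two-state model, the action name `listen`, the observation names, and the 0.85 sensor accuracy are all hypothetical values chosen for this sketch, not taken from the paper.

```python
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    belief: dict mapping state -> probability
    T[s][a]: dict mapping next state -> transition probability
    O[a][s2]: dict mapping observation -> probability of seeing it in s2
    """
    unnormalized = {}
    for s2 in belief:
        # Prediction step: probability of landing in s2 under the action.
        predicted = sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s2] = O[action][s2].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical toy model (an assumption of this sketch): the hidden state is
# "left" or "right", listening does not change it, and the sensor reports the
# true side with probability 0.85.
STATES = ("left", "right")
T = {s: {"listen": {s: 1.0}} for s in STATES}
O = {
    "listen": {
        "left": {"hear-left": 0.85, "hear-right": 0.15},
        "right": {"hear-left": 0.15, "hear-right": 0.85},
    }
}
prior = {"left": 0.5, "right": 0.5}
posterior = belief_update(prior, "listen", "hear-left", T, O)
print(posterior)
```

Starting from a uniform prior, a single `hear-left` observation shifts most of the probability mass to `left`. A policy for a POMDP can then be defined over such beliefs rather than over the hidden states.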
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999 - Cited by 342 (3 self)
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
An Algorithm for Probabilistic Planning
1995 - Cited by 235 (18 self)
We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present Buridan, an implemented least-commitment planner that solves problems of this form. We prove that the algorithm is both sound and complete. We then explore Buridan's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning.
Utility Models for Goal-Directed Decision-Theoretic Planners
Computational Intelligence, 1993 - Cited by 88 (10 self)
AI planning agents are goal-directed: success is measured in terms of whether or not an input goal is satisfied, and the agent's computational processes are driven by those goals. A decision-theoretic agent, on the other hand, has no explicit goals; success is measured in terms of its preferences or a utility function that respects those preferences. The two approaches have complementary strengths and weaknesses. Symbolic planning provides a computational theory of plan generation, but under unrealistic assumptions: perfect information about and control over the world and a restrictive model of actions and goals. Decision theory provides a normative model of choice under uncertainty, but offers no guidance as to how the planning options are to be generated. This paper unifies the two approaches to planning by describing utility models that support rational decision making while retaining the goal information needed to support plan generation. We develop an extended model of goals tha...
Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models
Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems, 1998 - Cited by 88 (7 self)
Autonomous mobile robots need very reliable navigation capabilities in order to operate unattended for long periods of time. We present a technique for achieving this goal that uses partially observable Markov decision process models (POMDPs) to explicitly model navigation uncertainty, including actuator and sensor uncertainty and approximate knowledge of the environment. This allows the robot to maintain a probability distribution over its current pose. Thus, while the robot rarely knows exactly where it is, it always has some belief as to what its true pose is, and is never completely lost. We present a navigation architecture based on POMDPs that provides a uniform framework with an established theoretical foundation for pose estimation, path planning, robot control during navigation, and learning. Our experiments show that this architecture indeed leads to robust corridor navigation for an actual indoor mobile robot.
An Algorithm for Probabilistic Least-Commitment Planning
1994 - Cited by 81 (2 self)
We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of goal propositions, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. This paper describes a probabilistic semantics for planning under uncertainty, and presents a fully implemented algorithm that generates plans that succeed with probability no less than a user-supplied probability threshold. The algorithm is sound (if it terminates then the generated plan is sufficiently likely to achieve the goal) and complete (the algorithm will generate a solution if one exists).
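The success criterion described above (a plan is acceptable if its probability of reaching the goal is no less than a threshold) can be illustrated by evaluating a fixed action sequence with forward propagation of the state distribution. This is only a sketch of plan evaluation under assumptions; it is not the planner's own algorithm. The gripper domain, the proposition names, and every probability below are hypothetical.

```python
def success_probability(initial, plan, effects, goal):
    """Probability that executing `plan` from `initial` ends in a goal state.

    initial: dict mapping state -> probability (states are frozensets of
             the propositions that are true)
    effects: dict mapping action name -> function(state) returning a list of
             (next_state, probability) pairs
    goal:    predicate over states
    """
    dist = dict(initial)
    for action in plan:
        nxt = {}
        for state, p in dist.items():
            # Each action outcome depends on the state at execution time.
            for new_state, q in effects[action](state):
                nxt[new_state] = nxt.get(new_state, 0.0) + p * q
        dist = nxt
    return sum(p for state, p in dist.items() if goal(state))


# Hypothetical toy domain: picking up a block works better with a dry gripper.
def pickup(state):
    if "holding" in state:
        return [(state, 1.0)]
    p = 0.95 if "dry" in state else 0.5
    return [(state | {"holding"}, p), (state, 1.0 - p)]


def dry(state):
    if "dry" in state:
        return [(state, 1.0)]
    return [(state | {"dry"}, 0.8), (state, 0.2)]


effects = {"pickup": pickup, "dry": dry}
initial = {frozenset({"dry"}): 0.7, frozenset(): 0.3}
goal = lambda state: "holding" in state
threshold = 0.9  # user-supplied threshold (assumed value)

for plan in (["pickup"], ["dry", "pickup"]):
    p = success_probability(initial, plan, effects, goal)
    print(plan, round(p, 3), "accepted" if p >= threshold else "rejected")
```

In this toy domain the one-step plan falls short of the threshold, while adding the `dry` step first raises the success probability above it. No plan here achieves the goal with certainty, which is the situation the probabilistic definition of success is meant to handle.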
The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms: The proofs
1995 - Cited by 45 (4 self)
We analyze the complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks. Previous work had concluded that, even in deterministic state spaces, initially uninformed reinforcement learning was at least exponential for such problems, or that it was of polynomial worst-case time-complexity only if the learning methods were augmented. We prove that, to the contrary, the algorithms are tractable with only a simple change in the reward structure (“penalizing the agent for action executions”) or in the initialization of the values that they maintain. In particular, we provide tight complexity bounds for both Watkins’ Q-learning and Heger’s Q-hat-learning and show how their complexity depends on properties of the state spaces. We also demonstrate how one can decrease the complexity even further by either learning action models or utilizing prior knowledge of the topology of the state spaces. Our results provide guidance for empirical reinforcement-learning researchers on how to distinguish hard reinforcement-learning problems from easy ones and how to represent them in a way that allows them to be solved efficiently.
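To make the reward-structure idea concrete, the following minimal sketch runs greedy one-step Q-learning on a deterministic chain where every action execution costs -1 and all Q-values start at zero. The chain domain, the learning rate of 1, the absence of discounting, and the tie-breaking rule are assumptions of this sketch. It does not reproduce the paper's analysis or its bounds.

```python
from collections import defaultdict


def run_episode(n_states, q, max_steps=10_000):
    """Greedy one-step Q-learning on states 0..n_states-1, goal at the right end.

    Every action execution is penalized with reward -1 (the action-penalty
    representation). Returns the number of steps taken to reach the goal.
    """
    state, steps = 0, 0
    goal = n_states - 1
    while state != goal and steps < max_steps:
        # Greedy choice; max() keeps the first maximum, so ties go to "left".
        action = max(("left", "right"), key=lambda a: q[(state, a)])
        nxt = max(state - 1, 0) if action == "left" else state + 1
        future = 0.0 if nxt == goal else max(q[(nxt, "left")], q[(nxt, "right")])
        # Deterministic domain: learning rate 1 and no discounting (assumed).
        q[(state, action)] = -1.0 + future
        state = nxt
        steps += 1
    return steps


q = defaultdict(float)  # zero-initialized Q-values
n_states = 4
first = run_episode(n_states, q)
second = run_episode(n_states, q)
print(first, second)
```

Because each executed action lowers its own Q-value, a purely greedy agent is pushed toward actions it has tried less often, so it reaches the goal here without any random exploration, even though ties are broken toward the wrong direction.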
Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains
1992 - Cited by 39 (4 self)
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm was augmented. We show that, to the contrary, the algorithms are tractable with only a simple change in the task representation or initialization. We provide tight bounds on the worst-case complexity, and show how the complexity is even smaller if the reinforcement learning algorithms have initial knowledge of the topology of the state space or the domain has certain special properties. We also present a novel bidirectional Q-learning algorithm to find optimal paths from all states to a goal state and show that it is no more complex than the other algorithms.
Control Strategies for a Stochastic Planner
In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994 - Cited by 34 (1 self)
We present new algorithms for local planning over Markov decision processes. The base-level algorithm possesses several interesting features for control of computation, based on selecting computations according to their expected benefit to decision quality. The algorithms are shown to expand the agent's knowledge where the world warrants it, with appropriate responsiveness to time pressure and randomness. We then develop an introspective algorithm, using an internal representation of what computational work has already been done. This strategy extends the agent's knowledge base where warranted by the agent's world model and the agent's knowledge of the work already put into various parts of this model. It also enables the agent to act so as to take advantage of the computational savings inherent in staying in known parts of the state space. The control flexibility provided by this strategy, by incorporating natural problem-solving methods, directs computational effort towards where it'...
Planning with Execution and Incomplete Information
1996 - Cited by 34 (4 self)
We are motivated by the problem of building agents that interact in complex real-world domains, such as UNIX and the Internet. Such agents must be able to exploit complete information when possible, yet cope with incomplete information when necessary. They need to distinguish actions that return information from those that change the world, and know when each type of action is appropriate. They must also be able to plan to obtain information needed for further planning. They should be able to represent and exploit the richness of their domains, including universally quantified causal (e.g., UNIX chmod *) and observational (e.g., ls) effects, which are ubiquitous in real-world domains such as the Internet. The XII planner solves the problems listed above by extending classical planner representations and algorithms to deal with incomplete information. XII represents and reasons about local closed world information, information preconditions and postconditions and universally quantified ...

