Results 1 - 10 of 276
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- Journal of Artificial Intelligence Research, 1999
Cited by 342 (3 self)
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
- Artificial Intelligence, 1999
Cited by 342 (22 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options---closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
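As a rough illustration of the idea in this abstract, the sketch below treats an option as a closed-loop policy that runs until a termination condition fires, and wraps a primitive action as a one-step option so both can be executed through the same interface. The `Option` fields, the helper names, and the corridor environment are hypothetical choices made for this example; they are not taken from the paper, and the return is left undiscounted for simplicity.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, Set, Tuple

State = int
Action = int


@dataclass
class Option:
    """Hypothetical container for one option (field names are illustrative)."""
    initiation_set: Set[State]                   # states where the option may start
    policy: Callable[[State], Action]            # closed-loop policy: state -> action
    termination_prob: Callable[[State], float]   # chance of stopping in a state


def primitive_option(action: Action, states: Iterable[State]) -> Option:
    """Wrap a primitive action as an option that always stops after one step."""
    return Option(set(states), lambda s: action, lambda s: 1.0)


def run_option(option: Option, state: State,
               step: Callable[[State, Action], Tuple[State, float]],
               rng: random.Random, max_steps: int = 1000) -> Tuple[State, float, int]:
    """Execute an option until it terminates.

    Returns (final state, undiscounted reward sum, number of steps taken).
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be started in this state")
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        state, reward = step(state, option.policy(state))
        total_reward += reward
        steps += 1
        if rng.random() < option.termination_prob(state):
            break
    return state, total_reward, steps


def corridor_step(state: State, action: Action) -> Tuple[State, float]:
    """Toy environment (an assumption): corridor 0..5, each move costs 1."""
    return min(max(state + action, 0), 5), -1.0


# A temporally extended option: keep moving right until state 5 is reached.
go_to_end = Option(set(range(5)), lambda s: 1, lambda s: 1.0 if s == 5 else 0.0)
# A primitive action exposed through the same interface.
step_right = primitive_option(1, range(6))
```

Because both `go_to_end` and `step_right` are executed by the same `run_option` call, a planner or learner written against this interface would not need to distinguish extended options from primitive actions.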
Explanation-Based Learning: An Alternative View
- Machine Learning, 1986
Cited by 333 (19 self)
Key words: machine learning, concept acquisition, explanation-based learning
Abstract. In the last issue of this journal Mitchell, Keller, and Kedar-Cabelli presented a unifying framework for the explanation-based approach to machine learning. While it works well for a number of systems, the framework does not adequately capture certain aspects of the systems under development by the explanation-based learning group at Illinois. The primary inadequacies arise in the treatment of concept operationality, organization of knowledge into schemata, and learning from observation. This paper outlines six specific problems with the previously proposed framework and presents an alternative generalization method to perform explanation-based learning of new concepts.
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
- Journal of Artificial Intelligence Research, 2000
Cited by 307 (6 self)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics---as a subroutine hierarchy---and a declarative semantics---as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consisten...
Universal Plans for Reactive Robots in Unpredictable Environments
- 1987
Cited by 306 (0 self)
In: Proc 10th IJCAI, 1987, 1039ff. To date, reactive robot behavior has been achieved only through manual programming. This paper describes a new kind of plan, called a "universal plan", which can be synthesized automatically, yet generates appropriate behavior in unpredictable environments. In classical planning work, problems were posed with unique initial and final world states; in my approach a problem specifies only a goal condition. The planner is thus unable to commit to any specific future course of events but must specify appropriate reactions for anticipated situations. An alternative conception is that one universal plan compactly represents every classical plan. Which part of the universal plan is executed depends entirely on how the environment behaves at execution time. Universal plans are constructed from state-space operator schemas by a nonlinear planner. They explicitly identify predicates requiring monitoring at each moment of execution, and provide for sabotage, se...
Teleo-reactive programs for agent control
- Journal of Artificial Intelligence Research, 1994
Cited by 183 (1 self)
A formalism is presented for computing and organizing actions for autonomous agents in dynamic environments. We introduce the notion of teleo-reactive (T-R) programs whose execution entails the construction of circuitry for the continuous computation of the parameters and conditions on which agent action is based. In addition to continuous feedback, T-R programs support parameter binding and recursion. A primary difference between T-R programs and many other circuit-based systems is that the circuitry of T-R programs is more compact; it is constructed at run time and thus does not have to anticipate all the contingencies that might arise over all possible runs. In addition, T-R programs are intuitive and easy to write and are written in a form that is compatible with automatic planning and learning methods. We briefly describe some experimental applications of T-R programs in the control of simulated and actual mobile robots.
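A minimal sketch of how a T-R program might be interpreted, assuming the commonly described form of an ordered list of condition-action rules in which the first rule whose condition currently holds supplies the action. The continuous circuit-style evaluation mentioned in the abstract is approximated here by re-checking every condition on each discrete tick, and parameter binding and recursion are left out. The rule set, the world dictionary, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

World = Dict[str, int]
Rule = Tuple[Callable[[World], bool], Callable[[World], None]]


def tr_step(program: List[Rule], world: World) -> int:
    """Fire the first rule whose condition holds; return that rule's index."""
    for index, (condition, action) in enumerate(program):
        if condition(world):
            action(world)
            return index
    raise RuntimeError("no rule condition holds in the current world state")


def run(program: List[Rule], world: World, max_ticks: int = 100) -> List[int]:
    """Re-evaluate the rules every tick until the top (goal) rule fires."""
    trace = []
    for _ in range(max_ticks):
        fired = tr_step(program, world)
        trace.append(fired)
        if fired == 0:  # by convention here, rule 0 tests the goal condition
            break
    return trace


def move(delta: int) -> Callable[[World], None]:
    """Build an action that shifts the agent's position by delta."""
    def action(world: World) -> None:
        world["pos"] += delta
    return action


# Hypothetical one-dimensional navigation program, highest priority first.
go_to_goal: List[Rule] = [
    (lambda w: w["pos"] == w["goal"], lambda w: None),  # goal reached: do nothing
    (lambda w: w["pos"] < w["goal"], move(+1)),         # goal is to the right
    (lambda w: True, move(-1)),                         # otherwise move left
]

world = {"pos": 0, "goal": 3}
trace = run(go_to_goal, world)
```

Since conditions are re-checked on every tick, an outside change to `world["pos"]` between ticks would simply cause a different rule to fire on the next tick, which is the reactive behaviour this sketch is meant to convey.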
Automated Discourse Generation Using Discourse Structure Relations
- Artificial Intelligence, 1993
Cited by 162 (1 self)
This paper summarizes work over the past five years on the automated planning and generation of multisentence texts using discourse structure relations, placing it in context of ongoing efforts by Computational Linguists and Linguists to understand the structure of discourse. Based on a series of studies by the author and others, the paper describes how the orientation of generation toward communicative intentions illuminates the central structural role played by intersegment discourse relations. It outlines several facets of discourse structure relations as they are required by and used in text planners --- their nature, number, and extension to associated tasks such as sentence planning and text formatting. In Artificial Intelligence 63, Special Issue on Natural Language Processing, 1993. This work was partially supported by the Rome Air Development Center under RADC contract FQ7619-8903326 -0001.
Automatically Generating Abstractions for Planning
- Artificial Intelligence, 1994
Cited by 156 (3 self)
This article presents a completely automated approach to generating abstractions for planning. The abstractions are generated using a tractable, domain-independent algorithm whose only input is the definition of a problem to be solved and whose output is an abstraction hierarchy that is tailored to the particular problem. The algorithm generates abstraction hierarchies by dropping literals from the original problem definition. It forms abstractions that satisfy the ordered monotonicity property, which guarantees that the structure of an abstract solution is not changed in the process of refining it. The algorithm for generating abstractions is implemented in a system called ALPINE, which generates abstractions for a hierarchical version of the PRODIGY problem solver. The abstractions generated by ALPINE are tested in multiple domains on large problem sets and are shown to produce shorter solutions with significantly less search than planning without using abstraction.
Planning Under Time Constraints in Stochastic Domains
- Artificial Intelligence, 1993
Cited by 150 (17 self)
We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods require time at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Using this restricted set of states, the planner can generate more or less complete ...
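To make the idea of planning over a restricted set of states concrete, here is a small sketch that runs standard value iteration only on a chosen subset of states and treats every transition that leaves the subset as landing in a single outside state with a fixed value. This is not the paper's algorithm: the way the subset is chosen, the fixed `out_value`, the discount factor, and the toy domain are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

State = str
# transitions[state][action] is a list of (probability, next_state) pairs.
Transitions = Dict[State, Dict[str, List[Tuple[float, State]]]]


def restricted_value_iteration(
    transitions: Transitions,
    reward: Dict[State, float],
    envelope: Set[State],
    gamma: float = 0.9,
    out_value: float = 0.0,
    tol: float = 1e-9,
    max_iter: int = 10000,
) -> Tuple[Dict[State, float], Dict[State, str]]:
    """Value iteration over `envelope` only.

    Successor states outside the envelope contribute the fixed `out_value`.
    """
    value = {s: 0.0 for s in envelope}

    def backup(s: State, a: str) -> float:
        # Expected value of the successor, with outside states clamped.
        return sum(p * (value[t] if t in envelope else out_value)
                   for p, t in transitions[s][a])

    for _ in range(max_iter):
        delta = 0.0
        for s in envelope:
            new = reward[s] + gamma * max(backup(s, a) for a in transitions[s])
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(transitions[s], key=lambda a: backup(s, a)) for s in envelope}
    return value, policy


# Hypothetical domain: "swamp" is reachable but left outside the envelope.
transitions: Transitions = {
    "start": {"go": [(0.9, "mid"), (0.1, "swamp")], "wait": [(1.0, "start")]},
    "mid": {"go": [(1.0, "goal")], "wait": [(1.0, "mid")]},
    "goal": {"stay": [(1.0, "goal")]},
}
reward = {"start": 0.0, "mid": 0.0, "goal": 1.0}
value, policy = restricted_value_iteration(
    transitions, reward, envelope={"start", "mid", "goal"})
```

The cost of each sweep depends on the size of the envelope rather than on the full state space, which is the source of the savings; the price is that the resulting policy says nothing about what to do in states outside the envelope.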
Planning With Deadlines in Stochastic Domains
- In Proceedings of the Eleventh National Conference on Artificial Intelligence, 1993
Cited by 125 (10 self)
We provide a method, based on the theory of Markov decision problems, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods are at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we can restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Furthermore, the planner can generate more or less complete plans depending on the time it has avail...

