Results 1 - 10 of 150
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Artificial Intelligence, 1999
Cited by 342 (22 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options---closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
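The claim that options can stand in for primitive actions in Q-learning can be illustrated with a minimal sketch. It is not taken from the paper: the `Option` fields, the helper names, the toy corridor environment, and the deterministic termination test are all assumptions made for illustration (termination is probabilistic in general). The update discounts the value of the next state by gamma raised to the number of steps the option ran, which is the usual semi-MDP form of the Q-learning backup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


@dataclass
class Option:
    """Hypothetical container: where an option may start, how it acts, when it stops."""
    name: str
    initiation: FrozenSet[int]
    policy: Callable[[int], str]        # closed-loop policy: state -> primitive action
    terminates: Callable[[int], bool]   # assumed deterministic for this sketch


def primitive(action: str, states: Iterable[int]) -> Option:
    """A primitive action viewed as an option that always stops after one step."""
    return Option(action, frozenset(states), lambda s: action, lambda s: True)


def run_option(option, state, step, gamma, max_steps=1000):
    """Follow the option until it terminates.

    Returns (discounted reward accumulated, final state, number of steps k).
    """
    total, discount, k = 0.0, 1.0, 0
    while k < max_steps:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if option.terminates(state):
            break
    return total, state, k


def smdp_q_update(q: Dict[Tuple[int, str], float], state, option, reward,
                  next_state, k, next_options, alpha, gamma):
    """One Q-learning backup over options; next_options are those usable in next_state."""
    best_next = max((q.get((next_state, o.name), 0.0) for o in next_options),
                    default=0.0)
    key = (state, option.name)
    old = q.get(key, 0.0)
    q[key] = old + alpha * (reward + gamma ** k * best_next - old)
    return q[key]


def corridor_step(state, action):
    """Toy 5-state corridor (assumed): reward 1.0 on first reaching state 4."""
    nxt = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 4 and state != 4 else 0.0)


# A multi-step option and a one-step primitive are handled by the same code path.
go_right = Option("go_right", frozenset(range(4)), lambda s: "right", lambda s: s == 4)
```

Because `primitive` returns an ordinary `Option`, the same `run_option` and `smdp_q_update` calls apply to both kinds of behavior; with k = 1 the backup reduces to the one-step Q-learning update.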
A Robust and Fast Action Selection Mechanism for Planning
In Proceedings of AAAI-97, 1997
Cited by 127 (17 self)
The ability to plan and react in dynamic environments is central to intelligent behavior yet few algorithms have managed to combine fast planning with a robust execution. In this paper we develop one such algorithm by looking at planning as real time search. For that we develop a variation of Korf's Learning Real Time A* algorithm together with a suitable heuristic function. The resulting algorithm interleaves lookahead with execution and never builds a plan. It is an action selection mechanism that decides at each time point what to do next. Yet it solves hard planning problems faster than any domain independent planning algorithm known to us, including the powerful SAT planner recently introduced by Kautz and Selman. It also works in the presence of perturbations and noise, and can be given a fixed time window to operate. We illustrate each of these features by running the algorithm on a number of benchmark problems.
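To make "interleaves lookahead with execution and never builds a plan" concrete, here is a minimal sketch of the basic loop of Korf's Learning Real-Time A* (LRTA*). It is not the paper's variation and does not use its heuristic: the function names, the `successors` interface returning `(next_state, cost)` pairs, and the toy line graph are assumptions for illustration. The sketch also assumes every non-goal state has at least one successor.

```python
def lrta_star(start, is_goal, successors, h0, max_steps=10_000):
    """Basic LRTA* loop: look one step ahead, update the heuristic, act.

    successors(s) -> list of (next_state, step_cost); h0(s) -> initial estimate.
    Returns the states visited and the table of learned heuristic values.
    """
    h = {}  # learned estimates; states not yet updated fall back to h0

    def value(s):
        return h.get(s, h0(s))

    s = start
    visited = [s]
    for _ in range(max_steps):
        if is_goal(s):
            return visited, h
        # One-step lookahead: estimated cost-to-goal through each successor.
        best_cost, best_next = min(
            ((cost + value(nxt), nxt) for nxt, cost in successors(s)),
            key=lambda pair: pair[0],
        )
        # Learning step: raise the estimate for s to the best lookahead value.
        h[s] = max(value(s), best_cost)
        # Execution step: commit to the move immediately; no plan is stored.
        s = best_next
        visited.append(s)
    raise RuntimeError("step budget exhausted before reaching a goal")


def line_successors(s, n=5):
    """Toy domain (assumed): states 0..n-1 on a line, unit-cost moves to neighbors."""
    return [(t, 1) for t in (s - 1, s + 1) if 0 <= t < n]
```

For example, `lrta_star(0, lambda s: s == 4, line_successors, lambda s: 0)` walks the line from state 0 to the goal at state 4, choosing each move from the current estimates alone and raising those estimates as it goes.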
Hierarchical Control and Learning for Markov Decision Processes
1998
Cited by 98 (2 self)
This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a policy defined over a region of an MDP to an action in a semi-Markov decision problem (SMDP). Several algorithms are presented for performing this transformation efficiently. This dissertation introduces the HAM method for generating hierarchical, temporally abstract actions. This method permits the partial specification of abstract actions in a way that corresponds to an abstract plan or strategy. Abstr...
Programmable reinforcement learning agents
2001
Cited by 87 (1 self)
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn’t specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.
Creating Advice-Taking Reinforcement Learners
Machine Learning, 1996
Cited by 84 (10 self)
Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present and evaluate a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple imperative programming language. Based on techniques from knowledge-based neural networks, we insert these programs directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that investigates several aspects of our approach and show that, given good advice, a learner can achieve statistically significant gains in expected reward. A second experiment shows that advice improves the expected reward regardless of the...
An Autonomous Spacecraft Agent Prototype
Autonomous Robots, 1997
Cited by 63 (18 self)
This paper describes the New Millennium Remote Agent (NMRA) architecture for autonomous spacecraft control systems. This architecture integrates traditional real-time monitoring and control with constraint-based planning and scheduling, robust multi-threaded execution, and model-based diagnosis and reconfiguration.
Intelligence by Design: Principles of Modularity and Coordination for Engineering Complex Adaptive Agents
2001
Cited by 62 (21 self)
All intelligence relies on search --- for example, the search for an intelligent agent's next action. Search is only likely to succeed in resource-bounded agents if they have already been biased towards finding the right answer. In artificial agents, the primary source of bias is engineering. This dissertation
When Push comes to Shove: A Computational Model of the Role of Motor Control in the Acquisition of Action Verbs
1997
Cited by 57 (1 self)
Children learn a variety of verbs for hand actions starting in their second year of life. The semantic distinctions can be subtle, and they vary across languages, yet they are learned quickly. How is this possible? This dissertation explores the hypothesis that to explain the acquisition and use of action verbs, motor control must be taken into account. It presents a model of embodied semantics, based on the principles of neural computation in general and on the human motor system in particular, which takes a set of labelled actions and learns both to label novel actions and to obey verbal commands. A key feature of the model is the executing schema, an active controller mechanism which, by actually driving behavior, allows the model to carry out verbal commands. A hard-wired mechanism links the activity of executing schemas to a set of linguistically important features including hand posture, joint motions, force, aspect and goals. The feature set is relatively small and is fixed, helping to make learning tractable. Moreover, the use of traditional feature structures facilitates the use of model merging, a Bayesian probabilistic learning algorithm which rapidly learns plausible word meanings, automatically determines an appropriate number of senses for each verb, and can plausibly be mapped to a connectionist recruitment
Architectural Requirements for Human-like Agents Both Natural and Artificial. (What sorts of machines can love?)
Cited by 56 (19 self)
This paper, an expanded version of a talk on love given to a literary society, attempts to analyse some of the architectural requirements for an agent which is capable of having primary, secondary and tertiary emotions, including being infatuated or in love. It elaborates on work done previously in the Birmingham Cognition and Affect group, describing our proposed three level architecture (with reactive, deliberative and metamanagement layers), showing how different sorts of emotions relate to those layers. Some of the relationships between emotional states involving partial loss of control of attention (e.g. emotional states involved in being in love) and other states which involve dispositions (e.g. attitudes such as loving) are discussed and related to the architecture. The work of poets and playwrights can be shown to involve an implicit commitment to the hypothesis that minds are (at least) information processing engines. Besides loving, many other familiar states and process...
Temporal Abstraction in Reinforcement Learning
2000
Cited by 55 (2 self)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. In this dissertation, we present a general framework for prediction, control and learning at multipl...

