Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning (1999)
| Venue: | Artificial Intelligence |
| Citations: | 342 - 22 self |
BibTeX
@ARTICLE{Sutton99betweenmdps,
author = {Richard S. Sutton and Doina Precup and Satinder Singh},
title = {Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning},
journal = {Artificial Intelligence},
year = {1999},
volume = {112},
pages = {181--211}
}
Abstract
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options---closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
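The interchangeability of options and primitive actions described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: the names `Option`, `primitive`, `run_option`, and `smdp_q_update` are hypothetical, termination is simplified to a deterministic test, and the environment is assumed to expose a `step(state, action)` function returning `(next_state, reward)`. Under those assumptions, a primitive action is just an option that always terminates after one step, and a Q-learning-style update treats both kinds the same way, discounting the bootstrap value by the number of steps the option actually ran.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class Option:
    """A closed-loop policy with a start condition and a stop condition."""
    name: str
    can_start: Callable[[State], bool]    # where the option may be invoked
    policy: Callable[[State], Action]     # action to take while it runs
    should_stop: Callable[[State], bool]  # simplified: deterministic termination


def primitive(action: Action) -> Option:
    """Wrap a primitive action as an option that lasts exactly one step."""
    return Option(
        name=f"prim:{action}",
        can_start=lambda s: True,
        policy=lambda s: action,
        should_stop=lambda s: True,
    )


def run_option(step, state: State, option: Option, gamma: float):
    """Run an option to termination.

    Returns (discounted reward accumulated, final state, steps taken).
    """
    total, discount, k = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if option.should_stop(state):
            return total, state, k


def smdp_q_update(
    q: Dict[Tuple[State, str], float],
    state: State,
    option: Option,
    reward: float,
    next_state: State,
    k: int,
    options: Iterable[Option],
    alpha: float,
    gamma: float,
) -> None:
    """One Q-learning-style update over options.

    The bootstrap term is discounted by gamma**k, where k is the option's duration.
    """
    best_next = max(
        (q.get((next_state, o.name), 0.0) for o in options if o.can_start(next_state)),
        default=0.0,
    )
    old = q.get((state, option.name), 0.0)
    target = reward + (gamma ** k) * best_next
    q[(state, option.name)] = old + alpha * (target - old)
```

With a one-step option, `k` is 1 and the update reduces to the ordinary one-step Q-learning rule, which is the sense in which the two kinds of action can share one learning method in this sketch.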







