Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales (1998)
| Venue: | Journal of Artificial Intelligence Research |
| Citations: | 51 - 7 self |
BibTeX
@TECHREPORT{Sutton98betweenmdps,
author = {Richard S. Sutton and Doina Precup},
title = {Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales},
institution = {Journal of Artificial Intelligence Research},
year = {1998}
}
Abstract
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options—whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions







