On Planning And Exploration In Non-Discrete Environments (1991)
| Venue: | Gesellschaft für Mathematik und Datenverarbeitung, D-5205 St |
| Citations: | 10 - 5 self |
BibTeX
@TECHREPORT{Thrun91onplanning,
author = {Sebastian B. Thrun and Knut Möller},
title = {On Planning And Exploration In Non-Discrete Environments},
institution = {Gesellschaft für Mathematik und Datenverarbeitung, D-5205 St},
year = {1991}
}
Abstract
The application of reinforcement learning to control problems has received considerable attention in the last few years [And86, Bar89, Sut84]. In general, there are two approaches to solving reinforcement learning problems: direct and indirect techniques, each with its own advantages and disadvantages. We present a system that combines both methods [TML91, TML90]. Through interaction with an unknown environment, a world model is progressively constructed using the backpropagation algorithm. To optimize actions with respect to future reinforcement, planning is applied in two steps: an experience network proposes a plan, which is subsequently optimized by gradient descent through a chain of model networks. While the system operates in a goal-oriented manner thanks to the planning process, the experience network is trained. Its accumulating experience is fed back into the planning process in the form of initial plans, so that planning can be gradually reduced. In order to ensure complete system identif...
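The second planning step in the abstract, optimizing a proposed plan by gradient descent through a chain of model networks, can be illustrated with a minimal sketch. This is not the authors' implementation: `model_step` is a hypothetical hand-written stand-in for one trained model network, the cost is assumed to be the squared distance of the predicted final state from a goal, and the all-zero initial plan stands in for whatever an experience network would propose.

```python
import math

DECAY = 0.9  # assumed coefficient of the toy model, not taken from the report


def model_step(state, action):
    """Hypothetical stand-in for one trained model network: predict the next state."""
    return DECAY * state + math.tanh(action)


def rollout(state, plan):
    """Chain the model over every action in the plan; return all predicted states."""
    states = [state]
    for action in plan:
        states.append(model_step(states[-1], action))
    return states


def cost(state, plan, goal):
    """Assumed objective: squared distance of the predicted final state from the goal."""
    return (rollout(state, plan)[-1] - goal) ** 2


def plan_gradient(state, plan, goal):
    """Backpropagate the final-state error through the model chain to each action."""
    states = rollout(state, plan)
    d_state = 2.0 * (states[-1] - goal)  # d cost / d final state
    grads = [0.0] * len(plan)
    for t in reversed(range(len(plan))):
        # d s_{t+1} / d a_t = 1 - tanh(a_t)^2
        grads[t] = d_state * (1.0 - math.tanh(plan[t]) ** 2)
        # d s_{t+1} / d s_t = DECAY, carried one link further back in the chain
        d_state *= DECAY
    return grads


def optimize_plan(state, initial_plan, goal, lr=0.1, steps=200):
    """Refine an initial plan by plain gradient descent on its actions."""
    plan = list(initial_plan)
    for _ in range(steps):
        grads = plan_gradient(state, plan, goal)
        plan = [a - lr * g for a, g in zip(plan, grads)]
    return plan


if __name__ == "__main__":
    initial_plan = [0.0, 0.0, 0.0]  # placeholder for an experience network's proposal
    refined = optimize_plan(0.0, initial_plan, goal=1.0)
    print("cost before:", cost(0.0, initial_plan, 1.0))
    print("cost after: ", cost(0.0, refined, 1.0))
```

In this sketch, a better initial plan means fewer descent steps are needed, which mirrors the abstract's remark that planning can be gradually reduced as experience accumulates.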







