Reinforcement learning with a bilinear Q function
BibTeX
@MISC{Elkan_reinforcementlearning,
author = {Charles Elkan},
title = {Reinforcement learning with a bilinear Q function},
year = {}
}
Abstract
Many reinforcement learning methods are based on a function Q(s, a) whose value is the discounted total reward expected after performing the action a in the state s. This paper explores the implications of representing the Q function as Q(s, a) = s^T W a, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors that may have high dimension. We show that action selection can be done using standard linear programming, and that W can be learned using standard linear regression in the algorithm known as fitted Q iteration. Experimentally, the resulting method learns to solve the mountain car task in a sample-efficient way. The same method is also applicable to an inventory management task where the state space and the action space are continuous and high-dimensional.
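
To make the bilinear form concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes the only constraints on the action are simple box bounds, lower <= a <= upper, which is a special case of the linear program mentioned in the abstract: the objective s^T W a is linear in a with coefficient vector c = W^T s, so each component of the best action sits at whichever bound increases the objective. The names `q_value`, `greedy_action`, `lower`, and `upper` are hypothetical, as are the numbers in the example.

```python
def q_value(s, W, a):
    """Bilinear Q(s, a) = s^T W a, with s and a as lists and W as a list of rows."""
    return sum(
        s[i] * W[i][j] * a[j]
        for i in range(len(s))
        for j in range(len(a))
    )


def greedy_action(s, W, lower, upper):
    """Maximize s^T W a over the box lower <= a <= upper (assumed constraint set).

    For a fixed state s the objective is c . a with c = W^T s, so the
    maximizer takes the upper bound where c_j > 0 and the lower bound otherwise.
    """
    c = [sum(s[i] * W[i][j] for i in range(len(s))) for j in range(len(lower))]
    return [upper[j] if c[j] > 0 else lower[j] for j in range(len(c))]


# Illustrative values only: a 2-dimensional state and a 2-dimensional action.
s = [1.0, 2.0]
W = [[0.5, -1.0],
     [0.25, 0.25]]
a_best = greedy_action(s, W, lower=[-1.0, -1.0], upper=[1.0, 1.0])
```

With general linear constraints on the action (for example, coupling between action components), the same objective c = W^T s would instead be handed to a linear programming solver; the box case is used here only because it needs no solver.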







