## Reinforcement learning with a bilinear Q function

Citations: 1 (1 self)

### BibTeX

@MISC{Elkan_reinforcementlearning,

author = {Charles Elkan},

title = {Reinforcement learning with a bilinear Q function},

year = {}

}

### Abstract

Many reinforcement learning methods are based on a function Q(s, a) whose value is the discounted total reward expected after performing the action a in the state s. This paper explores the implications of representing the Q function as $Q(s, a) = s^T W a$, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors that may have high dimension. We show that action selection can be done using standard linear programming, and that W can be learned using standard linear regression in the algorithm known as fitted Q iteration. Experimentally, the resulting method learns to solve the mountain car task in a sample-efficient way. The same method is also applicable to an inventory management task where the state space and the action space are continuous and high-dimensional.
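The bilinear form described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation, and several things here are assumptions made for illustration: the function names (`q_value`, `greedy_action_box`, `fit_W`, `fitted_q_step`), the synthetic noiseless data, and the restriction of the action set to a simple box `a_low <= a <= a_high`. Under that box assumption the linear program for action selection has a closed-form solution; general linear constraints on the action would need an actual LP solver. The regression step relies on the identity $s^T W a = \sum_{i,j} W_{ij} s_i a_j$, which makes Q linear in the entries of W.

```python
import numpy as np

def q_value(W, s, a):
    """Bilinear Q value s^T W a."""
    return float(s @ W @ a)

def greedy_action_box(W, s, a_low, a_high):
    """Maximize s^T W a over the box a_low <= a <= a_high (assumed action set).

    For fixed s the objective is linear in a with coefficients c = W^T s,
    so each coordinate takes its upper bound when c_j > 0, else its lower bound.
    """
    c = W.T @ s
    return np.where(c > 0, a_high, a_low)

def fit_W(S, A, targets):
    """Least-squares fit of W from rows (s_k, a_k) and scalar targets.

    The features for sample k are the flattened outer product of s_k and a_k.
    """
    n, ds = S.shape
    da = A.shape[1]
    X = np.einsum("ni,nj->nij", S, A).reshape(n, ds * da)
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w.reshape(ds, da)

def fitted_q_step(W, S, A, R, S_next, gamma, a_low, a_high):
    """One fitted-Q-iteration update: regress onto r + gamma * max_b Q(s', b)."""
    A_star = np.array([greedy_action_box(W, s, a_low, a_high) for s in S_next])
    targets = R + gamma * np.einsum("ni,ij,nj->n", S_next, W, A_star)
    return fit_W(S, A, targets)

# Synthetic check (hypothetical data): recover a known W from noiseless samples.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 2))
S = rng.normal(size=(200, 3))
A = rng.normal(size=(200, 2))
y = np.einsum("ni,ij,nj->n", S, W_true, A)
W_hat = fit_W(S, A, y)
```

Because the regression has only as many unknowns as W has entries (state dimension times action dimension), the fit stays an ordinary least-squares problem even when both vectors are high-dimensional.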