## Learning and Value Function Approximation in Complex Decision Processes (1998)

Citations: 36 (4 self)

### BibTeX

```bibtex
@misc{Roy98learningand,
  author = {Benjamin Van Roy},
  title  = {Learning and Value Function Approximation in Complex Decision Processes},
  year   = {1998}
}
```

### Abstract

In principle, a wide variety of sequential decision problems -- ranging from dynamic resource allocation in telecommunication networks to financial risk management -- can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning -- a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of...
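The abstract's central objects can be made concrete with a small sketch. The following is an illustrative example, not code from the thesis: it runs TD(0), the simplest temporal-difference method, with a linear value-function approximator V(s) ≈ w · φ(s) on a toy random-walk chain. All specifics (the chain, one-hot features, step size) are hypothetical choices made for the demonstration.

```python
import numpy as np

# Hypothetical setup: a 5-state symmetric random walk with absorbing
# boundaries; reward 1 for exiting on the right, 0 otherwise.
rng = np.random.default_rng(0)

n_states = 5
features = np.eye(n_states)   # one-hot features, so w[i] approximates V(state i)
w = np.zeros(n_states)        # weights of the linear value-function approximator
alpha, gamma = 0.1, 1.0       # step size and discount factor

for episode in range(2000):
    s = n_states // 2         # every episode starts in the middle state
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:                     # fell off the left end
            r, v_next, done = 0.0, 0.0, True
        elif s_next >= n_states:           # fell off the right end
            r, v_next, done = 1.0, 0.0, True
        else:
            r, v_next, done = 0.0, w @ features[s_next], False
        # TD(0) update: adjust w in the direction of the feature vector,
        # scaled by the temporal-difference error.
        td_error = r + gamma * v_next - w @ features[s]
        w += alpha * td_error * features[s]
        if done:
            break
        s = s_next

# For this chain the true values are (i + 1) / (n_states + 1).
print(np.round(w, 2))
```

Because the features are one-hot, this reduces to tabular TD(0); replacing `features` with a smaller feature matrix gives genuine compression, which is the regime the thesis studies.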