## Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice (2000)

Venue: Advances in Neural Information Processing Systems

Citations: 15 (2 self)

### BibTeX

    @INPROCEEDINGS{Ormoneit00kernel-basedreinforcement,
      author    = {Dirk Ormoneit and Peter Glynn},
      title     = {Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice},
      booktitle = {Advances in Neural Information Processing Systems},
      year      = {2000},
      pages     = {1068--1074}
    }

### Abstract

Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable.
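The combination the abstract refers to can be sketched minimally as TD(0) with a linear value-function approximator V(s) = w·φ(s). The 2-state chain, rewards, features, and step size below are made up for illustration; this is not the paper's method.

```python
import numpy as np

# Illustrative 2-state Markov chain (all numbers invented).
P = np.array([[0.9, 0.1],          # transition probabilities P[s, s']
              [0.5, 0.5]])
r = np.array([0.0, 1.0])           # reward received in each state
gamma = 0.9                        # discount factor
phi = np.eye(2)                    # one-hot features (tabular special case)

rng = np.random.default_rng(0)
w = np.zeros(2)                    # weights of the linear approximator
s = 0
for t in range(50000):
    s_next = rng.choice(2, p=P[s])
    # TD(0) update: move w along the TD error times the feature vector of s.
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += 0.01 * td_error * phi[s]
    s = s_next

print(np.round(w, 2))              # approximate state values
```

With one-hot features this reduces to tabular TD(0) and is stable; the instability the abstract mentions arises with more general function approximators.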

### Citations

9048 | Statistical Learning Theory - Vapnik - 1998 |

3951 | Classification and regression trees - Breiman, Friedman, Olshen, et al. - 1984 |

1235 | Learning to Predict by the Method of Temporal Differences - Sutton - 1988 |

1003 | A Probabilistic Theory of Pattern Recognition - Devroye, Györfi, et al. - 1996 |

754 | Random Number Generation and Quasi-Monte Carlo Methods, 1st Ed - Niederreiter - 1992 |

342 | Dynamic Programming and Optimal Control - Bertsekas - 2000 |

Citation Context: ...implementation of this approach and we present some of our theoretical results. In particular, we consider the relative value iteration algorithm for average-cost MDPs that is described, for example, in [Ber95]. This procedure iterates a variant of equation (1) to generate a sequence of value function estimates, h_k^m, that eventually converge to a solution of (1) (or (3), respectively). An important pract...
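The relative value iteration mentioned in this context can be sketched as follows for a tabular average-cost MDP; the 2-state, 2-action model below is a made-up illustration, not taken from the paper.

```python
import numpy as np

# Made-up MDP: P[a, s, s'] transition probabilities, c[s, a] one-step costs.
P = np.array([[[0.8, 0.2],    # action 0
               [0.3, 0.7]],
              [[0.6, 0.4],    # action 1
               [0.1, 0.9]]])
c = np.array([[1.0, 0.5],
              [2.0, 1.5]])

h = np.zeros(2)               # relative value function estimate h_k
s_ref = 0                     # reference state pinned to h(s_ref) = 0
for k in range(200):
    # Bellman operator: (T h)(s) = min_a [ c(s,a) + sum_s' P(s'|s,a) h(s') ]
    Th = (c + (P @ h).T).min(axis=1)
    h = Th - Th[s_ref]        # subtract the reference value each sweep

rho = (c + (P @ h).T).min(axis=1)[s_ref]   # average cost per stage
print(round(rho, 4), np.round(h, 4))
```

At convergence h and rho satisfy the average-cost optimality equation h(s) + rho = (T h)(s); subtracting the reference value each sweep keeps the iterates bounded, which is the point of the "relative" variant.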

319 | Policy gradient methods for reinforcement learning with function approximation - Sutton, McAllester, et al. |

252 | Generalization in reinforcement learning: Safely approximating the value function - Boyan, Moore - 1995 |

235 | Optimal global rates of convergence for nonparametric regression - Stone - 1982 |

218 | An analysis of temporal-difference learning with function approximation - Tsitsiklis, Roy - 1997 |

215 | Smooth regression analysis - Watson - 1964 |

188 | Consistent nonparametric regression - Stone - 1977 |

177 | On actor-critic algorithms - Konda, Tsitsiklis - 2003 |

103 | Kernel-based reinforcement learning - Ormoneit, Sen - 2002 |

Citation Context: ...much more widely applicable in practice. While our method addresses both discounted- and average-cost problems, we focus on average costs here and refer the reader interested in discounted costs to [OS00]. For brevity, we also defer technical details and proofs to an accompanying paper [OG00]. Note that average-cost reinforcement learning has been discussed by several authors (e.g. [TR99]). The remaind...
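The local-averaging idea at the heart of kernel-based reinforcement learning — estimating conditional expectations such as E[h(s') | s] from sampled transitions by kernel-weighted averaging — can be sketched as below. The data, kernel, and bandwidth are invented for illustration and are not the paper's specific choices.

```python
import numpy as np

# Invented sample of transitions (s_i, s'_i) from a 1-D state space.
rng = np.random.default_rng(1)
s_samples = rng.uniform(0.0, 1.0, size=200)                    # observed states
s_next = np.clip(s_samples + rng.normal(0, 0.05, 200), 0, 1)   # successors

def kernel_average(h, s, bandwidth=0.1):
    """Nadaraya-Watson estimate of E[h(s') | s] from the sampled transitions."""
    w = np.exp(-0.5 * ((s - s_samples) / bandwidth) ** 2)  # Gaussian kernel
    return np.sum(w * h(s_next)) / np.sum(w)

est = kernel_average(lambda x: x ** 2, 0.5)
print(round(est, 3))
```

Because the estimate is built directly from observed transitions, no model of the transition probabilities is required — the contrast with [Rus97] and [Gor99] drawn in the surrounding contexts.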

86 | Using randomization to break the curse of dimensionality - Rust - 1997 |

Citation Context: ...policy search or perturbation methods can by construction at most be optimal in a local sense [SMSM00, VRK00]. Relevant earlier work on local averaging in the context of reinforcement learning includes [Rus97] and [Gor99]. While these papers pursue related ideas, their approaches differ fundamentally from ours in the assumption that the transition probabilities of the MDP are known and can be used for lear...

75 | Discrete-time controlled markov processes with average cost criterion: a survey - Arapostathis, Borkar, et al. - 1993 |

66 | Approximate Solutions to Markov Decision Processes - Gordon - 1999 |

Citation Context: ...or perturbation methods can by construction at most be optimal in a local sense [SMSM00, VRK00]. Relevant earlier work on local averaging in the context of reinforcement learning includes [Rus97] and [Gor99]. While these papers pursue related ideas, their approaches differ fundamentally from ours in the assumption that the transition probabilities of the MDP are known and can be used for learning. By con...

40 | Perturbation realization, potentials and sensitivity analysis of Markov processes - Cao, Chen - 1997 |

27 | Feature-based methods for large-scale dynamic programming - Tsitsiklis, Roy - 1996 |

25 | On the existence of fixed points for approximate value iteration and temporal-difference learning - Farias, Roy |

22 | The policy iteration algorithm for average reward Markov decision processes with general state space - Meyn |

19 | Hoeffding’s inequality for uniformly ergodic Markov chains. Statistics & probability letters - Glynn, Ormoneit |

19 | The uniform convergence of nearest neighbor regression function estimators and their application in optimization - Devroye - 1978 |

18 | Average cost temporal-difference learning - Tsitsiklis, Roy - 1999 |

Citation Context: ...discounted costs to [OS00]. For brevity, we also defer technical details and proofs to an accompanying paper [OG00]. Note that average-cost reinforcement learning has been discussed by several authors (e.g. [TR99]). The remainder of this work is organized as follows. In Section 2 we provide basic definitions and we describe the kernel-based reinforcement learning algorithm. Section 3 focuses on the practical i...

17 | A Comparison of Policy Iteration Methods for Solving Continuous-State - Rust - 1997 |

8 | Optimal kernel shapes for local linear regression - Ormoneit, Hastie - 1999 |

3 | Limit theorems for semi-Markov processes - ATHREYA, NEY - 1978 |

3 | The strong uniform consistency of kernel density estimates - Devroye, Wagner - 1980 |

2 | On estimating regression - Nadaraya - 1964 |

1 | A splitting technique for Harris recurrent chains, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete - Nummelin - 1978 |

1 | On the strong universal consistency of nearest neighbor regression function estimates - Devroye, Györfi, et al. - 1994 |