Learning to Trade via Direct Reinforcement (2001) [25 citations — 1 self]

by John Moody, Matthew Saffell

Abstract:

We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.
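
The abstract names two quantities without giving formulas: trading returns net of transaction costs, and the differential Sharpe ratio used as an online performance signal. The sketch below is a minimal illustration of both, not the authors' implementation. The function names, the proportional cost rate `cost`, the adaptation rate `eta`, the flat starting position, and the zero initial moment estimates are all assumptions made for this example. It uses the commonly stated form of the differential Sharpe ratio, built from exponential moving estimates A and B of the first and second moments of the returns.

```python
import math

def trading_returns(positions, price_returns, cost=0.001):
    """Net return per step: R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|.

    positions[t] is the position F_t in [-1, 1] chosen at step t;
    the trader is assumed to start flat (F = 0) before the first step.
    """
    out = []
    prev = 0.0
    for pos, r in zip(positions, price_returns):
        out.append(prev * r - cost * abs(pos - prev))
        prev = pos
    return out

def differential_sharpe(returns, eta=0.01):
    """Per-step differential Sharpe ratio.

    A and B are exponential moving estimates of the first and second
    moments of the returns (initialised to 0 here, an assumption).
    D_t = (B * dA - 0.5 * A * dB) / (B - A^2)^(3/2), using the previous
    step's A and B; D_t is set to 0 while the variance estimate is ~0.
    """
    a, b = 0.0, 0.0
    out = []
    for r in returns:
        da = r - a
        db = r * r - b
        var = b - a * a
        d = (b * da - 0.5 * a * db) / var ** 1.5 if var > 1e-12 else 0.0
        out.append(d)
        a += eta * da  # update first-moment estimate
        b += eta * db  # update second-moment estimate
    return out
```

In a direct reinforcement setting, a value such as the output of `differential_sharpe` would serve as the per-step signal whose gradient with respect to the trader's parameters drives the update, so no value function is estimated. A usage example: `differential_sharpe(trading_returns([1, 1, -1], [0.01, 0.02, -0.01]))`.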
