A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes (2002) [102 citations — 7 self]

Abstract:

A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration: rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs [KMN00].
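The sparse-sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact procedure or its parameter choices: the function names (`estimate_q`, `plan`), the fixed `depth` and `width` arguments, and the toy generative model `toy_model` are all assumptions made for illustration. The sketch only shows the general shape of the technique: from the current state, draw a fixed number of samples per action from the generative model, recurse to a fixed depth, and act greedily on the resulting estimates.

```python
import random

ACTIONS = ["good", "bad"]  # hypothetical action set for the toy example


def toy_model(state, action):
    """Hypothetical generative model: returns (next_state, reward).

    The next state is drawn from a very large space to emphasize that the
    planner never enumerates states; the reward depends only on the action.
    """
    next_state = random.randrange(10**9)
    reward = 1.0 if action == "good" else 0.0
    return next_state, reward


def estimate_q(model, state, actions, depth, width, gamma):
    """Estimate Q-values at `state` from a randomly sampled look-ahead tree.

    depth: number of look-ahead levels (stands in for the horizon time)
    width: number of samples drawn per action at each tree node
    gamma: discount factor
    """
    if depth == 0:
        # Leaves of the sampled tree are assigned value zero.
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = model(state, a)
            child_q = estimate_q(model, next_state, actions,
                                 depth - 1, width, gamma)
            total += reward + gamma * max(child_q.values())
        q[a] = total / width
    return q


def plan(model, state, actions, depth, width, gamma):
    """Return the action with the highest estimated Q-value at `state`."""
    q = estimate_q(model, state, actions, depth, width, gamma)
    return max(q, key=q.get)


if __name__ == "__main__":
    print(plan(toy_model, 0, ACTIONS, depth=2, width=3, gamma=0.5))
```

In this sketch the number of calls to the generative model is the sum of (width × number of actions)^d for d from 1 to depth, which grows exponentially with depth but does not depend on how many states the MDP has. That mirrors the trade-off the abstract describes, under the assumption that depth plays the role of the horizon time.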

Citations

2210 Artificial Intelligence: A Modern Approach – Russell, Norvig - 1995
2010 The Design and Analysis of Computer Algorithms – Aho, Hopcroft, et al. - 1974
454 Reinforcement Learning – Sutton, Barto - 1998
187 Tractable inference for complex stochastic processes – Boyen, Koller - 1998
76 Approximate planning in large POMDPs via reusable trajectories – Kearns, Mansour, et al. - 2000
53 Solving very large weakly coupled Markov decision processes – Meuleau, Boutilier, et al. - 1998
47 An Upper Bound on the Loss from Approximate Optimal Value Functions – Singh - 1994
28 Approximate Planning for Factored POMDPs using Belief State Simplification – McAllester, Singh - 1999
27 Finite-sample convergence rates for Q-learning and indirect algorithms – Kearns, Singh - 1999
11 Artificial Intelligence: A Modern Approach – Russell, Norvig - 1995
5 Applying online-search to reinforcement learning – Davies, Ng, et al. - 1998