Characterization and computation of restless bandit marginal productivity indices. SMCtools ’07
- Proc. 2007 Workshop on Tools for Solving Structured Markov Chains
Cited by 1 (1 self)
Appl. Probab. 25A, 287-298] yields a practical scheduling rule for the versatile yet intractable multi-armed restless bandit problem, involving the optimal dynamic priority allocation to multiple stochastic projects, modeled as restless bandits, i.e., binary-action (active/passive) (semi-)Markov decision processes. A growing body of evidence shows that such a rule is nearly optimal in a wide variety of applications, which raises the need to efficiently compute the Whittle index and more general marginal productivity index (MPI) extensions in large-scale models. For such a purpose, this paper extends to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n^3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press], which yielded a fast Gittins-index algorithm. Yet the extension is not straightforward, as the MPI is only defined for the limited range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) a complete algorithmic
Numerical Analysis in the Twentieth Century
- in Numerical Analysis: Historical Developments in the 20th Century, C. Brezinski and L. Wuytack, Editors, North-Holland, 2001
Cited by 1 (0 self)
This paper attracted much attention, while similar results obtained by William Karush in his Master's Thesis in 1939 [154], under the supervision of Lawrence M. Graves at the University of Chicago, and by Fritz John (1910-1995) in 1948 [147] were almost totally ignored (John's paper was even rejected).
Computing a Classic Index for Finite-Horizon Bandits
, 2011
This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon. Besides characterizing optimal policies for the finite-horizon one-armed bandit problem, such an index provides a suboptimal heuristic index rule for the intractable finite-horizon multiarmed bandit problem, which represents the natural extension of the Gittins index rule (optimal in the infinite-horizon case). Although such a finite-horizon index was introduced in classic work in the 1950s, investigation of its efficient exact computation has received scant attention. This paper introduces a recursive adaptive-greedy algorithm using only arithmetic operations that computes the index in (pseudo-)polynomial time in the problem parameters (number of project states and time horizon length). In the special case of a project with limited transitions per state, the complexity is either reduced or depends only on the length of the time horizon. The proposed algorithm is benchmarked in a computational study against the conventional calibration method.
Key words: dynamic programming, Markov; bandits, finite-horizon; index policies; analysis of algorithms; computational complexity
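To make the index in the abstract above concrete, here is a minimal Python sketch of a calibration-style computation. It is not the paper's recursive adaptive-greedy algorithm. It assumes a finite state space, a row-stochastic transition matrix `P`, per-play rewards `R`, and a discount factor `beta`; the function name and its parameters are hypothetical. The sketch bisects on a per-play charge until a forced first play, followed by optimal stopping within the remaining horizon, is worth exactly zero.

```python
def finite_horizon_index(P, R, beta, state, horizon, tol=1e-10):
    """Approximate the finite-horizon index of `state` by calibration.

    P: transition matrix as a list of rows, R: list of per-play rewards,
    beta: discount factor, horizon: maximum number of plays (>= 1).
    All names are illustrative assumptions, not taken from the paper.
    """
    n = len(R)

    def first_play_value(charge):
        # v[i] = optimal net value from state i with s plays left, where each
        # play earns R[i] - charge and stopping (value 0) is always allowed.
        v = [0.0] * n
        for _ in range(horizon - 1):
            v = [
                max(0.0, R[i] - charge
                    + beta * sum(P[i][j] * v[j] for j in range(n)))
                for i in range(n)
            ]
        # The first play is forced; later plays are optional.
        return (R[state] - charge
                + beta * sum(P[state][j] * v[j] for j in range(n)))

    # The net value is decreasing in the charge, nonnegative at min(R)
    # and nonpositive at max(R), so bisection brackets its root.
    lo, hi = min(R), max(R)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if first_play_value(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two-state example: state 0 pays 0 and moves to state 1, which pays 1 forever.
P = [[0.0, 1.0], [0.0, 1.0]]
R = [0.0, 1.0]
index_two_plays = finite_horizon_index(P, R, beta=0.5, state=0, horizon=2)
```

In this example the best ratio with two plays allowed is (0 + 0.5 · 1) / (1 + 0.5) = 1/3, which is the value the bisection approaches. Each bisection step reruns the dynamic program, so the cost grows with the number of states squared, the horizon, and the number of bisection steps. That cost is one reason an exact, arithmetic-only method is attractive.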

