@MISC{Auer95worst-caseanalysis,
  author = {Peter Auer and Nicolò Cesa-Bianchi},
  title  = {Worst-Case Analysis of the Bandit Problem},
  year   = {1995}
}


Abstract

The multi-armed bandit is a classical problem in the area of sequential decision theory and has been studied under a variety of statistical assumptions. In this work we investigate the bandit problem from a purely worst-case standpoint. We present a randomized algorithm with an expected total reward of G − O(G^{4/5} K^{6/5}) (disregarding log factors), where K is the number of arms and G is the (unknown) total reward of the best arm. Our analysis holds with no assumptions whatsoever on the way rewards are generated, other than being independent of the algorithm's randomization. Our results can also be interpreted as a novel extension of the on-line prediction model, an intensively studied framework in learning theory.

1 Introduction

In the K-armed bandit problem a player is repeatedly faced with a choice among K possible actions. At any discrete time t = 1, 2, …, T each action i in the set {1, …, K} of allowed actions bears a potential reward x_{i,t} unknown to the player…
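The abstract does not spell out the algorithm, but the setting it describes (adversarial rewards, only the pulled arm's reward observed) is the one addressed by exponential-weighting bandit algorithms such as Exp3, later developed by the same authors. The following is a minimal sketch of that style of algorithm, under the assumption of rewards in [0, 1]; the mixing parameter `gamma` and the reward simulation below are illustrative choices, not values from the paper.

```python
import math
import random


def exp3(rewards, K, gamma=0.1):
    """Exp3-style exponential-weighting bandit algorithm (sketch).

    rewards: a T x K list of per-round reward vectors in [0, 1]; the
    full vector only simulates the environment — the algorithm observes
    just the reward x_{i,t} of the arm i it actually pulls at time t.
    Returns the total reward collected over the T rounds.
    """
    weights = [1.0] * K
    total = 0.0
    for x in rewards:
        wsum = sum(weights)
        # mix the weight distribution with uniform exploration
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        reward = x[arm]  # only the chosen arm's reward is revealed
        total += reward
        # importance-weighted estimate keeps the reward estimate unbiased
        est = reward / probs[arm]
        weights[arm] *= math.exp(gamma * est / K)
    return total


random.seed(0)
T, K = 1000, 3
# an adversarially fixed reward sequence: arm 1 is the best arm, G = 800
rewards = [[0.2, 0.8, 0.5] for _ in range(T)]
print(exp3(rewards, K))
```

On this sequence the algorithm's total reward approaches the best arm's total G = 800, falling short by a sublinear regret term, which is the flavor of guarantee the abstract states.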