Results 1 – 4 of 4
PAC-Bayesian Analysis of Contextual Bandits
Abstract

Cited by 6 (3 self)
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multi-armed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as √(N · I_ρt(S; A)), where I_ρt(S; A) is the mutual information between states and actions (the side information) used by the algorithm at round t. If the algorithm uses all the side information, the regret bound scales as √(N ln K), where K is the number of actions (arms). However, if the side information I_ρt(S; A) is not fully used, the regret bound is significantly tighter. In the extreme case, when I_ρt(S; A) = 0, the dependence on the number of states reduces from linear to logarithmic. Our analysis allows providing the algorithm with a large amount of side information, letting the algorithm decide which side information is relevant for the task, and penalizing the algorithm only for the side information that it uses de facto. We also present an algorithm for multi-armed bandits with side information with O(K) computational complexity per game round.
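The key quantity in this bound, the mutual information I_ρt(S; A) between states and actions under the algorithm's joint play distribution, can be computed directly from that distribution. A minimal sketch (NumPy; the function name and example distributions are illustrative, not code from the paper):

```python
import numpy as np

def mutual_information(joint):
    """I(S; A) in nats for a joint distribution p(s, a), given as an N x K array."""
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=1, keepdims=True)   # marginal over states, shape (N, 1)
    p_a = joint.sum(axis=0, keepdims=True)   # marginal over actions, shape (1, K)
    mask = joint > 0                         # 0 * log 0 = 0 by convention
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_s @ p_a)[mask])))

# A policy that ignores the state has I(S; A) = 0 -- the regime where,
# per the abstract, the state dependence of the bound becomes logarithmic.
uniform = np.full((4, 2), 1 / 8)
print(mutual_information(uniform))  # 0.0
```

Conversely, a policy that deterministically maps each state to a distinct action maximizes I(S; A), recovering the √(N ln K) regime.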
Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments
Abstract

Cited by 2 (1 self)
EXP3 is a popular algorithm for adversarial multi-armed bandits, suggested and analyzed in this setting by Auer et al. [2002b]. Recently there has been increased interest in the performance of this algorithm in the stochastic setting, due to its new applications to stochastic multi-armed bandits with side information [Seldin et al., 2011] and to multi-armed bandits in the mixed stochastic-adversarial setting [Bubeck and Slivkins, 2012]. We present an empirical evaluation and improved analysis of the performance of the EXP3 algorithm in the stochastic setting, as well as a modification of the EXP3 algorithm capable of achieving “logarithmic” regret in stochastic environments.
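For reference, the EXP3 update of Auer et al. [2002b] that this abstract evaluates can be sketched as follows; the Bernoulli test bed, `reward_fn`, and all parameter values here are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def exp3(reward_fn, K, T, gamma=0.1, rng=None):
    """One run of EXP3 (Auer et al., 2002): exponential weights with
    gamma-uniform exploration and importance-weighted reward estimates."""
    rng = rng or np.random.default_rng(0)
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K   # mix in uniform exploration
        i = rng.choice(K, p=p)
        r = reward_fn(i)                            # observed reward in [0, 1]
        total += r
        # Unbiased estimate r / p[i] for the pulled arm only; others get 0.
        w[i] *= np.exp(gamma * (r / p[i]) / K)
    return total

# Stochastic Bernoulli bandit: arm 0 pays off with prob. 0.8, arm 1 with prob. 0.2.
env_rng = np.random.default_rng(1)
means = [0.8, 0.2]
payoff = exp3(lambda i: float(env_rng.random() < means[i]), K=2, T=5000)
```

In a stochastic environment like this, EXP3's cumulative payoff approaches that of the best arm, but the gamma-fraction of uniform play caps how closely it can track it.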
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Abstract

Cited by 1 (0 self)
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature, since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms, which come only with guarantees that hold in expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least Ω(√T) times over T rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.
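The IX loss estimate replaces the usual importance weight 1/p with 1/(p + γ), which keeps every estimate bounded by 1/γ without any forced uniform sampling. A sketch of an EXP3-style learner using this estimate (names and parameter values are illustrative, not the paper's):

```python
import numpy as np

def exp3_ix(loss_fn, K, T, eta=0.05, gamma=0.025, rng=None):
    """EXP3-IX sketch: exponential weights over cumulative IX loss estimates
    loss / (p + gamma); no explicit uniform-exploration component."""
    rng = rng or np.random.default_rng(0)
    L = np.zeros(K)                        # cumulative estimated losses
    pulls = np.zeros(K, dtype=int)
    for t in range(T):
        w = np.exp(-eta * (L - L.min()))   # shift by min for numerical stability
        p = w / w.sum()
        i = rng.choice(K, p=p)
        pulls[i] += 1
        # IX estimate: slightly biased low, but bounded by 1/gamma,
        # which is what controls the variance in the high-probability analysis.
        L[i] += loss_fn(i) / (p[i] + gamma)
    return pulls

# Bernoulli losses: arm 0 loses with prob. 0.2, arm 1 with prob. 0.8.
env_rng = np.random.default_rng(2)
means = [0.2, 0.8]
pulls = exp3_ix(lambda i: float(env_rng.random() < means[i]), K=2, T=3000)
```

The learner concentrates its pulls on the low-loss arm without ever being forced to sample uniformly, which is exactly the property the abstract highlights.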
Explore no more: Improved high-probability regret bounds for non-stochastic bandits (Gergely Neu, SequeL team)
Abstract
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature, since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms, which come only with guarantees that hold in expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least Ω(√T) times over T rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.