Results 1-10 of 41,531
A simple multi-armed bandit algorithm with optimal variation-bounded regret
Technion
MULTI-ARMED BANDIT PROBLEMS
Cited by 17 (0 self)
"... Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects. Such problems are paradigms of a fundamental conflict between making decisions (allocating resources) that yield ..."
The Epoch-Greedy Algorithm for Contextual Multi-Armed Bandits
Cited by 78 (9 self)
"... We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a ..."
Algorithms for the multi-armed bandit problem
Journal of Machine Learning, 2010
Cited by 10 (0 self)
"... The stochastic multi-armed bandit problem is an important model for studying the exploration-exploitation tradeoff in reinforcement learning. Although many algorithms for the problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important observations can be made from our results. Firstly, simple heuristics such as ε-greedy and Boltzmann exploration outperform theoretically sound algorithms on most settings by a significant margin. Secondly ..."
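The ε-greedy heuristic highlighted in this abstract is simple enough to state in a few lines. A minimal sketch for reference (my illustration, not the paper's code; the `pull` reward callback, the `eps` value, and the Bernoulli test arms in the usage note are assumptions):

```python
import random

def epsilon_greedy(pull, n_arms, horizon, eps=0.1, seed=0):
    """epsilon-greedy: with probability eps explore a uniformly random arm,
    otherwise exploit the arm with the best empirical mean so far."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental running mean
        total += r
    return total, counts, means
```

For example, with two Bernoulli arms of success probabilities 0.2 and 0.8, the estimated mean of the better arm should dominate after a few thousand rounds.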
Pure exploration in multi-armed bandits problems
In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory (ALT 2009), 2009
Cited by 79 (16 self)
"... We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The strategies are assessed not in terms of their cumulative regrets, as is usually the case, but through quantities referred to as simple regrets. The latter are related to the (expected) gains of the decisions that the strategies would recommend for a new one-shot instance of the same multi-armed bandit problem. Here, exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast ..."
Finite-time analysis of the multi-armed bandit problem
Machine Learning, 2002
Cited by 804 (15 self)
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has ..."
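The UCB1 index policy analyzed in this paper (play the arm maximizing empirical mean plus sqrt(2 ln t / n)) can be sketched as follows; this is an illustrative reconstruction from the published index formula, not the authors' code, and the `pull` reward callback is an assumption:

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then always play the arm maximizing
    empirical mean + sqrt(2 * ln(t) / n_pulls) (the optimism bonus)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental running mean
    return counts, means
```

The confidence bonus shrinks as an arm is pulled more, so under-sampled arms are revisited automatically and no separate exploration rate needs tuning.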
Contextual Multi-Armed Bandits
Cited by 15 (0 self)
"... We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm ... (CTR) of the ads displayed. We cast this problem as a contextual multi-armed bandit problem where queries and ads form metric spaces and the payoff function is Lipschitz with respect to both the metrics. For any ε > 0 we present an algorithm with regret O(T^((a+b+1)/(a+b+2)+ε)), where a, b are the covering ..."
Mortal Multi-Armed Bandits
Cited by 15 (0 self)
"... We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which ... function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings. ..."
Multiple Identifications in Multi-Armed Bandits
Cited by 13 (0 self)
"... We study the problem of identifying the top m arms in a multi-armed bandit game. Our proposed solution relies on a new algorithm based on successive rejects of the seemingly bad arms, and successive accepts of the good ones. This algorithmic contribution allows to tackle other multiple identification ..."
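The successive-rejects idea described in this abstract can be illustrated in its simplest form, identifying a single best arm rather than the top m, and with uniform phase lengths instead of the log-weighted budget schedule used in this line of work. The `pull` interface and budget split below are my assumptions, not the paper's algorithm:

```python
def successive_rejects(pull, n_arms, budget):
    """Pure-exploration sketch: split the budget into n_arms - 1 phases;
    in each phase sample every surviving arm equally, then permanently
    reject the arm with the lowest empirical mean."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms
    per_phase = budget // (n_arms - 1)
    for _ in range(n_arms - 1):
        pulls_each = max(1, per_phase // len(active))
        for arm in active:
            for _ in range(pulls_each):
                r = pull(arm)
                counts[arm] += 1
                means[arm] += (r - means[arm]) / counts[arm]  # running mean
        active.remove(min(active, key=lambda a: means[a]))  # reject worst arm
    return active[0]  # the single surviving arm is the recommendation
```

Because rejections are permanent, the remaining budget concentrates on the arms that are hardest to distinguish, which is the key idea behind this family of algorithms.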
Multi-armed Bandit Problems with Dependent Arms
Proceedings of the 24th International Conference on Machine Learning, 2007
Cited by 40 (1 self)
"... We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find an optimal MDP-based policy for the discounted reward case, and also give an approximation of it with formal error guarantees ..."