Results 1–10 of 110,066
Combinatorial Pure Exploration of Multi-Armed Bandits
"... We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a decision class, which is a collection of subsets of arms with certain combinatorial structures such ..."
Cited by 1 (1 self)
Pure exploration in multi-armed bandits problems
In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory (ALT 2009)
, 2009
"... We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that sequentially explore the arms. The strategies are assessed not in terms of their cumulative regrets, as is usually the case, but through quantities referred to as simpl ..."
Cited by 79 (16 self)
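The simple-regret viewpoint in this abstract can be illustrated with a minimal pure-exploration sketch. The round-robin allocation and the `pull(arm)` callback below are illustrative assumptions, not the paper's algorithm:

```python
import statistics

def explore_then_recommend(pull, n_arms, budget):
    """Pure exploration: spend the whole budget sampling arms
    round-robin, then recommend the empirically best arm. The gap
    between the recommended arm's true mean and the best arm's true
    mean is the 'simple regret' assessed in this line of work."""
    samples = [[] for _ in range(n_arms)]
    for t in range(budget):
        arm = t % n_arms              # uniform (round-robin) allocation
        samples[arm].append(pull(arm))
    means = [statistics.fmean(s) for s in samples]
    return max(range(n_arms), key=lambda a: means[a])
```

With deterministic rewards, e.g. `pull = lambda a: [0.2, 0.8, 0.5][a]` and a budget of 30 pulls, the sketch recommends arm 1.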
The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing
, 2002
"... In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the tradeoff between exploration (trying out ..."
Cited by 492 (34 self)
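This is the paper that analyzes the adversarial (nonstochastic) setting via exponential weighting (the Exp3 algorithm). A minimal Exp3-style loop might look as follows; the `reward_fn(t, arm)` interface and the fixed exploration rate `gamma` are assumptions of this sketch:

```python
import math
import random

def exp3(reward_fn, n_arms, n_rounds, gamma=0.1):
    """Sketch of Exp3 for the adversarial bandit: mix exponential
    weights with uniform exploration, and update only the pulled
    arm's weight using an importance-weighted reward estimate."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(n_rounds):
        w_sum = sum(weights)
        # sampling distribution: exponential weights mixed with uniform
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        x = reward_fn(t, arm)          # observed reward, assumed in [0, 1]
        total_reward += x
        x_hat = x / probs[arm]         # importance-weighted estimate
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
    return total_reward
```

Because only the chosen arm's reward is observed, the importance weighting keeps the reward estimates unbiased, which is what drives the paper's regret guarantees against an arbitrary (even adversarial) reward sequence.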
Finite-time analysis of the multi-armed bandit problem
 Machine Learning
, 2002
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed all the time. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has ..."
Cited by 804 (15 self)
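This is the paper behind the UCB1 index policy: play each arm once, then always pull the arm maximizing its empirical mean plus a sqrt(2 ln t / n) confidence bonus. A minimal sketch, assuming rewards in [0, 1] and a `pull(arm)` callback supplied by the caller:

```python
import math

def ucb1(pull, n_arms, n_rounds):
    """Sketch of the UCB1 index policy. Returns the pull counts,
    which show how play concentrates on the best arm."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(n_rounds):
        if t < n_arms:
            arm = t                   # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        r = pull(arm)                 # observed reward, assumed in [0, 1]
        counts[arm] += 1
        sums[arm] += r
    return counts
```

The bonus term shrinks as an arm is sampled more, so suboptimal arms are pulled only logarithmically often, matching the finite-time regret bounds the abstract alludes to.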
Approximation Algorithms for Bayesian Multi-Armed Bandit Problems
, 2014
"... In this paper, we consider several finite-horizon Bayesian multi-armed bandit problems with side constraints. These constraints include metric switching costs between arms, delayed feedback about observations, concave reward functions over plays, and explore-then-exploit models. These problems do n ..."
The budgeted multi-armed bandit problem
, 2004
"... The following coins problem is a version of a multi-armed bandit problem where one has to select from among a set of objects, say classifiers, after an experimentation phase that is constrained by a time or cost budget. The question is how to spend the budget. The problem involves pure exploration o ..."
Cited by 9 (1 self)
Bandit based Monte-Carlo Planning
In: ECML-06. Number 4212 in LNCS
, 2006
"... Abstract. For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algo ..."
Cited by 433 (7 self)
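UCT's core idea is to run a UCB1-style selection rule at every node of the search tree, treating each child move as a bandit arm. A minimal sketch of that per-node selection; the dict-based child representation and the constant c = sqrt(2) are assumptions of this illustration, not the paper's exact formulation:

```python
import math

def uct_select(children, c=math.sqrt(2)):
    """Pick the child maximizing average return plus a UCB-style
    exploration bonus; unvisited children are expanded first."""
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        if ch["visits"] == 0:
            return float("inf")       # always try unvisited moves first
        exploit = ch["value"] / ch["visits"]
        explore = c * math.sqrt(math.log(total_visits) / ch["visits"])
        return exploit + explore

    return max(children, key=score)
```

In full UCT this rule is applied repeatedly from the root down to a leaf, after which a Monte-Carlo rollout estimates the leaf's value and the result is backed up along the visited path.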
On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards
, 2011
"... Abstract. We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of M users and N ≥ M resources. For each user-resource pair (i, j), there is an associated state that evolves as an aperiodic irreducible finit ..."
Cited by 5 (4 self)