Results 1-10 of 12,370
MULTIARMED BANDIT PROBLEMS
"... Multiarmed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects. Such problems are paradigms of a fundamental conflict between making decisions (allocating resources) that yield ..."
Cited by 17 (0 self)
The Multi-Armed Bandit, with Constraints
, 2011
"... The early sections of this paper present an analysis of a Markov decision model that is known as the multiarmed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis includes efficient procedures for computing the expected utility ..."
Cited by 2 (0 self)
Gambling in a rigged casino: The adversarial multiarmed bandit problem
, 1995
"... In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the tradeoff between exploration (trying ou ..."
Cited by 244 (7 self)
The Nonstochastic Multiarmed Bandit Problem
 SIAM JOURNAL OF COMPUTING
, 2002
"... In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the tradeoff between exploration (trying out ..."
Cited by 492 (34 self)
On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards
, 2011
"... Abstract—We consider a combinatorial generalization of the classical multiarmed bandit problem that is defined as follows. There is a given bipartite graph of M users and N ≥ M resources. For each user-resource pair (i, j), there is an associated state that evolves as an aperiodic irreducible finit ..."
Cited by 5 (4 self)
Finite-time analysis of the multiarmed bandit problem
 Machine Learning
, 2002
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed all the time. One of the simplest examples of the exploration/exploitation dilemma is the multiarmed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has ..."
Cited by 804 (15 self)
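The regret described in the abstract above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the arms are Bernoulli, the policy is a plain epsilon-greedy rule chosen purely for illustration (it is not taken from the listed paper), and the names `pseudo_regret` and `run_epsilon_greedy` are hypothetical. The quantity computed is the pseudo-regret of one run, i.e. the gap to the best arm's mean summed over every pull; the regret in the abstract is its expectation.

```python
import random


def pseudo_regret(means, pulls):
    """Loss relative to always playing the arm with the highest mean.

    means[j] is the true mean reward of arm j (known here only for evaluation);
    pulls[j] is how many times the policy played arm j.
    """
    best = max(means)
    return sum((best - m) * n for m, n in zip(means, pulls))


def run_epsilon_greedy(means, horizon, epsilon=0.1, seed=0):
    """Play Bernoulli arms with an illustrative epsilon-greedy policy.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # play each arm once to initialise the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the best empirical mean so far
            arm = max(range(k), key=lambda j: totals[j] / pulls[j])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # assumed values, for illustration only
    counts = run_epsilon_greedy(arm_means, horizon=5000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", pseudo_regret(arm_means, counts))
```

A policy that never stops exploring at a fixed rate, like this one, accumulates regret roughly linearly in the horizon, which is why the listed work is concerned with how slowly regret can be made to grow.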
in Generalized Gaussian Multiarmed Bandits
, 2013
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and . . .
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... We incorporate statistical confidence intervals in both the multiarmed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of $O\big((n/\epsilon^2)\log(1/\delta)\big)$ times to find an $\epsilon$-optimal arm with probability of at least $1-\delta$. Thi ..."
Cited by 82 (5 self)
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
"... In a multiarmed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler’s objective is to maximize his cumulative expected earnings over s ..."