Results 1-10 of 2,095
PAC bounds for multi-armed bandit and Markov decision processes
In Fifteenth Annual Conference on Computational Learning Theory (COLT), 2002
"... Abstract. The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O ..."
Cited by 61 (2 self)
... algorithm for Markov Decision Processes. This is done essentially by simulating Value Iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.
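The O((n/ε²) log(1/δ)) bound quoted in this entry can be contrasted with the naive strategy it improves on: pull every arm the same large number of times and keep the empirical best. A minimal sketch of that naive approach, with hypothetical Bernoulli arms (the function name and sample count here are illustrative, not the paper's algorithm):

```python
import math
import random

def naive_pac_best_arm(arms, eps, delta):
    """Naive PAC arm selection: pull each of the n arms
    m = ceil((2/eps^2) * ln(2n/delta)) times and return the arm with
    the highest empirical mean. By Hoeffding's inequality the returned
    arm is eps-optimal with probability at least 1 - delta."""
    n = len(arms)
    m = math.ceil((2 / eps**2) * math.log(2 * n / delta))
    means = [sum(arm() for _ in range(m)) / m for arm in arms]
    return max(range(n), key=lambda i: means[i])

random.seed(0)
# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
arms = [lambda p=p: 1.0 if random.random() < p else 0.0
        for p in (0.2, 0.5, 0.8)]
best = naive_pac_best_arm(arms, eps=0.1, delta=0.05)
```

With these parameters each arm is pulled roughly a thousand times; the point of the entry above is that a smarter algorithm achieves the same guarantee with far fewer pulls overall.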
Pure exploration in multi-armed bandits problems
In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory (ALT), 2009
"... We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore the arms sequentially. The strategies are assessed not in terms of their cumulative regrets, as is usually the case, but through quantities referred to as simpl ..."
Cited by 79 (16 self)
... to as simple regrets. The latter are related to the (expected) gains of the decisions that the strategies would recommend for a new one-shot instance of the same multi-armed bandit problem. Here, exploration is constrained only by the number of available rounds (not necessarily known in advance), in contrast ...
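The distinction this entry draws between simple and cumulative regret can be stated in a few lines of code. This is a hypothetical illustration; the function names and the arm means are assumptions, not taken from the paper:

```python
def simple_regret(true_means, recommended_arm):
    """Simple regret: gap between the best arm's mean and the mean of
    the single arm recommended after exploration ends."""
    return max(true_means) - true_means[recommended_arm]

def cumulative_regret(true_means, pulls):
    """Cumulative regret: total gap accumulated over every pull made
    during exploration (pulls is a list of arm indices)."""
    best = max(true_means)
    return sum(best - true_means[i] for i in pulls)

means = [0.2, 0.5, 0.8]          # assumed Bernoulli means for illustration
sr_best = simple_regret(means, 2)  # recommending the best arm costs nothing
sr_mid = simple_regret(means, 1)   # recommending a suboptimal arm costs the gap
cr = cumulative_regret(means, [0, 1, 2])
```

A strategy can have large cumulative regret (it explored bad arms heavily) yet zero simple regret, because only the final recommendation counts in the pure-exploration setting.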
MULTI-ARMED BANDIT PROBLEMS
"... Multi-armed bandit (MAB) problems are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative (competing) projects. Such problems are paradigms of a fundamental conflict between making decisions (allocating resources) that yield ..."
Cited by 17 (0 self)
The Multi-Armed Bandit, with Constraints
2011
"... The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis includes efficient procedures for computing the expected utility ..."
Cited by 2 (0 self)
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and . . .
Journal of Machine Learning Research, 2006
"... We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. Thi ..."
Cited by 82 (5 self)
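The action-elimination idea this entry describes, maintaining confidence intervals and dropping arms that are provably suboptimal, can be sketched as follows. This is an assumed, simplified variant with Hoeffding-style intervals; the constants and stopping rule are illustrative, not the paper's:

```python
import math
import random

def successive_elimination(arms, delta, max_rounds=2000):
    """Sample every surviving arm once per round; eliminate any arm
    whose upper confidence bound falls below the best arm's lower
    confidence bound. Returns the index of the last surviving arm."""
    n = len(arms)
    active = set(range(n))
    sums = [0.0] * n
    for t in range(1, max_rounds + 1):
        for i in active:
            sums[i] += arms[i]()
        # Hoeffding-style radius after t samples per surviving arm
        # (the union-bound constant 4*n*t^2 is one common choice).
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        means = {i: sums[i] / t for i in active}
        best_lcb = max(means.values()) - radius
        active = {i for i in active if means[i] + radius >= best_lcb}
        if len(active) == 1:
            return active.pop()
    return max(active, key=lambda i: sums[i])

random.seed(1)
arms = [lambda p=p: 1.0 if random.random() < p else 0.0
        for p in (0.3, 0.6, 0.9)]
best = successive_elimination(arms, delta=0.05)
```

Eliminated arms stop consuming samples, which is what drives the improved total-pull bounds quoted in the snippet.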
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
Journal of Machine Learning Research, 2004
"... We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. (2002) that given n arms, a total of O((n/ε²) log(1/δ)) trials suffices in order to find an ε-optimal arm with probability at least 1 − δ. We establish a matching low ..."
Cited by 66 (3 self)
Multi-armed bandit algorithms and empirical evaluation
In European Conference on Machine Learning, 2005
"... Abstract. The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a soluti ..."
Cited by 63 (0 self)
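Among the strategies such empirical evaluations typically compare, ε-greedy is the simplest baseline. A minimal sketch, not tied to any particular paper above; the parameters and arm distributions are illustrative assumptions:

```python
import random

def epsilon_greedy(arms, rounds, eps=0.1, seed=0):
    """With probability eps pull a uniformly random arm (explore);
    otherwise pull the arm with the best empirical mean (exploit).
    Returns the total reward and the per-arm pull counts."""
    rng = random.Random(seed)
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    total = 0.0
    for _ in range(rounds):
        if rng.random() < eps or 0 in counts:
            i = rng.randrange(n)   # explore (or cover untried arms)
        else:
            i = max(range(n), key=lambda j: sums[j] / counts[j])
        r = arms[i](rng)
        counts[i] += 1
        sums[i] += r
        total += r
    return total, counts

# Hypothetical two-armed bandit with Bernoulli means 0.2 and 0.8.
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0
        for p in (0.2, 0.8)]
total, counts = epsilon_greedy(arms, rounds=1000)
```

Even this crude strategy concentrates most pulls on the better arm, which is why empirical comparisons like the one in this entry use it as the baseline that more refined index policies must beat.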