Results 1–10 of 9,418
The K-armed Dueling Bandits Problem, 2009
"... We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting ..."
Cited by 29 (7 self)
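The comparison-based feedback model this abstract describes can be illustrated with a tiny simulator: the learner never observes an absolute reward, only the outcome of a noisy duel between two chosen arms. This is a minimal sketch; the preference matrix and all numbers below are made-up illustrative data, not from the paper.

```python
import random

def duel(pref, i, j, rng=random.Random(0)):
    """Return 1 if arm i beats arm j in a noisy comparison.

    pref[i][j] is P(arm i beats arm j); the shared seeded RNG (a default
    argument created once) keeps repeated calls reproducible.
    """
    return 1 if rng.random() < pref[i][j] else 0

# Illustrative 3-arm preference matrix; pref[i][j] + pref[j][i] = 1.
pref = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

# Arm 0 beats arm 2 in roughly 80% of 1000 simulated duels.
wins = sum(duel(pref, 0, 2) for _ in range(1000))
```

The point of the sketch is that `duel` returns only a binary comparison outcome; the underlying utilities that generate `pref` are never revealed to the learner.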
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
"... This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise pro ..."
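The "UCB extended to the relative setting" idea in this abstract can be sketched as follows: keep pairwise win counts and play optimistically with respect to upper confidence bounds on the pairwise win probabilities. This is a hedged sketch of the general mechanism only; the function names, the `alpha` parameter, and the champion/opponent selection rule are illustrative assumptions, not the authors' exact pseudocode.

```python
import math

def ucb(wins, i, j, t, alpha=0.5):
    """Optimistic estimate of P(arm i beats arm j) at round t."""
    n = wins[i][j] + wins[j][i]
    if n == 0:
        return 1.0  # fully optimistic for an unexplored pair
    return wins[i][j] / n + math.sqrt(alpha * math.log(t) / n)

def select_pair(wins, t, alpha=0.5):
    """Pick a champion arm and its strongest optimistic challenger."""
    K = len(wins)
    # Champion: an arm whose optimistic win probability is >= 1/2
    # against every other arm.
    champs = [i for i in range(K)
              if all(ucb(wins, i, j, t, alpha) >= 0.5
                     for j in range(K) if j != i)]
    c = champs[0] if champs else 0
    # Opponent: the arm with the best optimistic chance of beating c.
    d = max((j for j in range(K) if j != c),
            key=lambda j: ucb(wins, j, c, t, alpha))
    return c, d
```

With all-zero counts every bound is 1.0, so every arm qualifies as a champion and exploration proceeds; as counts accumulate, the confidence intervals shrink and the duels concentrate on the strongest arms.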
Relative upper confidence bound for the K-armed dueling bandit problem. arXiv preprint arXiv:1312.3393, 2013
"... This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise pr ..."
Cited by 8 (4 self)
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
"... We study the K-armed dueling bandit problem, which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose an efficient algorithm called Relative Exponential-weight algorithm for Exploration an ..."
Relative confidence sampling for efficient online ranker evaluation. In WSDM '14, 2014
"... A key challenge in information retrieval is that of online ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims ..."
Cited by 2 (2 self)
MergeRUCB: A Method for Large-Scale Online Ranker Evaluation
"... A key challenge in information retrieval is that of online ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandit problem. In the setting of web search, the number of rankers under consideration may be large ..."
Cited by 1 (1 self)
The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing, 2002
"... In the multiarmed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out ..."
Cited by 492 (34 self)
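The exponential-weighting scheme this classical paper is known for (the Exp3 family) can be sketched in a few lines: each arm keeps a weight, arms are drawn from a mixture of the weight distribution and uniform exploration, and the observed reward is importance-weighted before updating. This is a generic sketch under common textbook parameterization; `gamma` and the update constants are illustrative, not the paper's exact tuning.

```python
import math
import random

def exp3(rewards, gamma=0.1, rng=random.Random(1)):
    """Run an Exp3-style learner over a reward table.

    rewards[t][i] in [0, 1] is the (adversarially chosen) reward of arm i
    at round t; only the pulled arm's reward is ever used, matching the
    bandit feedback model.
    """
    K = len(rewards[0])
    weights = [1.0] * K
    total_reward = 0.0
    for r in rewards:
        s = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / s + gamma / K for w in weights]
        # Sample an arm from probs by inverse CDF.
        u, acc, arm = rng.random(), 0.0, K - 1
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                arm = i
                break
        x = r[arm]
        total_reward += x
        # Importance-weighted exponential update for the pulled arm only.
        weights[arm] *= math.exp(gamma * (x / probs[arm]) / K)
        # Rescale by the max weight to avoid overflow on long runs.
        m = max(weights)
        weights = [w / m for w in weights]
    return total_reward, weights

# Illustrative run: arm 1 always pays 1, the others pay 0.
rewards = [[0.0, 1.0, 0.0] for _ in range(200)]
total, final_weights = exp3(rewards)
```

After 200 rounds the weight of the consistently rewarding arm dominates, and the realized reward is well above what uniform play (about 67) would earn.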
Generic Exploration and K-armed Voting Bandits
"... We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings ..."
Cited by 7 (0 self)
Copeland Dueling Bandits
"... A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB ..."
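The distinction this abstract draws can be made concrete: a Condorcet winner must beat every other arm in pairwise preference and may not exist, while a Copeland winner (the arm beating the largest number of others) always does. The sketch below uses a made-up cyclic preference matrix to show a case with a Copeland winner but no Condorcet winner.

```python
def copeland_winner(pref):
    """Return (winner, scores) where scores[i] counts the arms that arm i
    beats, i.e. the arms j with pref[i][j] > 0.5."""
    K = len(pref)
    scores = [sum(1 for j in range(K) if j != i and pref[i][j] > 0.5)
              for i in range(K)]
    return max(range(K), key=lambda i: scores[i]), scores

# Illustrative cyclic preferences: 0 beats 1 and 2, 1 beats 2 and 3,
# 2 beats 3, and 3 beats 0 -- so no arm beats all three others.
pref = [
    [0.5, 0.6, 0.7, 0.4],
    [0.4, 0.5, 0.6, 0.6],
    [0.3, 0.4, 0.5, 0.6],
    [0.6, 0.4, 0.4, 0.5],
]
winner, scores = copeland_winner(pref)
```

Here no score reaches 3 (the Condorcet requirement for four arms), yet `copeland_winner` still returns a well-defined answer, which is exactly why the paper targets Copeland regret.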
Generic Exploration and K-armed Voting Bandits (extended version)
"... We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings ..."