Results 1-10 of 848
Copeland Dueling Bandits
Abstract: A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), ...
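To make the Copeland winner concrete, here is a minimal sketch; the preference matrix and the lowest-index tie-breaking rule are illustrative, not from the paper:

```python
def copeland_winner(p):
    """Return the index of a Copeland winner for a K x K preference
    matrix p (list of lists), where p[i][j] is the probability that
    arm i beats arm j. The Copeland score of an arm counts the arms
    it beats with probability > 1/2; a score-maximizing arm always
    exists, even when no Condorcet winner (an arm that beats every
    other arm) does."""
    K = len(p)
    scores = [sum(1 for j in range(K) if j != i and p[i][j] > 0.5)
              for i in range(K)]
    return scores.index(max(scores))

# A cyclic preference matrix: 0 beats 1, 1 beats 2, 2 beats 0.
# No Condorcet winner exists, but every arm has Copeland score 1,
# so a Copeland winner (ties broken by lowest index) still does.
p = [[0.5, 0.6, 0.4],
     [0.4, 0.5, 0.7],
     [0.6, 0.3, 0.5]]
print(copeland_winner(p))  # -> 0
```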
Reducing Dueling Bandits to Cardinal Bandits
2014
Cited by 3 (1 self)
Abstract: We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5") ...
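One reduction in this spirit, often called "Sparring" in this literature, runs two conventional bandit learners against each other: each picks an arm, the pair is dueled, and the winner's learner receives reward 1. The sketch below is illustrative (UCB1 as the base learner, a made-up preference matrix), not the paper's exact construction:

```python
import math
import random

class UCB1:
    """Standard UCB1 learner over K arms with rewards in [0, 1]."""
    def __init__(self, K):
        self.K = K
        self.n = [0] * K        # pull counts
        self.s = [0.0] * K      # summed rewards
        self.t = 0
    def select(self):
        self.t += 1
        for a in range(self.K):             # play every arm once first
            if self.n[a] == 0:
                return a
        return max(range(self.K),
                   key=lambda a: self.s[a] / self.n[a]
                   + math.sqrt(2 * math.log(self.t) / self.n[a]))
    def update(self, a, r):
        self.n[a] += 1
        self.s[a] += r

def sparring(p, T, seed=0):
    """Sparring-style reduction (sketch): two cardinal bandit learners
    each pick an arm; the duel's winner gets reward 1, the loser 0.
    p[i][j] is the probability that arm i beats arm j."""
    rng = random.Random(seed)
    left, right = UCB1(len(p)), UCB1(len(p))
    for _ in range(T):
        i, j = left.select(), right.select()
        i_wins = rng.random() < p[i][j]
        left.update(i, 1.0 if i_wins else 0.0)
        right.update(j, 0.0 if i_wins else 1.0)
    return left, right

p = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.5],
     [0.1, 0.5, 0.5]]       # arm 0 dominates every other arm
left, right = sparring(p, 3000)
print(left.n)               # pull counts concentrate on arm 0
```

Because each learner only ever sees a binary win/loss signal, the ordinal dueling feedback is consumed as if it were a cardinal 0/1 reward, which is exactly the shape of reduction the abstract describes.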
Sparse Dueling Bandits
Abstract: The dueling bandit problem is a variation of the classical multi-armed bandit in which the allowable actions are noisy comparisons between pairs of arms. This paper focuses on a new approach for finding the "best" arm according to the Borda criterion using noisy comparisons. We prove that in the ...
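The Borda criterion mentioned above scores each arm by its probability of beating an opponent drawn uniformly at random from the other arms. A minimal sketch (the preference matrix is illustrative):

```python
def borda_scores(p):
    """Borda score of arm i: the mean of row i of the K x K preference
    matrix p, excluding the self-comparison p[i][i] = 0.5. The Borda
    winner is the arm with the highest such score."""
    K = len(p)
    return [sum(p[i][j] for j in range(K) if j != i) / (K - 1)
            for i in range(K)]

p = [[0.5, 0.4, 0.9],
     [0.6, 0.5, 0.9],
     [0.1, 0.1, 0.5]]
scores = borda_scores(p)
print(scores.index(max(scores)))  # -> 1 (arm 1 has the highest Borda score)
```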
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Abstract: We study the K-armed dueling bandit problem, which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose an efficient algorithm called Relative Exponential-weight algorithm for Exploration an ...
Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach
Abstract: We study the problem of online rank elicitation, assuming that rankings of a set of alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits problem, the learner is allowed to query pairwise comparisons between alternatives, i.e., to sample pairwise marginals of ...
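The pairwise marginals of a Plackett-Luce model have a simple closed form: item i is ranked above item j with probability v_i / (v_i + v_j), where v are the model's positive skill parameters (the Bradley-Terry pairwise marginal). A small sampling sketch, with illustrative skill values:

```python
import random

def pl_pairwise_duel(v, i, j, rng):
    """Sample one pairwise comparison from a Plackett-Luce model with
    positive skill parameters v. Returns True if item i is ranked
    above item j, which happens with probability v[i] / (v[i] + v[j])."""
    return rng.random() < v[i] / (v[i] + v[j])

# Empirically check the marginal: with v = [3.0, 1.0], item 0 should
# win roughly 75% of duels against item 1.
rng = random.Random(42)
wins = sum(pl_pairwise_duel([3.0, 1.0], 0, 1, rng) for _ in range(10000))
print(wins / 10000)  # close to 0.75
```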
The K-armed Dueling Bandits Problem
2009
Cited by 29 (7 self)
Abstract: We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting ...
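The setting described here reduces to an interaction loop in which the learner picks a pair of arms each round and observes only the ordinal outcome of their duel. A minimal sketch (the preference matrix and round-robin policy are illustrative):

```python
import random

def run_dueling_bandit(p, policy, T, seed=0):
    """Generic interaction loop for the K-armed dueling bandits setting:
    each round the policy picks a pair (i, j) and the environment
    returns only the ordinal outcome 'i beat j', drawn as
    Bernoulli(p[i][j]). No absolute (cardinal) reward is revealed."""
    rng = random.Random(seed)
    history = []
    for t in range(T):
        i, j = policy(t, history)
        history.append((i, j, rng.random() < p[i][j]))
    return history

# Example: a round-robin policy cycling over all pairs of 3 arms.
pairs = [(0, 1), (0, 2), (1, 2)]
history = run_dueling_bandit(
    [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]],
    lambda t, h: pairs[t % 3], 300)
print(len(history))  # -> 300
```

Any dueling-bandit algorithm can be expressed as such a `policy`: a function from the round index and the comparison history to the next pair to duel.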
Generic Exploration and K-armed Voting Bandits
Cited by 7 (0 self)
Abstract: We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
2009
Cited by 38 (8 self)
Abstract: We present an online learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, we only require pairwise comparisons, which were shown to be reliably inferred from implicit feedback (Joachims et al., 2007; Radlinski et al., 2008b). We will present an algorithm with theoretical guarantees as well as simulation results.
Beat the mean bandit
In Proceedings of the International Conference on Machine Learning (ICML)
Cited by 16 (3 self)
Abstract: The Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also called bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this paper, we ...
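A much-simplified elimination sketch in the spirit of the "beat the mean" idea: each active arm is repeatedly dueled against an opponent drawn uniformly from the active set (a stand-in for the "mean bandit"), and arms whose empirical win rate falls confidently below the current leader's are dropped. The confidence radius and comparison schedule here are illustrative, not the paper's:

```python
import math
import random

def beat_the_mean_sketch(p, budget, delta=0.05, seed=0):
    """Eliminate arms whose win rate against a uniform opponent from
    the active set is confidently below the leader's.
    p[i][j] = P(arm i beats arm j); returns the surviving arms."""
    rng = random.Random(seed)
    K = len(p)
    active = list(range(K))
    wins, plays = [0] * K, [0] * K
    for t in range(budget):
        i = active[t % len(active)]       # round-robin over active arms
        j = rng.choice(active)            # uniform "mean" opponent
        if rng.random() < p[i][j]:
            wins[i] += 1
        plays[i] += 1
        if all(plays[a] > 0 for a in active):
            means = {a: wins[a] / plays[a] for a in active}
            rad = {a: math.sqrt(math.log(2 * K * budget / delta)
                                / (2 * plays[a])) for a in active}
            best = max(active, key=lambda a: means[a])
            active = [a for a in active
                      if means[a] + rad[a] >= means[best] - rad[best]]
    return active

p = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.6],
     [0.1, 0.4, 0.5]]
result = beat_the_mean_sketch(p, 5000)
print(result)   # the dominant arm 0 should survive elimination
```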
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
Abstract: This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise ...
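The optimistic pairwise estimates described in the abstract can be sketched as follows; the exploration parameter `alpha` and the handling of never-compared pairs are illustrative choices:

```python
import math

def optimistic_pairwise(wins, t, alpha=0.51):
    """Upper-confidence estimates of pairwise win probabilities: for
    each ordered pair (i, j) with n = wins[i][j] + wins[j][i] recorded
    duels,
        u[i][j] = wins[i][j] / n + sqrt(alpha * ln(t) / n),
    clipped to 1. Never-compared pairs get the fully optimistic
    value 1; the diagonal is fixed at 1/2."""
    K = len(wins)
    u = [[1.0] * K for _ in range(K)]
    for i in range(K):
        u[i][i] = 0.5
        for j in range(K):
            n = wins[i][j] + wins[j][i]
            if i != j and n > 0:
                u[i][j] = min(1.0, wins[i][j] / n
                              + math.sqrt(alpha * math.log(t) / n))
    return u

wins = [[0, 80, 0],
        [20, 0, 0],
        [0, 0, 0]]
u = optimistic_pairwise(wins, 100)
print(round(u[0][1], 3))   # 0.8 plus an exploration bonus
```

An arm whose optimistic row suggests it might beat every other arm stays a candidate; the bonus shrinks as a pair accumulates comparisons.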