Nonparametric bandits with covariates (2010)
Venue: | In COLT |
Citations: | 11 - 1 self |
Citations
817 | Finite-time analysis of the multiarmed bandit problem. Machine Learning
- Auer, Cesa-Bianchi, et al.
- 2002
Citation Context ...s for the regressogram defined above. Upper confidence bounds (UCB) policies are known to perform optimally in the traditional two-armed bandit problem, i.e., without covariates (Lai & Robbins, 1985; Auer et al., 2002). The index of each arm is computed as the sum of the average past reward and a stochastic term accounting for the deviations of the observed average reward from the true average reward. In the UCBog... |
509 | Asymptotically efficient adaptive allocation rules
- Lai, Robbins
- 1985
Citation Context ...pper confidence bounds for the regressogram defined above. Upper confidence bounds (UCB) policies are known to perform optimally in the traditional two-armed bandit problem, i.e., without covariates (Lai & Robbins, 1985; Auer et al., 2002). The index of each arm is computed as the sum of the average past reward and a stochastic term accounting for the deviations of the observed average reward from the true average r... |
473 | Some aspects of the sequential design of experiments - Robbins - 1952 |
241 | Introduction to Nonparametric Estimation
- Tsybakov
- 2010
Citation Context ...plicit. In view of Lemma 3.1, it is sufficient to prove (14). To do so we reduce our problem to a hypothesis testing problem, an approach that is quite standard in the nonparametric literature, cf. (Tsybakov, 2009, Chapter 2). For any policy $\pi$ and any $t = 1, \ldots, n$, denote by $\mathbb{P}^t_{\pi,f}$ the joint distribution of the collection of pairs $(X_1, Y_1^{(\pi_1(X_1))}), \ldots, (X_t, Y_t^{(\pi_t(X_t))})$, where $\mathbb{E}[Y^{(1)} \mid X] = f(X)$... |
220 | Optimal aggregation of classifiers in statistical learning
- Tsybakov
- 2004
Citation Context ...d margin condition, as it has come to be known in the full information setup; cf. Tsybakov (2004). In that setting, it has been shown to critically affect the complexity of classification problems (Tsybakov, 2004; Boucheron et al., 2005; Audibert & Tsybakov, 2007). In the bandit setup, this condition encodes the “separation” between the functions that describe the arms’ responses and was originally studied by... |
173 | Prediction, learning, and games
- Cesa-Bianchi, Lugosi
- 2006
Citation Context ... number of bins M as a function of the horizon n, while in practice one does not have foreknowledge of this value. This limitation can be easily circumvented by using the so-called doubling argument (Cesa-Bianchi & Lugosi, 2006), which consists of “resetting” the game at times $2^k$, $k = 1, 2, \ldots$ The reader will note that when α = 1 there is a potentially superfluous log n factor appearing in the upper bound in the theorem.... |
146 | Smooth discrimination analysis
- Mammen, Tsybakov
- 1998
Citation Context ... quantity. The second condition imposed gives a lower bound on this function, though in a weaker global sense. It is closely related to the margin condition employed in classification (Tsybakov, 2004; Mammen & Tsybakov, 1999), which drives the terminology employed here. MARGIN CONDITION. We say that the machine satisfies the margin condition with parameter α if there exist $\delta_0 \in (0, 1)$, $C_\delta > 0$ such that [ ] (1) (2) 0 < |... |
95 | Theory of classification: A survey of some recent advances. ESAIM: Probability and Statistics
- Boucheron, Bousquet, et al.
- 2005
Citation Context ...on, as it has come to be known in the full information setup; cf. Tsybakov (2004). In that setting, it has been shown to critically affect the complexity of classification problems (Tsybakov, 2004; Boucheron et al., 2005; Audibert & Tsybakov, 2007). In the bandit setup, this condition encodes the “separation” between the functions that describe the arms’ responses and was originally studied by Goldenshluger and Zeevi... |
58 | The epoch-greedy algorithm for multiarmed bandits with side information. - Langford, Zhang - 2008 |
57 | Contextual Bandits with Similarity Information. - Slivkins - 2011 |
54 | Fast learning rates for plug-in classifiers.
- Audibert, Tsybakov
- 2007
Citation Context ...be known in the full information setup; cf. Tsybakov (2004). In that setting, it has been shown to critically affect the complexity of classification problems (Tsybakov, 2004; Boucheron et al., 2005; Audibert & Tsybakov, 2007). In the bandit setup, this condition encodes the “separation” between the functions that describe the arms’ responses and was originally studied by Goldenshluger and Zeevi (2009) in the one-armed ba... |
46 | Efficient bandit algorithms for online multiclass prediction. - Kakade, Shalev-Shwartz, et al. - 2008 |
30 | Fast rates for plug-in estimators of density level sets - Rigollet, Vert - 2006 |
21 | A one-armed bandit problem with a concomitant variable. - Woodroofe - 1979 |
10 | Woodroofe's one-armed bandit problem revisited. - Goldenshluger, Zeevi - 2009 |
9 | Online learning with prior knowledge - Hazan, N - 2007 |
3 | Bandit problems with side observations. Automatic Control - Wang, Kulkarni, et al. - 2005 |
1 | Showing relevant ads via context multi-armed bandits - Lu, Pál, et al. - 2009 |