Nonparametric bandits with covariates (2010)

by P Rigollet, A Zeevi

Results 1 - 10 of 11

Contextual Bandits with Similarity Information

by Aleksandrs Slivkins - 24TH ANNUAL CONFERENCE ON LEARNING THEORY, 2011
"... In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now wellunderstood, a lot of recent work ha ..."
Abstract - Cited by 57 (9 self) - Add to MetaCart
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where before each round an algorithm is given the context – a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work ...
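The abstract contrasts similarity-aware methods with the naive baseline of discretizing the context space and running an independent bandit in each cell. As a rough, hedged sketch of that baseline only (the class name, the bin count, and the assumption of contexts in [0,1] are ours; this is not Slivkins' algorithm), a uniform-discretization contextual UCB might look like this:

# Hypothetical sketch (not from the paper): uniform-discretization contextual UCB.
# Contexts lie in [0,1]; each of n_bins cells runs its own UCB1 over n_arms arms.
import numpy as np

class BinnedContextualUCB:
    def __init__(self, n_arms, n_bins):
        self.counts = np.zeros((n_bins, n_arms))   # pulls per (bin, arm)
        self.sums = np.zeros((n_bins, n_arms))     # reward totals per (bin, arm)
        self.n_bins = n_bins
        self.t = 0

    def _bin(self, context):
        return min(int(context * self.n_bins), self.n_bins - 1)

    def select(self, context):
        self.t += 1
        b = self._bin(context)
        untried = np.where(self.counts[b] == 0)[0]
        if untried.size:                            # play each arm once per bin first
            return int(untried[0])
        means = self.sums[b] / self.counts[b]
        bonus = np.sqrt(2.0 * np.log(self.t) / self.counts[b])
        return int(np.argmax(means + bonus))

    def update(self, context, arm, reward):
        b = self._bin(context)
        self.counts[b, arm] += 1
        self.sums[b, arm] += reward

Similarity information is what allows the fixed grid above to be replaced by a partition adapted to the observed payoffs, which is the direction the paper develops.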

On the fundamental limits of adaptive sensing

by Ery Arias-Castro, Emmanuel J. Candès, Mark A. Davenport, 2011
"... Suppose we can sequentially acquire arbitrary linear measurements of an n-dimensional vector x resulting in the linear model y = Ax + z, where z represents measurement noise. If the signal is known to be sparse, one would expect the following folk theorem to be true: choosing an adaptive strategy wh ..."
Abstract - Cited by 25 (3 self) - Add to MetaCart
Suppose we can sequentially acquire arbitrary linear measurements of an n-dimensional vector x resulting in the linear model y = Ax + z, where z represents measurement noise. If the signal is known to be sparse, one would expect the following folk theorem to be true: choosing an adaptive strategy which cleverly selects the next row of A based on what has been previously observed should do far better than a nonadaptive strategy which sets the rows of A ahead of time, thus not trying to learn anything about the signal in between observations. This paper shows that the folk theorem is false. We prove that the advantages offered by clever adaptive strategies and sophisticated estimation procedures—no matter how intractable—over classical compressed acquisition/recovery schemes are, in general, minimal.
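The following toy simulation only illustrates the measurement model y = Ax + z with a sparse x and a fixed measurement budget; the numbers, the coordinate-wise measurements, and the naive two-stage adaptive scheme are our own assumptions, not the strategies analyzed in the paper:

import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 200, 5, 1.0                        # dimension, sparsity, noise level
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = 3.0         # unknown sparse signal
budget = 2 * n                                   # total number of noisy looks

def noisy_look(i, reps):
    # One row of A per look, here simply a standard basis vector e_i.
    return x[i] + sigma * rng.standard_normal(reps)

# Non-adaptive: two looks at every coordinate, estimate by averaging.
xhat_na = np.array([noisy_look(i, 2).mean() for i in range(n)])

# Naive two-stage adaptive scheme: one look everywhere, then spend the
# remaining budget on the 2k coordinates that looked largest.
first = np.array([noisy_look(i, 1)[0] for i in range(n)])
cand = np.argsort(np.abs(first))[-2 * k:]
extra = (budget - n) // (2 * k)
xhat_ad = np.zeros(n)                            # non-candidates estimated as zero
for i in cand:
    xhat_ad[i] = np.concatenate([[first[i]], noisy_look(i, extra)]).mean()

print("non-adaptive MSE:", np.mean((xhat_na - x) ** 2))
print("adaptive MSE:    ", np.mean((xhat_ad - x) ** 2))

Whether, and by how much, such adaptive schemes can improve on the nonadaptive one in the minimax sense is exactly the question the paper answers in the negative.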

Citation Context

...results from [29] to establish a bound on the minimax rate for binary classification (see the references therein for additional literature on active learning). Other examples include the recent paper [26], which derives lower bounds for bandit problems, and [24] which develops an information theoretic approach suitable for stochastic optimization, a form of online learning, and gives bounds about the ...

Multiclass classification with bandit feedback using adaptive regularization

by Koby Crammer - In ICML, 2011
"... We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit of right-orwrong, rather then the true label. Our algorithm is based on the 2nd-order Perceptron, and uses upper-confidence bounds ..."
Abstract - Cited by 20 (5 self) - Add to MetaCart
We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit of right-or-wrong, rather than the true label. Our algorithm is based on the 2nd-order Perceptron, and uses upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model, which is also chosen adversarially. We show a regret of O(√T log T), which improves over the current best bounds of O(T^(2/3)) in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems, obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
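For intuition, here is a heavily simplified sketch of the kind of learner described above: per-class linear scores with an upper-confidence exploration bonus, updated from a single right/wrong bit. The class name and the diagonal approximation are our own simplifications, not the second-order algorithm analyzed in the paper:

import numpy as np

class BanditMulticlassUCB:
    """Toy bandit multiclass learner: linear scores + upper-confidence bonus."""
    def __init__(self, n_classes, dim, alpha=1.0):
        self.W = np.zeros((n_classes, dim))        # per-class weight vectors
        self.A = np.ones((n_classes, dim))         # per-class diagonal second moments
        self.alpha = alpha

    def predict(self, x):
        scores = self.W @ x
        bonus = self.alpha * np.sqrt((x * x / self.A).sum(axis=1))
        return int(np.argmax(scores + bonus))      # explore via optimistic scores

    def update(self, x, y_hat, correct):
        # Only one bit of feedback: was the predicted label y_hat right or wrong?
        sign = 1.0 if correct else -1.0
        self.W[y_hat] += sign * x / self.A[y_hat]  # perceptron-style, adaptively scaled
        self.A[y_hat] += x * x                     # shrink future bonuses for y_hat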

Citation Context

... learning tasks than ours. We are aware of at least three more papers that define multi-armed bandit problems with side information, also called bandits with covariates: (Wang et al., 2005; Lu et al., 2010; Rigollet & Zeevi, 2010). However, the models in these papers are very different from ours, and not easily adapted to our multiclass problem. Another paper somewhat related to this work is (Walsh et al., 2009), where the aut...

THE MULTI-ARMED BANDIT PROBLEM WITH COVARIATES

by Vianney Perchet, Philippe Rigollet, 2013
"... We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit prob-lem, this setting allows for dynamically changing rewards that better describe applic ..."
Abstract - Cited by 7 (1 self) - Add to MetaCart
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably “localized” static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (SE) policy. Our results include sharper regret bounds for the SE policy in a static bandit problem and minimax optimal regret bounds for the ABSE policy in the dynamic problem.
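A minimal sketch of the “localized” idea, under the assumptions of a one-dimensional covariate in [0,1] and a fixed (non-adaptive) binning; the actual ABSE policy instead refines the partition adaptively:

import numpy as np

class BinnedSuccessiveElimination:
    def __init__(self, n_arms, n_bins, delta=0.05):
        self.active = [list(range(n_arms)) for _ in range(n_bins)]
        self.counts = np.zeros((n_bins, n_arms))
        self.sums = np.zeros((n_bins, n_arms))
        self.n_bins, self.delta = n_bins, delta

    def _bin(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def _radius(self, n):
        return np.sqrt(np.log(2.0 / self.delta) / (2.0 * max(n, 1.0)))

    def select(self, x):
        b = self._bin(x)
        # Round-robin over the arms still active in this bin.
        return min(self.active[b], key=lambda a: self.counts[b, a])

    def update(self, x, arm, reward):
        b = self._bin(x)
        self.counts[b, arm] += 1
        self.sums[b, arm] += reward
        arms = self.active[b]
        means = {a: self.sums[b, a] / max(self.counts[b, a], 1.0) for a in arms}
        rad = {a: self._radius(self.counts[b, a]) for a in arms}
        best_lcb = max(means[a] - rad[a] for a in arms)
        # Drop arms whose upper confidence bound falls below the best lower bound.
        self.active[b] = [a for a in arms if means[a] + rad[a] >= best_lcb]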

Bounded regret in stochastic multi-armed bandits

by Sébastien Bubeck, Vianney Perchet, et al. - JMLR: WORKSHOP AND CONFERENCE PROCEEDINGS (2013) 1–13, 2013
"... We study the stochastic multi-armed bandit problem when one knows the valueµ (⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bou ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
We study the stochastic multi-armed bandit problem when one knows the value µ⋆ of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bounded regret of order 1/∆ is not possible if one only knows µ⋆.
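To make the role of the two assumptions concrete: with µ⋆ and a lower bound ∆ on the smallest positive gap known, an arm whose confidence interval of half-width below ∆/2 lies above µ⋆ − ∆/2 must be optimal, and one lying below must be suboptimal. The elimination-style sketch below is our own illustration (the pull callback, horizon, and confidence level are assumptions), not the randomized policy proposed in the paper:

import numpy as np

def known_value_bandit(pull, n_arms, mu_star, Delta, horizon, delta=0.01):
    """pull(arm) returns a reward in [0, 1]; mu_star and Delta are assumed known."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    active = list(range(n_arms))
    committed = None
    total = 0.0
    for _ in range(horizon):
        arm = committed if committed is not None else min(active, key=lambda a: counts[a])
        r = pull(arm)
        total += r
        counts[arm] += 1
        sums[arm] += r
        if committed is None:
            rad = np.sqrt(np.log(2.0 * horizon / delta) / (2.0 * counts[arm]))
            mean = sums[arm] / counts[arm]
            if rad < Delta / 2.0:
                if mean > mu_star - Delta / 2.0:
                    committed = arm                 # certified optimal, stop exploring
                elif len(active) > 1 and arm in active:
                    active.remove(arm)              # certified suboptimal, drop it
    return total

For example, known_value_bandit(lambda a: np.random.binomial(1, [0.5, 0.9][a]), 2, 0.9, 0.4, 10000) explores both arms only until their confidence intervals separate, then commits.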

Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections

by Aleksandrs Slivkins, Filip Radlinski, Sreenivas Gollapudi, 2013
"... Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a lea ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable algorithms that explicitly take document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: ranked bandits (Radlinski et al., 2008) and Lipschitz bandits (Kleinberg et al., 2008b). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.
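For reference, the ranked-bandits half of that combination (Radlinski et al., 2008) can be sketched as one independent bandit instance per rank slot, where a slot is credited only when its document receives the first click. The metric/similarity structure added in this paper is omitted, and the class and function names are made up:

import numpy as np

class UCB1:
    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)
        self.t = 0
    def select(self):
        self.t += 1
        if (self.counts == 0).any():
            return int(np.argmax(self.counts == 0))    # try each document once
        ucb = self.sums / self.counts + np.sqrt(2.0 * np.log(self.t) / self.counts)
        return int(np.argmax(ucb))
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def ranked_bandits_round(slots, clicked_docs):
    """slots: one UCB1 per rank position over the same document set.
    clicked_docs: documents this user would click. Only the first click is credited."""
    ranking = [bandit.select() for bandit in slots]
    first_click = next((i for i, d in enumerate(ranking) if d in clicked_docs), None)
    for i, (bandit, doc) in enumerate(zip(slots, ranking)):
        bandit.update(doc, 1.0 if i == first_click else 0.0)
    return ranking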

Clustered Bandits

by Loc Bui, Ramesh Johari, Shie Mannor, 2012
"... ..."
Abstract - Add to MetaCart
Abstract not found

Stochastic Optimization

by Lauren A. Hannah, 2014
"... ..."
Abstract - Add to MetaCart
Abstract not found

Citation Context

...thods approximate the value of the arms as a function of the side information using regression methods like linear combinations of basis functions (Li et al. 2010), discretization of the state space (Rigollet & Zeevi 2010, Perchet & Rigollet 2013), random histograms (Yang & Zhu 2002), nearest neighbors (Yang & Zhu 2002) or adaptive partitioning (Slivkins 2011). While all bandit problems are broadly applicable to many ...

Contextual Multi-armed Bandits for Web Server Defense

by Tobias Jung, Sylvain Martin, Damien Ernst, Guy Leduc
"... Abstract—In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual and an algorithmical one. The conceptual contribution is to formulate t ..."
Abstract - Add to MetaCart
Abstract—In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual and an algorithmic one. The conceptual contribution is to formulate the real-world problem of preventing HTTP-based attacks on web servers as a one-shot sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new and computationally very cheap algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning meta filter for web servers that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first convincing simulation results.
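To make the formulation concrete, here is a very small illustrative framing of such a meta filter as a contextual bandit with a finite action set; the context key, the action names, and the ε-greedy rule are assumptions for illustration, not CMABFAS itself:

import random
from collections import defaultdict

ACTIONS = ["pass", "block", "sanitize"]              # illustrative finite action set

class EpsilonGreedyFilter:
    """Toy contextual bandit meta filter: one value estimate per (context, action)."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = defaultdict(lambda: [0] * len(ACTIONS))
        self.values = defaultdict(lambda: [0.0] * len(ACTIONS))

    def act(self, context_key):
        if random.random() < self.epsilon:           # occasional exploration
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.values[context_key][a])

    def update(self, context_key, action, reward):
        self.counts[context_key][action] += 1
        n = self.counts[context_key][action]
        v = self.values[context_key]
        v[action] += (reward - v[action]) / n        # incremental mean update

# The context key could be any coarse signature of the incoming HTTP request.
f = EpsilonGreedyFilter()
ctx = "GET:/index.php:query-with-sql-keywords"
a = f.act(ctx)
f.update(ctx, a, reward=1.0)                         # reward supplied by a simulator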

Citation Context

...previously observed outcomes for similar cases. Contextual MABs are nowadays an active research topic with many relevant real-world applications, e.g., placement of web advertisements. See [15], [10], [11], [13], [8], [7] for some examples. We believe that contextual MABs (but not standard MABs) are a good description of what the HTTP meta-filter motivated in Section 1 is trying to achieve: the contexts ...

Sparse reward processes

by Christos Dimitrakakis, 2013
"... ar ..."
Abstract - Add to MetaCart
Abstract not found