Results 1  10
of
38
Approximation Algorithms and Online Mechanisms for Item Pricing
 IN ACM CONFERENCE ON ELECTRONIC COMMERCE
, 2005
"... We present approximation and online algorithms for a number of problems of pricing items for sale so as to maximize seller's revenue in an unlimited supply setting. Our first result is an O(k)approximation algorithm for pricing items to singleminded bidders who each want at most k items. This impr ..."
Abstract

Cited by 58 (9 self)
 Add to MetaCart
We present approximation and online algorithms for a number of problems of pricing items for sale so as to maximize seller's revenue in an unlimited supply setting. Our first result is an O(k)approximation algorithm for pricing items to singleminded bidders who each want at most k items. This improves over recent independent work of Briest and Krysta [6] who achieve an O(k ) bound. For the case k = 2, where we obtain a 4approximation, this can be viewed as the following graph vertex pricing problem: given a (multi) graph G with valuations w e on the edges, find prices p i 0 for the vertices to maximize (p i + p j ) .
Competing in the dark: An efficient algorithm for bandit linear optimization
 In Proceedings of the 21st Annual Conference on Learning Theory (COLT
, 2008
"... We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O ∗ ( √ T) regret. The setting is a natural generalization of the nonstochastic multiarmed bandit problem, and the existence of an efficient optimal algorithm has bee ..."
Abstract

Cited by 47 (8 self)
 Add to MetaCart
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O ∗ ( √ T) regret. The setting is a natural generalization of the nonstochastic multiarmed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a selfconcordant potential function. Our approach presents a novel connection between online learning and interior point methods. 1
MultiArmed Bandits in Metric Spaces
 STOC'08
, 2008
"... In a multiarmed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with larg ..."
Abstract

Cited by 44 (7 self)
 Add to MetaCart
In a multiarmed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions which enable the design of efficient solutions. In this work we study a very general setting for the multiarmed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a complete solution for the multiarmed problem in this setting. That is, for every metric space (L, X) we define an isometry invariant MaxMinCOV(X) which bounds from below the performance of Lipschitz MAB algorithms for X, and we present an algorithm which comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.
Regret minimization and the price of total anarchy
 In STOC ’08: Proceedings of the fortieth annual ACM symposium on Theory of computing
, 2007
"... We propose weakening the assumption made when studying the price of anarchy: Rather than assume that selfinterested players will play according to a Nash equilibrium (which may even be computationally hard to find), we assume only that selfish players play so as to minimize their own regret. Regret ..."
Abstract

Cited by 38 (7 self)
 Add to MetaCart
We propose weakening the assumption made when studying the price of anarchy: Rather than assume that selfinterested players will play according to a Nash equilibrium (which may even be computationally hard to find), we assume only that selfish players play so as to minimize their own regret. Regret minimization can be done via simple, efficient algorithms even in many settings where the number of action choices for each player is exponential in the natural parameters of the problem. We prove that despite our weakened assumptions, in several broad classes of games, this “price of total anarchy ” matches the Nash price of anarchy, even though play may never converge to Nash equilibrium. In contrast to the price of anarchy and the recently introduced price of sinking [15], which require all players to behave in a prescribed manner, we show that the price of total anarchy is in many cases resilient to the presence of Byzantine players, about whom we make no assumptions. Finally, because the price of total anarchy is an upper bound on the price of anarchy even in mixed strategies, for some games our results yield as corollaries previously unknown bounds on the price of anarchy in mixed strategies. 1
Playing games with approximation algorithms
 In Proceedings of the 39 th annual ACM Symposium on Theory of Computing
, 2007
"... Abstract. In an online linear optimization problem, on each period t, an online algorithm chooses st ∈ S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adversarial) chooses a weight vector wt ∈ R n, and the algorithm incurs cost c(st, wt), where c is a fixed cost fu ..."
Abstract

Cited by 20 (2 self)
 Add to MetaCart
Abstract. In an online linear optimization problem, on each period t, an online algorithm chooses st ∈ S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adversarial) chooses a weight vector wt ∈ R n, and the algorithm incurs cost c(st, wt), where c is a fixed cost function that is linear in the weight vector. In the fullinformation setting, the vector wt is then revealed to the algorithm, and in the bandit setting, only the cost experienced, c(st, wt), is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed s ∈ S in hindsight. Many repeated decisionmaking problems with weights fit naturally into this framework, such as online shortestpath, online TSP, online clustering, and online weighted set cover. Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online algorithm in both the fullinformation and the bandit settings, with average cost nearly as good as that of the best fixed s ∈ S in hindsight. However, in the case where the offline algorithm is an approximation algorithm with ratio α> 1, the previous approach only worked for special types of approximation algorithms. We show how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime. If the offline algorithm has an αapproximation guarantee, then the expected cost of the online algorithm on any sequence is not much larger than α times that of the best s ∈ S, where the best is chosen with the benefit of hindsight. Our main innovation is combining Zinkevich’s algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm. Standard techniques generalize the above result to the bandit setting, except that a “Barycentric Spanner ” for the problem is also (provably) necessary as input. Our algorithm can also be viewed as a method for playing large repeated games, where one can only compute approximate bestresponses, rather than bestresponses. 1. Introduction. In the 1950’s
Combinatorial Bandits
"... We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1} d and suffers a loss that is the sum of the losses of those vector components that equal to one. The goal of the forecaster is to achieve that, in the l ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1} d and suffers a loss that is the sum of the losses of those vector components that equal to one. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit ” setting in which the forecaster has only access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order √ nd ln S  where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations for various interesting choices of S exist. 1
Contextual Bandits with Similarity Information
 24TH ANNUAL CONFERENCE ON LEARNING THEORY
, 2011
"... In a multiarmed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a timeinvariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now wellunderstood, a lot of recent work ha ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
In a multiarmed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a timeinvariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now wellunderstood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where before each round an algorithm is given the context – a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the contextarm pairs which bounds from above the difference between the respective expected payoffs. Prior work
HighProbability Regret Bounds for Bandit Online Linear Optimization
"... We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O ∗ ( √ T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectatio ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O ∗ ( √ T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n 3/2) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining highprobability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the highprobability bounds obtained in the fullinformation vs bandit settings. 1
Characterizing truthful multiarmed bandit mechanisms
 In ACMEC
, 2009
"... We consider a multiround auction setting motivated by payperclick auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. I ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
We consider a multiround auction setting motivated by payperclick auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer’s goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the multiarmed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark which always selects the same“best”advertisement. We investigate how the design of multiarmed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties – essentially, they must separate exploration from exploitation – and they incur much higher regret than the optimal multiarmed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.
Adapting to a Changing Environment: the Brownian Restless Bandits
"... In the multiarmed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated reward, and de ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
In the multiarmed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated reward, and decides on the strategy for the next iteration. The goal is to maximize the reward by balancing exploitation: the use of acquired information, with exploration: learning new information. We introduce and study a dynamic MAB problem in which the reward functions stochastically and gradually change in time. Specifically, the expected reward of each arm follows a Brownian motion, a discrete random walk, or similar processes. In this setting a player has to continuously keep exploring in order to adapt to the changing environment. Our formulation is (roughly) a special case of the notoriously intractable restless MAB problem. Our goal here is to characterize the cost of learning and adapting to the changing environment, in terms of the stochastic rate of the change. We consider an infinite time horizon, and strive to minimize the average cost per step which we define with respect to a hypothetical algorithm that at every step plays the arm with the maximum expected reward at this step. A related line of work on the adversarial MAB problem used a significantly weaker benchmark, the best timeinvariant policy. The dynamic MAB problem models a variety of practical online, gameagainst nature type optimization settings. While building on prior work, algorithms and steadystate analysis for the dynamic setting require a novel approach based on different stochastic tools.