Results 1–10 of 20
Multi-Armed Bandits in Metric Spaces
 STOC'08
, 2008
"... In a multiarmed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with larg ..."
Abstract

Cited by 46 (8 self)
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions which enable the design of efficient solutions. In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a complete solution for the multi-armed bandit problem in this setting. That is, for every metric space (L, X) we define an isometry invariant MaxMinCOV(X) which bounds from below the performance of Lipschitz MAB algorithms for X, and we present an algorithm which comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.
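The Lipschitz condition already suggests a naive baseline that work in this area improves on: discretize the metric space uniformly and run a standard finite-armed bandit algorithm over the resulting arms. A minimal sketch, assuming the space is [0, 1] with the absolute-value metric and UCB1 as the finite-armed algorithm (an illustration of the setting, not the paper's algorithm; all names are illustrative):

```python
import math
import random

def ucb1_lipschitz(payoff, n_rounds, n_arms):
    """Naive Lipschitz MAB baseline on [0, 1]: discretize the interval
    into n_arms evenly spaced points and run UCB1 over them."""
    arms = [i / (n_arms - 1) for i in range(n_arms)]
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:                     # play each arm once first
            i = t - 1
        else:                               # then pick by UCB1 index
            i = max(range(n_arms),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        r = payoff(arms[i])
        counts[i] += 1
        sums[i] += r
        total += r
    return total / n_rounds

# Example: a 1-Lipschitz expected payoff peaking at x = 0.7,
# observed through Bernoulli noise.
random.seed(0)
mean = lambda x: 1.0 - abs(x - 0.7)
avg = ucb1_lipschitz(lambda x: 1.0 if random.random() < mean(x) else 0.0,
                     n_rounds=5000, n_arms=10)
```

The interesting question, which the paper answers, is how finely to discretize as a function of the metric, and when an adaptive refinement can beat any fixed grid.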
Regret minimization and the price of total anarchy
 In STOC ’08: Proceedings of the fortieth annual ACM symposium on Theory of computing
, 2007
"... We propose weakening the assumption made when studying the price of anarchy: Rather than assume that selfinterested players will play according to a Nash equilibrium (which may even be computationally hard to find), we assume only that selfish players play so as to minimize their own regret. Regret ..."
Abstract

Cited by 38 (7 self)
We propose weakening the assumption made when studying the price of anarchy: Rather than assume that self-interested players will play according to a Nash equilibrium (which may even be computationally hard to find), we assume only that selfish players play so as to minimize their own regret. Regret minimization can be done via simple, efficient algorithms even in many settings where the number of action choices for each player is exponential in the natural parameters of the problem. We prove that despite our weakened assumptions, in several broad classes of games, this “price of total anarchy” matches the Nash price of anarchy, even though play may never converge to Nash equilibrium. In contrast to the price of anarchy and the recently introduced price of sinking [15], which require all players to behave in a prescribed manner, we show that the price of total anarchy is in many cases resilient to the presence of Byzantine players, about whom we make no assumptions. Finally, because the price of total anarchy is an upper bound on the price of anarchy even in mixed strategies, for some games our results yield as corollaries previously unknown bounds on the price of anarchy in mixed strategies.
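For concreteness, one standard external-regret minimizer of the kind such results assume players use is multiplicative weights (Hedge). A self-contained sketch, with `hedge` and the learning rate `eta` as illustrative names, not taken from the paper:

```python
import math

def hedge(losses, eta):
    """Multiplicative-weights (Hedge) external-regret minimizer.
    `losses` is a list of per-round loss vectors with entries in [0, 1];
    returns the algorithm's total expected loss."""
    k = len(losses[0])
    w = [1.0] * k
    total = 0.0
    for loss in losses:
        z = sum(w)
        total += sum(wi / z * li for wi, li in zip(w, loss))  # expected loss
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
    return total

# Two actions over T rounds; the best fixed action (action 0) has loss 0,
# so the algorithm's total loss equals its external regret.
T = 1000
losses = [[0.0, 1.0]] * T
alg_loss = hedge(losses, eta=math.sqrt(math.log(2) / T))
# Regret grows like O(sqrt(T log k)), i.e. the average regret vanishes.
```

The point of the paper is that guarantees of this form, which each player can achieve unilaterally, already suffice to bound social welfare in several classes of games.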
An online algorithm for maximizing submodular functions
, 2007
"... We present an algorithm for solving a broad class of online resource allocation jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submod ..."
Abstract

Cited by 30 (9 self)
We present an algorithm for solving a broad class of online resource allocation problems: jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs (v, τ), where τ is the time invested in activity v. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time T, for some fixed deadline T > 0, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition.
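The offline benchmark for a monotone submodular objective is the classic greedy (1 − 1/e)-approximation, which online algorithms in this setting compete against. A minimal sketch for the special case of coverage, with `greedy_max_coverage` an illustrative name (the paper's setting also weighs the time invested per activity, which this sketch omits):

```python
def greedy_max_coverage(sets, k):
    """Greedy (1 - 1/e)-approximation for monotone submodular coverage.
    `sets` maps an activity name to the set of jobs it completes; picks
    k activities, each time taking the largest marginal gain."""
    covered, chosen = set(), []
    for _ in range(k):
        name = max(sets, key=lambda s: len(sets[s] - covered))  # marginal gain
        chosen.append(name)
        covered |= sets[name]
    return chosen, covered

sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
chosen, covered = greedy_max_coverage(sets, k=2)
# Greedy first takes "c" (gain 4), then "a" (gain 3), covering all 7 jobs.
```

The paper's contribution is matching this kind of benchmark online, without seeing the jobs in advance.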
Contextual Bandits with Similarity Information
 24TH ANNUAL CONFERENCE ON LEARNING THEORY
, 2011
"... In a multiarmed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a timeinvariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now wellunderstood, a lot of recent work ha ..."
Abstract

Cited by 17 (4 self)
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where before each round an algorithm is given the context – a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work ...
Online Learning for Global Cost Functions
, 2009
"... We consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function (over the losse ..."
Abstract

Cited by 7 (2 self)
We consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function (over the losses incurred by each alternative), rather than a summation of the instantaneous losses as done traditionally in online learning. Such global cost functions include the makespan (the maximum over the alternatives) and the L_d norm (over the alternatives). Based on approachability theory, we design an algorithm that guarantees vanishing regret for this setting, where the regret is measured with respect to the best static decision that selects the same distribution over alternatives at every time step. For the special case of makespan cost we devise a simple and efficient algorithm. In contrast, we show that for concave global cost functions, such as L_d norms for d < 1, the worst-case average regret does not vanish.
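To make the objective concrete: with k alternatives, the global makespan cost is the maximum over alternatives of the cumulative (allocation-weighted) loss, rather than a sum of per-round losses. A small sketch, with `online_makespan` an illustrative name:

```python
def online_makespan(alloc_seq, loss_seq):
    """Global makespan cost of an allocation sequence: accumulate each
    alternative's weighted loss sum_t p_t[i] * l_t[i], then take the
    max over alternatives (as opposed to summing instantaneous losses)."""
    k = len(loss_seq[0])
    cum = [0.0] * k
    for p, l in zip(alloc_seq, loss_seq):
        for i in range(k):
            cum[i] += p[i] * l[i]
    return max(cum)

# Two alternatives with alternating losses over T rounds; the static
# 50/50 distribution (the benchmark in this paper) spreads the load evenly.
T = 100
losses = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(T)]
static = [[0.5, 0.5]] * T
cost = online_makespan(static, losses)  # each alternative accumulates T/4
```

Note how committing all weight to either single alternative would instead give makespan T/2, twice the static split's cost.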
Network-Wide Deployment of Intrusion Detection and Prevention Systems
, 2010
"... Traditional research efforts for scaling NIDS and NIPS systems using parallelization and hardwareassisted acceleration have largely focused on a singlevantagepoint view. In this chapter, we explore a different design alternative that exploits spatial, networkwide opportunities for distributing NI ..."
Abstract

Cited by 6 (1 self)
Traditional research efforts for scaling NIDS and NIPS systems using parallelization and hardware-assisted acceleration have largely focused on a single-vantage-point view. In this chapter, we explore a different design alternative that exploits spatial, network-wide opportunities for distributing NIDS and NIPS functions throughout a network. We present systematic models that capture the operational constraints and requirements in deploying network-wide NIDS and NIPS capabilities. These formulations enable network administrators to optimally leverage their infrastructure toward their security objectives. For the NIDS case, we design a linear programming formulation for partitioning NIDS functions across a network to ensure that no node is overloaded. We also describe and evaluate a prototype implementation using Bro. For NIPS, we show how to maximally reduce unwanted traffic using special hardware-assisted capabilities. In this case, the hardware constraints make the optimization problem NP-hard, and we design and implement practical approximation algorithms based on randomized rounding. These results have immediate practical implications as: (1) enterprise networks become larger and their traffic volumes increase; and (2) ISPs increasingly deploy NIDS/NIPS capabilities as in-network defenses. By leveraging network-wide opportunities for distributing NIDS/NIPS responsibilities, our work effectively complements efforts to scale ...
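A toy illustration of the load-balancing idea (a greedy heuristic, not the chapter's linear-programming formulation): assign each flow's monitoring duty to the currently least-loaded node on its path, so that no single node absorbs all the work. Names like `balance_nids` are illustrative:

```python
def balance_nids(flows, nodes):
    """Greedy sketch of network-wide NIDS load distribution: each flow is
    monitored at exactly one node on its path, chosen to be the node with
    the least load assigned so far. (The chapter instead solves an LP
    that splits responsibilities fractionally and optimally.)"""
    load = {n: 0.0 for n in nodes}
    assignment = {}
    for flow_id, (path, volume) in flows.items():
        n = min(path, key=lambda x: load[x])  # least-loaded node on path
        assignment[flow_id] = n
        load[n] += volume
    return assignment, load

# Three equal-volume flows whose paths pairwise overlap: the greedy rule
# spreads monitoring across all three nodes instead of piling onto one.
flows = {"f1": (["A", "B"], 10.0), "f2": (["B", "C"], 10.0),
         "f3": (["A", "C"], 10.0)}
assignment, load = balance_nids(flows, ["A", "B", "C"])
```

The LP formulation generalizes this by allowing fractional splits and per-node capacity constraints, which is what guarantees no node is overloaded.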
Beyond equilibria: Mechanisms for repeated combinatorial auctions
, 2009
"... We study the design of mechanisms in combinatorial auction domains. We focus on settings where the auction is repeated, motivated by auctions for licenses or advertising space. We consider models of agent behaviour in which they either apply common learning techniques to minimize the regret of thei ..."
Abstract

Cited by 5 (4 self)
We study the design of mechanisms in combinatorial auction domains. We focus on settings where the auction is repeated, motivated by auctions for licenses or advertising space. We consider models of agent behaviour in which they either apply common learning techniques to minimize the regret of their bidding strategies, or apply short-sighted best-response strategies. We ask: when can a black-box approximation algorithm for the base auction problem be converted into a mechanism that approximately preserves the original algorithm’s approximation factor on average over many iterations? We present a general reduction for a broad class of algorithms when agents minimize external regret. We also present a mechanism for the combinatorial auction problem that attains an O(√m) approximation on average when agents apply best-response dynamics.
Approximation Algorithms for Reliable Stochastic Combinatorial Optimization
, 2010
"... We consider optimization problems that can be formulated as minimizing the cost of a feasible solution wTx over an arbitrary combinatorial feasible set F ⊂ {0, 1} n. For these problems we describe a broad class of corresponding stochastic problems where the cost vector W has independent random compo ..."
Abstract

Cited by 5 (2 self)
We consider optimization problems that can be formulated as minimizing the cost of a feasible solution w^T x over an arbitrary combinatorial feasible set F ⊂ {0, 1}^n. For these problems we describe a broad class of corresponding stochastic problems where the cost vector W has independent random components, unknown at the time of solution. A natural and important objective that incorporates risk in this stochastic setting is to look for a feasible solution whose stochastic cost has a small tail or a small convex combination of mean and standard deviation. Our models can be equivalently reformulated as nonconvex programs for which no efficient algorithms are known. In this paper, we make progress on these hard problems. Our results are several efficient general-purpose approximation schemes. They use as a black-box (exact or approximate) the solution to the underlying deterministic problem and thus immediately apply to arbitrary combinatorial problems. For example, from an available δ-approximation algorithm to the linear problem, we construct a δ(1 + ε)-approximation algorithm for the stochastic problem, which invokes the linear algorithm only a logarithmic number of times in the problem input (and polynomial in 1/ε), for any desired accuracy level ε > 0. The algorithms are based on a geometric analysis of the curvature and approximability of the nonlinear level sets of the objective functions.
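The mean-plus-standard-deviation objective can be written down directly; a brute-force sketch over a tiny feasible set shows why incorporating risk changes the optimal solution. (The paper's contribution is precisely avoiding this enumeration via approximation schemes; all names here are illustrative.)

```python
import math

def mean_std_cost(x, means, variances, c):
    """Nonconvex risk objective: mean cost plus c times the standard
    deviation, for independent component costs selected by 0/1 vector x.
    Independence lets variances simply add over the chosen components."""
    mu = sum(m for m, xi in zip(means, x) if xi)
    var = sum(v for v, xi in zip(variances, x) if xi)
    return mu + c * math.sqrt(var)

# Tiny feasible set F of 0/1 solutions, with per-component cost
# distributions given by their means and variances.
feasible = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
means = [2.0, 3.0, 2.5]
variances = [4.0, 0.25, 1.0]
best = min(feasible, key=lambda x: mean_std_cost(x, means, variances, c=1.0))
# Minimizing the mean alone would pick (1, 0, 1) (mean 4.5), but its high
# variance makes the risk-adjusted cost worse than that of (0, 1, 1).
```

This also makes the nonconvexity visible: the objective mixes a linear term with the square root of a linear term, which is exactly what the paper's level-set analysis handles.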
Multi-armed bandits on implicit metric spaces
"... slivkins at microsoft.com The multiarmed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the tradeoff between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one ar ..."
Abstract

Cited by 5 (1 self)
The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the tradeoff between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of a small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem.
Using Online Algorithms to Solve NP-Hard Problems More Efficiently in Practice
, 2007
"... as representing the official policies of the U.S. Government. ..."
Abstract

Cited by 3 (2 self)