Adapting to a Changing Environment: the Brownian Restless Bandits
Cited by 12 (3 self)
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated reward, and decides on the strategy for the next iteration. The goal is to maximize the reward by balancing exploitation (the use of acquired information) with exploration (learning new information). We introduce and study a dynamic MAB problem in which the reward functions stochastically and gradually change in time. Specifically, the expected reward of each arm follows a Brownian motion, a discrete random walk, or similar processes. In this setting a player has to continuously keep exploring in order to adapt to the changing environment. Our formulation is (roughly) a special case of the notoriously intractable restless MAB problem. Our goal here is to characterize the cost of learning and adapting to the changing environment, in terms of the stochastic rate of the change. We consider an infinite time horizon, and strive to minimize the average cost per step, which we define with respect to a hypothetical algorithm that at every step plays the arm with the maximum expected reward at this step. A related line of work on the adversarial MAB problem used a significantly weaker benchmark, the best time-invariant policy. The dynamic MAB problem models a variety of practical online, game-against-nature type optimization settings. While building on prior work, algorithms and steady-state analysis for the dynamic setting require a novel approach based on different stochastic tools.
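The setting in this abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes Bernoulli rewards, a Gaussian random walk on each arm's expected reward reflected into [0, 1], and a simple epsilon-greedy player with a sliding window. All names and parameters (`simulate`, `sigma`, `epsilon`, `window`) are hypothetical. It reports the average per-step cost relative to the benchmark described above, which plays the arm with the maximum expected reward at every step.

```python
import random


def reflect(x):
    """Reflect x back into [0, 1] so it stays a valid Bernoulli mean (assumed boundary rule)."""
    while x < 0.0 or x > 1.0:
        x = -x if x < 0.0 else 2.0 - x
    return x


def simulate(k=3, steps=5000, sigma=0.02, epsilon=0.1, window=50, seed=0):
    """Return the average per-step regret of a windowed epsilon-greedy player.

    Regret at each step is measured against the arm with the highest
    expected reward at that step (the dynamic benchmark).
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(k)]  # hidden expected rewards
    recent = [[] for _ in range(k)]           # last `window` rewards seen per arm
    total_regret = 0.0
    for _ in range(steps):
        # Explore with probability epsilon, or while some arm has no data yet.
        if rng.random() < epsilon or any(not r for r in recent):
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda i: sum(recent[i]) / len(recent[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        recent[arm].append(reward)
        if len(recent[arm]) > window:
            recent[arm].pop(0)  # forget old observations; the arms drift
        total_regret += max(means) - means[arm]
        # Each expected reward takes one step of a reflected random walk.
        means = [reflect(m + rng.gauss(0.0, sigma)) for m in means]
    return total_regret / steps


if __name__ == "__main__":
    print(simulate())
```

Varying `sigma` in this sketch gives a rough feel for how the average cost per step depends on the rate of change, which is the quantity the abstract sets out to characterize.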
Optimization in the Private Value Model: Competitive Analysis Applied to Auction Design
, 2003
Cited by 4 (0 self)
We consider the study of a class of optimization problems with applications towards profit maximization. One feature of the classical treatment of optimization problems is that the space over which the optimization is being performed, i.e., the input description of the problem, is assumed to be publicly known to the optimizer. This assumption does not always accurately represent the situation in practical applications. Recently, with the advent of the Internet as one of the most important arenas for resource sharing between parties with diverse and selfish interests, this distinction has become more readily apparent. The inputs to many optimizations being performed are not publicly known in advance. Instead they must be solicited from companies, computerized agents, individuals, etc. that may act selfishly to promote their own self-interests. In particular, they may lie about their values or may not adhere to specified protocols if it benefits them. An auction is
Learning on a budget: posted price mechanisms for online procurement
 In the 13th ACM Conference on Electronic Commerce (EC
, 2012
Cited by 4 (1 self)
We study online procurement markets where agents arrive in a sequential order and a mechanism must make an irrevocable decision whether or not to procure the service as the agent arrives. Our mechanisms are subject to a budget constraint and are designed for stochastic settings in which the bidders are either identically distributed or, more generally, permuted in random order. Thus, the problems we study contribute to the literature on budget-feasible mechanisms as well as the literature on secretary problems and online learning in auctions. Our main positive results are as follows. We present a constant-competitive posted price mechanism when agents are identically distributed and the buyer has a symmetric submodular utility function. For nonsymmetric submodular utilities, under the random ordering assumption we give a posted price mechanism that is O(log n)-competitive and a truthful mechanism that is O(1)-competitive but uses bidding rather than posted pricing.
Blind nonparametric revenue management: Asymptotic optimality of a joint learning and pricing method. Working Paper
, 2006
Cited by 3 (0 self)
We consider a general class of network revenue management problems in which multiple products are linked by various resource constraints. Demand is modeled as a multivariate Poisson process whose instantaneous rate at each point in time is determined by a vector of prices set by the decision maker. The objective is to price the products so as to maximize expected revenues over a finite sales horizon. The decision maker observes realized demand over time, but is otherwise “blind” to the underlying demand function which maps prices into the instantaneous demand rate. Few structural assumptions are made with regard to the demand function; in particular, it need not admit any parametric representation. We introduce a general method for solving such blind revenue management problems: first a learning phase experiments with a “small” number of prices over an initial “short” time interval; then a simple optimization problem is solved using an estimate of the demand function obtained from the previous stage, and a near-optimal price is fixed for the remainder of the time horizon. To evaluate the performance of the proposed method we compare the revenues it generates to those corresponding to the optimal dynamic pricing policy that knows the demand function a priori. In a regime where the sales volume grows large, we prove that the gap in performance is suitably small; in that sense, the proposed method is asymptotically optimal.
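The two-phase idea in this abstract (experiment with a few prices, then fix the best estimated price) can be sketched as follows. This is a deliberately simplified illustration, not the paper's method: it assumes a single product with no resource constraints, splits the learning interval equally among the test prices, and picks the test price with the highest estimated revenue rate. The function names, the price grid, and the linear demand curve in the usage example are all hypothetical.

```python
import random


def poisson_count(rng, rate, duration):
    """Count arrivals of a Poisson process with the given rate over `duration` time units."""
    if rate <= 0.0:
        return 0
    count, t = 0, rng.expovariate(rate)
    while t <= duration:
        count += 1
        t += rng.expovariate(rate)  # exponential inter-arrival times
    return count


def learn_then_fix(demand_rate, prices, horizon, learn_fraction, seed=0):
    """Test each price for an equal slice of the learning phase, then fix the best one.

    `demand_rate(p)` is the true demand rate at price p; the pricing rule
    never reads it directly and only observes the simulated sales counts.
    """
    rng = random.Random(seed)
    slice_len = horizon * learn_fraction / len(prices)
    revenue = 0.0
    est_revenue_rate = {}
    for p in prices:
        sales = poisson_count(rng, demand_rate(p), slice_len)
        revenue += p * sales
        est_revenue_rate[p] = p * sales / slice_len  # estimated revenue per unit time
    best = max(prices, key=lambda p: est_revenue_rate[p])
    remaining = horizon * (1.0 - learn_fraction)
    revenue += best * poisson_count(rng, demand_rate(best), remaining)
    return best, revenue


if __name__ == "__main__":
    # Hypothetical linear demand curve, used only to drive the simulation.
    best, revenue = learn_then_fix(
        lambda p: 20.0 * (1.0 - p), [0.1, 0.3, 0.5, 0.7, 0.9],
        horizon=5000.0, learn_fraction=0.2,
    )
    print(best, revenue)
```

Shrinking `learn_fraction` leaves more time to exploit the chosen price but makes the estimates noisier; the abstract's asymptotic result concerns how this trade-off behaves as sales volume grows.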
Designing and learning optimal finite support auctions
 In Proc. 18th ACM Symp. on Discrete Algorithms
, 2007
Cited by 2 (0 self)
A classical paper of Myerson [18] shows how to construct an optimal (revenue-maximizing) auction in a model where bidders’ values are drawn from known continuous distributions. In this paper we show how to adapt this approach to finite support distributions that may be partially unknown. We demonstrate that a Myerson-style auction can be constructed in time polynomial in the number of bidders and the size of the support sets. Next, we consider the scenario where the mechanism designer knows the support sets, but not the probability of each value. In this situation, we show that the optimal auction may be learned in polynomial time using a weak oracle that, given two candidate auctions, returns one with a higher expected revenue. To study this problem, we introduce a new class of truthful mechanisms which we call order-based auctions. We show that the optimal mechanism is an order-based auction and use the internal structure of this class to prove the correctness of our learning algorithm as well as to bound its running time.
Posting prices with unknown distributions
 In Innovations in Computer Science (ICS), 2011
Cited by 2 (0 self)
We consider a dynamic auction model, where bidders sequentially arrive to the market. The values of the bidders for the item for sale are independently drawn from a distribution, but this distribution is unknown to the seller. The seller offers a take-it-or-leave-it price for each arriving bidder (possibly different for different bidders), and aims to maximize revenue. We study how well such sequential posted-price mechanisms can approximate the optimal revenue that is achieved when the distribution is known to the seller. On the negative side, we show that sequential posted-price mechanisms cannot guarantee a constant fraction of this revenue when the class of candidate distributions is unrestricted. We show that this impossibility holds even if the set of possible distributions is very small, or when the seller has a prior distribution over the candidate distributions. On the positive side, we devise a posted-price mechanism that guarantees a constant fraction of the known-distribution revenue when all candidate distributions exhibit the monotone hazard rate property.
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
Cited by 2 (1 self)
In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\tilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within a logarithmic factor for any game.
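The regret described in this abstract can be written out explicitly. The notation below is assumed for illustration and is not taken from the paper: $a_t$ is the learner's action and $o_t$ the outcome in round $t$, $\ell$ is the loss function, $\mathcal{A}$ is the finite action set, and $T$ is the number of rounds. The first expression is the regret against the best fixed action in hindsight; the second restates the four possible growth rates of the minimax regret.

```latex
R_T \;=\; \sum_{t=1}^{T} \ell(a_t, o_t) \;-\; \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell(a, o_t),
\qquad
\text{minimax regret} \;\in\; \bigl\{\, 0,\ \tilde{\Theta}(\sqrt{T}),\ \Theta(T^{2/3}),\ \Theta(T) \,\bigr\}.
```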
Equilibria in Online Games
Cited by 1 (0 self)
We initiate the study of scenarios that combine online decision making with interaction between noncooperative agents. To this end we introduce online games that model such scenarios as noncooperative games, and lay the foundations for studying this model. Roughly speaking, an online game captures systems in which independent agents serve requests in a common environment. The requests arrive in an online fashion and each is designated to be served by a different agent. The cost incurred by serving a request is paid for by the serving agent, and naturally, the agents seek to minimize the total cost they pay. Since the agents are independent, it is unlikely that some central authority can enforce a policy or an algorithm (centralized or distributed) on them, and thus, the agents can be viewed as selfish players in a noncooperative game. In this game, the players have to choose as a strategy an online algorithm according to which requests are served. To further facilitate the game theoretic approach, we suggest the measure of competitive analysis as the players’ decision criterion. As the expected result of noncooperative games is an equilibrium, the question of finding the equilibria of a game is of central importance, and thus, it is the central issue we concentrate on in this paper. We study some natural examples for online games; in order to obtain general insights and develop generic techniques, we present an abstract model for the study of online games generalizing metrical task systems. We suggest a method for constructing equilibria in this model and further devise techniques for implementing it.
Auction protocols
Cited by 1 (1 self)
The word “auction” generally refers to a mechanism for allocating one or more resources to one or more parties (or bidders). Generally, once the allocation is determined, some amount of money changes hands; the precise monetary transfers are determined by the auction process. While in some auction protocols, such as the English auction, bidders repeatedly increase their bids in an attempt to outbid each other, this is not an essential component of an auction. There are many other auction protocols, and we will study some of them in this chapter. Auctions have traditionally been studied mostly by economists. In recent years, computer scientists have also become interested in auctions, for a variety of reasons. Auctions can be useful for allocating various computing resources across users. In artificial intelligence, they can be used to allocate resources and tasks across multiple artificially intelligent “agents.” Auctions are also important in electronic commerce: there are of course several well-known auction websites, but additionally, search engines use auctions to sell advertising space on their results pages. Finally, increased computing power and improved algorithms have made new types of auctions possible, most notably combinatorial auctions, in which
Balloon Popping With Applications to Ascending Auctions
Cited by 1 (0 self)
We study the power of ascending auctions in a scenario in which a seller is selling a collection of identical items to anonymous unit-demand bidders. We show that even with full knowledge of the set of bidders’ private valuations for the items, if the bidders are ex-ante identical, no ascending auction can extract more than a constant times the revenue of the best fixed price scheme. This problem is equivalent to the problem of coming up with an optimal strategy for blowing up indistinguishable balloons with known capacities in order to maximize the amount of contained air. We show that the algorithm which simply inflates all balloons to a fixed volume is close to optimal in this setting.