Results 1–10 of 16
Minimax optimal algorithms for unconstrained linear optimization
 In Advances in Neural Information Processing Systems 26, 2013
Abstract

Cited by 5 (2 self)
We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark functions. The problem is cast as a sequential multi-stage zero-sum game, and we give a thorough analysis of the minimax behavior of the game, providing characterizations for the value of the game, as well as both the player’s and the adversary’s optimal strategy. We show how these objects can be computed efficiently under certain circumstances, and by selecting an appropriate benchmark, we construct a novel hedging strategy for an unconstrained betting game.
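The regret notion above can be made concrete with a small sketch. This is not the paper's minimax-optimal strategy; it just runs plain online gradient descent on unconstrained linear losses g_t * x_t and measures regret against a fixed comparator point u (the step size and gradient sequence are illustrative choices):

```python
# A minimal sketch (not the paper's minimax-optimal strategy): online
# gradient descent on unconstrained linear losses g_t * x_t, with regret
# measured against a fixed comparator point u.

def play(gradients, eta=0.1):
    """Run online gradient descent; the choice x_t is unconstrained."""
    x, choices = 0.0, []
    for g in gradients:
        choices.append(x)
        x -= eta * g  # gradient of the loss g * x with respect to x is g
    return choices

def regret(gradients, choices, u):
    """Cumulative loss of the player minus that of the fixed point u."""
    player_loss = sum(g * x for g, x in zip(gradients, choices))
    benchmark_loss = sum(g * u for g in gradients)
    return player_loss - benchmark_loss

gs = [1, -1, 1, 1, -1, 1]   # adversary's linear loss gradients
xs = play(gs)
r = regret(gs, xs, u=0.0)   # vs. the trivial comparator "always play 0"
```

Against u = 0 the regret is just the player's own cumulative loss; the paper's broader benchmark functions generalize this fixed-comparator baseline.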
Learning Prices for Repeated Auctions with Strategic Buyers
Abstract

Cited by 1 (0 self)
Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer’s value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller’s long-term revenue. We define the natural notion of strategic regret — the lost revenue as measured against a truthful (non-strategic) buyer. We present seller algorithms that achieve no (strategic) regret when the buyer discounts her future surplus — i.e., the buyer prefers showing advertisements to users sooner rather than later. We also give a lower bound on strategic regret that increases as the buyer’s discounting weakens and shows, in particular, that any seller algorithm will suffer linear strategic regret if there is no discounting.
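A toy simulation (hypothetical seller and buyer policies, not the paper's mechanism) makes the strategic-regret notion concrete: a naive seller halves the posted price after every rejection, and a strategic buyer rejects a few early rounds to drive the price down before buying; the revenue gap against the truthful interaction is the strategic regret:

```python
# Toy illustration of strategic regret (hypothetical seller and buyer
# policies, not the paper's mechanism). The seller halves the posted
# price after each rejection; the strategic buyer sandbags early rounds
# to drive the price down before accepting.

def run(T, accepts):
    """Play T posted-price rounds; accepts(t, price) is the buyer policy."""
    price, revenue = 1.0, 0.0
    for t in range(T):
        if accepts(t, price):
            revenue += price
        else:
            price /= 2  # naive seller lowers the price after a rejection
    return revenue

value = 0.8                                      # buyer's private value
truthful = lambda t, p: p <= value               # accept whenever surplus >= 0
strategic = lambda t, p: t >= 3 and p <= value   # reject the first 3 rounds

T = 10
strategic_regret = run(T, truthful) - run(T, strategic)
```

The truthful buyer settles at price 0.5; the strategic buyer's three rejections push the price to 0.125, so the seller loses most of his revenue. The lower bounds in the paper formalize how severe this loss must be when discounting is weak.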
Online Learning for Adversaries with Memory: Price of Past Mistakes
Abstract
The framework of online learning with memory naturally captures learning problems with temporal effects, and was previously studied for the experts setting. In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret. The first algorithm applies to Lipschitz-continuous loss functions, obtaining optimal regret bounds for both convex and strongly convex losses. The second algorithm attains the optimal regret bounds and applies more broadly to convex losses without requiring Lipschitz continuity, yet is more complicated to implement. We complement the theoretical results with two applications: statistical arbitrage in finance, and multi-step-ahead prediction in statistics.
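The loss-with-memory setup can be sketched in a few lines: each round's loss depends on the current and previous decision, f(x_{t-1}, x_t), and policy regret compares the learner against the best fixed decision replayed through the same memory-dependent losses. The switching-cost loss below is an illustrative choice, not one of the paper's algorithms:

```python
# Sketch of the loss-with-memory setting: each round's loss depends on the
# previous and current decision, f(x_{t-1}, x_t). Policy regret compares
# the learner to the best fixed decision replayed through the same losses.
# The loss (quadratic switching cost plus a linear term) is illustrative.

def cumulative_loss(decisions, f):
    """Sum the memory-2 losses f(x_{t-1}, x_t) along a decision sequence."""
    return sum(f(prev, cur) for prev, cur in zip(decisions, decisions[1:]))

f = lambda prev, cur: (cur - prev) ** 2 + 0.5 * cur

learner = [0.0, 1.0, 0.0, 1.0, 0.0]   # oscillating play pays the switching cost
best_fixed = min(cumulative_loss([x] * len(learner), f)
                 for x in [0.0, 0.25, 0.5, 0.75, 1.0])
policy_regret = cumulative_loss(learner, f) - best_fixed
```

A fixed decision never pays the switching cost, so the oscillating learner's entire quadratic term shows up as policy regret; this is the temporal effect that memoryless OCO regret would miss.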
TTIC
Abstract
We consider a Markov decision process with deterministic state transition dynamics, adversarially generated rewards that change arbitrarily from round to round, and a bandit feedback model in which the decision maker only observes the rewards it receives. In this setting, we present a novel and efficient online decision-making algorithm named MarcoPolo. Under mild assumptions on the structure of the transition dynamics, we prove that MarcoPolo enjoys a regret of O(T^(3/4) √(log T)) against the best deterministic policy in hindsight. Notably, our analysis does not rely on the stringent unichain assumption, which dominates much of the previous work on this topic.
Secondary User Data Capturing for Cognitive Radio Network Forensics under Capturing Uncertainty
Abstract
Secondary user data capturing is a fundamental building block for cognitive radio network forensics. It faces great challenges mainly due to unknown secondary user behavior, the wide spectrum, and packet capturing uncertainty. There is a lack of fundamental understanding of the data capturing problem in theory. In this paper, for the first time, we formulate the dynamic sniffer channel assignment problem without knowledge of users’ behavior patterns as a non-stochastic multi-armed bandit (MAB) problem. Moreover, we consider a more practical scenario that accounts for packet capturing uncertainty and switching cost. We then propose an efficient solution to the problem and analyze the regret of our policy. Finally, a simulation study validates the convergence of our method.
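For reference, a minimal sketch of the standard non-stochastic MAB algorithm this formulation calls for, EXP3, cast as assigning one sniffer to one of K channels per round. This is the textbook algorithm, not the authors' policy, which additionally handles capturing uncertainty and switching cost:

```python
import math
import random

# Textbook EXP3 for the non-stochastic multi-armed bandit. Arms play the
# role of channels a sniffer can monitor; only the chosen channel's reward
# (packets captured, scaled to [0, 1]) is observed each round.

def exp3(T, K, reward, gamma=0.1):
    weights = [1.0] * K
    total = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [(1 - gamma) * w / s + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        r = reward(t, arm)  # bandit feedback: one reward only
        total += r
        # importance-weighted exponential update of the chosen arm alone
        weights[arm] *= math.exp(gamma * r / (probs[arm] * K))
    return total

random.seed(0)
# Hypothetical traffic pattern: channel 1 always carries the target's packets.
captured = exp3(T=2000, K=3, reward=lambda t, a: 1.0 if a == 1 else 0.0)
```

With the fixed seed the run is deterministic; after a brief exploration phase the sampling distribution concentrates on channel 1, up to the γ/K exploration floor that EXP3 always retains.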
Non-Myopic Learning in Repeated Stochastic Games
Abstract
This paper addresses learning in repeated stochastic games (RSGs) played against unknown associates. Learning in RSGs is extremely challenging due to their inherently large strategy spaces. Furthermore, these games typically have multiple (often infinitely many) equilibria, making attempts to solve them via equilibrium analysis and rationality assumptions wholly insufficient. As such, previous learning algorithms for RSGs either learn very slowly or make extremely limiting assumptions about the game structure or the associates’ behaviors. In this paper, we propose and evaluate the notion of game abstraction by experts (Gabe) for two-player general-sum RSGs. Gabe reduces an RSG to a multi-armed bandit problem, which can then be solved using an expert algorithm. Gabe maintains many aspects of the original game, including security and Pareto-optimal Nash equilibria. We demonstrate that Gabe substantially outperforms existing algorithms in many scenarios.
Revenue Optimization in PostedPrice Auctions with Strategic Buyers
Abstract
We study revenue optimization learning algorithms for posted-price auctions with strategic buyers. We analyze a very broad family of monotone regret minimization algorithms for this problem, which includes the previously best known algorithm, and show that no algorithm in that family admits a strategic regret more favorable than Ω(√T). We then introduce a new algorithm that achieves a strategic regret differing from the lower bound only by a factor in O(log T), an exponential improvement upon the previous best algorithm. Our new algorithm admits a natural analysis and simpler proofs, and the ideas behind its design are general. We also report the results of empirical evaluations comparing our algorithm with the previous state of the art, showing a consistent exponential improvement in several different scenarios.
Conditional Swap Regret and Conditional Correlated Equilibrium
Abstract
We introduce a natural extension of the notion of swap regret, conditional swap regret, that allows for action modifications conditioned on the player’s action history. We prove a series of new results for conditional swap regret minimization. We present algorithms for minimizing conditional swap regret with bounded conditioning history. We further extend these results to the case where conditional swaps are considered only for a subset of actions. We also define a new notion of equilibrium, conditional correlated equilibrium, that is tightly connected to the notion of conditional swap regret: when all players follow conditional swap regret minimization strategies, the empirical distribution of play approaches this equilibrium. Finally, we extend our results to the multi-armed bandit scenario.
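The base notion the paper extends, ordinary (unconditional) empirical swap regret, can be computed by brute force over all swap functions phi: A -> A. The play sequence and loss table below are illustrative:

```python
from itertools import product

# Sketch of ordinary (unconditional) empirical swap regret: the base
# notion the paper extends by conditioning swaps on action history.

def swap_regret(actions, losses):
    """Brute-force search over all |A|^|A| swap functions phi: A -> A."""
    A = sorted(set(actions))
    played = sum(losses[t][a] for t, a in enumerate(actions))
    best = min(
        sum(losses[t][dict(zip(A, phi))[a]] for t, a in enumerate(actions))
        for phi in product(A, repeat=len(A))
    )
    return played - best

actions = [0, 1, 0, 1]                      # alternating play
losses = [[1, 0], [0, 1], [1, 0], [0, 1]]   # losses[t][action]
sr = swap_regret(actions, losses)
```

In this example every fixed action still incurs total loss 2, so external regret is only 2, but the swap 0 -> 1, 1 -> 0 achieves loss 0, giving swap regret 4; this gap between external and swap regret is exactly what swap-style notions, and their conditional refinement here, are designed to capture.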