Results 1–9 of 9
Near-optimal rates for limited-delay universal lossy source coding. In preparation.
Abstract

Cited by 8 (4 self)
Abstract—We consider the problem of limited-delay lossy coding of individual sequences. Here the goal is to design (fixed-rate) compression schemes to minimize the normalized expected distortion redundancy relative to a reference class of coding schemes, measured as the difference between the average distortion of the algorithm and that of the best coding scheme in the reference class. In compressing a sequence of length T, the best schemes available in the literature achieve an O(T^{-1/3}) normalized distortion redundancy relative to finite reference classes of limited delay and limited memory. It has also been shown that the distortion redundancy is at least of order 1/√T in certain cases. In this paper we narrow the gap between the upper and lower bounds, and give a compression scheme whose distortion redundancy is O(√(ln(T)/T)), only a logarithmic factor larger than the lower bound. The method is based on the recently introduced Shrinking Dartboard prediction algorithm, a variant of exponentially weighted average prediction. Our method is also applied to the problem of zero-delay scalar quantization, where O(ln(T)/√T) distortion redundancy is achieved relative to the (infinite) class of scalar quantizers of a given rate, almost achieving the known lower bound of order 1/√T.
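The method above builds on exponentially weighted average prediction. Below is a minimal sketch of the plain exponentially weighted average forecaster, not the Shrinking Dartboard variant itself; the function name and the fixed learning rate `eta` are illustrative assumptions.

```python
import math
import random

def ewa_forecaster(losses, eta):
    """Exponentially weighted average forecaster over K experts.

    losses: list of rounds, each a list of K losses in [0, 1].
    eta: learning rate (an illustrative fixed choice; the paper's
         Shrinking Dartboard variant manages randomness differently).
    Returns the sequence of experts played and the total loss incurred.
    """
    K = len(losses[0])
    cum = [0.0] * K            # cumulative loss of each expert
    total = 0.0
    picks = []
    for loss_t in losses:
        # weights proportional to exp(-eta * cumulative loss)
        w = [math.exp(-eta * c) for c in cum]
        s = sum(w)
        p = [wi / s for wi in w]
        i = random.choices(range(K), weights=p)[0]
        picks.append(i)
        total += loss_t[i]
        cum = [c + l for c, l in zip(cum, loss_t)]
    return picks, total
```

On a sequence where one expert dominates, the forecaster's distribution concentrates on that expert at an exponential rate.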
Follow the Leader with Dropout Perturbations
JMLR: Workshop and Conference Proceedings, vol. 35:1–26, 2014
Abstract

Cited by 5 (2 self)
We consider online prediction with expert advice. Over the course of many trials, the goal of the learning algorithm is to achieve small additional loss (i.e., regret) compared to the loss of the best from a set of K experts. The two most popular algorithms are Hedge/Weighted Majority and Follow the Perturbed Leader (FPL). The latter algorithm first perturbs the loss of each expert by independent additive noise drawn from a fixed distribution, and then predicts with the expert of minimum perturbed loss ("the leader"), where ties are broken uniformly at random. To achieve the optimal worst-case regret as a function of the loss L∗ of the best expert in hindsight, the two types of algorithms need to tune their learning rate or noise magnitude, respectively, as a function of L∗. Instead of perturbing the losses of the experts with additive noise, we randomly set them to 0 or 1 before selecting the leader. We show that our perturbations are an instance of dropout (because experts may be interpreted as features), although for non-binary losses the dropout probability needs to be made dependent on the losses to get good regret bounds. We show that this simple, tuning-free version of the FPL algorithm achieves two feats: optimal worst-case O(√(L∗ ln K) + …
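The dropout perturbation described above can be sketched for binary losses as follows. This is an illustrative reading of the abstract, not the paper's implementation; the function name and the fixed dropout probability `alpha = 0.5` are assumptions (the paper makes the probability loss-dependent for non-binary losses).

```python
import random

def dropout_fpl(loss_history, alpha=0.5, rng=random):
    """One round of Follow the Perturbed Leader with dropout perturbations.

    loss_history: per-expert lists of past binary losses (0 or 1).
    alpha: dropout probability (0.5 is an illustrative choice).
    Each past unit of loss is independently dropped (set to 0) with
    probability alpha; the expert of minimum perturbed cumulative loss
    is played, with ties broken uniformly at random.
    """
    perturbed = []
    for hist in loss_history:
        # keep each past loss with probability 1 - alpha
        perturbed.append(sum(l for l in hist if rng.random() >= alpha))
    m = min(perturbed)
    leaders = [i for i, v in enumerate(perturbed) if v == m]
    return rng.choice(leaders)
```

Note that no learning rate or noise magnitude appears anywhere, which is the tuning-free property the abstract highlights.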
Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits.
, 2015
Abstract

Cited by 1 (0 self)
Abstract We propose a sample-efficient alternative to importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinatorial decision set so as to minimize its cumulative loss. In particular, we show that the well-known Follow-the-Perturbed-Leader (FPL) prediction method coupled with Geometric Resampling yields the first computationally efficient reduction from offline to online optimization in this setting. We provide a thorough theoretical analysis for the resulting algorithm, showing that its performance is on par with previous, inefficient solutions. Our main contribution is showing that, despite the relatively large variance induced by the GR procedure, our performance guarantees hold with high probability rather than only in expectation. As a side result, we also improve the best known regret bounds for FPL in online combinatorial optimization with full feedback, closing the perceived performance gap between FPL and exponential weights in this setting.
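The core idea, estimating an inverse probability with only sample access, can be sketched roughly as follows. This is an illustrative reading of the abstract, not the paper's exact procedure; the function names and the truncation parameter `cap` are assumptions.

```python
import random

def geometric_resampling(sample_action, chosen, cap):
    """Geometric Resampling estimate related to 1/p(chosen).

    sample_action: a function returning a fresh draw from the same
        distribution the learner played from (sample access only).
    chosen: the action actually played this round.
    cap: truncation bound on the number of resamples, keeping the
        estimator's variance and running time finite.
    Returns K, the number of draws until `chosen` reappears, capped at
    `cap`. Without the cap, K is geometric with E[K] = 1/p(chosen);
    with it, E[min(K, cap)] = (1 - (1 - p)^cap) / p.
    """
    for k in range(1, cap + 1):
        if sample_action() == chosen:
            return k
    return cap
```

Multiplying the observed loss by this K gives a loss estimate without ever computing the playing probability explicitly.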
Towards Minimax Online Learning with Unknown Time Horizon
Abstract

Cited by 1 (1 self)
We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed-horizon case, and then moving on to two unknown-horizon settings: one that assumes the horizon is chosen randomly according to some distribution, and one that allows the adversary full control over the horizon. For the random-horizon setting with restricted losses, we derive a fully optimal minimax algorithm. And for the adversarial-horizon setting, we prove a non-trivial lower bound which shows that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random-horizon setting, we then propose a new adaptive algorithm which “pretends” that the horizon is drawn from a distribution from a special family, but no matter how the actual horizon is chosen, the worst-case regret is of the optimal rate. Furthermore, our algorithm can be combined and applied in many ways, for instance, to online convex optimization, follow the perturbed leader, the exponential weights algorithm, and first-order bounds. Experiments show that our algorithm outperforms many other existing algorithms in an online linear optimization setting.
RandomWalk Perturbations for Online Combinatorial Optimization
, 2013
Abstract
Abstract—We study online combinatorial optimization problems where a learner is interested in minimizing its cumulative regret in the presence of switching costs. To solve such problems, we propose a version of the follow-the-perturbed-leader algorithm in which the cumulative losses are perturbed by independent symmetric random walks. In the general setting, our forecaster is shown to enjoy near-optimal guarantees on both quantities of interest, making it the best known efficient algorithm for the studied problem. In the special case of prediction with expert advice, we show that the forecaster achieves an expected regret of the optimal order O(√(n log N)), where n is the time horizon and N is the number of experts, while guaranteeing that the predictions are switched at most O(√(n log N)) times, in expectation.
Index Terms—Online learning, Online combinatorial optimization, Follow the Perturbed Leader, Random walk
I. PRELIMINARIES
In this paper we study the problem of online prediction with expert advice (see …). The prediction protocol is the following.
Parameters: set of actions S ⊆ R^d, number of rounds n. The environment chooses the loss vectors ℓ_t ∈ [0, 1]^d for all t = 1, …, n. For all t = 1, 2, …, n, repeat:
1) The forecaster chooses a probability distribution p_t over S.
2) The forecaster draws an action V_t randomly according to p_t.
3) The environment reveals ℓ_t.
4) The forecaster suffers loss V_t^⊤ ℓ_t.
The usual goal for the standard prediction problem is to devise an algorithm such that the cumulative loss L_n = Σ_{t=1}^n V_t^⊤ ℓ_t is as small as possible with high probability (where probability is with respect to the forecaster's randomization). Since we do not make any assumption on how the environment generates the losses ℓ_t, we cannot hope to minimize the above loss. Instead, a meaningful goal is to minimize the performance gap between our algorithm and the strategy that selects the best action chosen in hindsight.
This performance gap is called the regret and is defined formally as R_n = L_n − L*_n, where we have also introduced the notation L*_n = min_{v∈S} v^⊤ Σ_{t=1}^n ℓ_t. To gain simplicity in the presentation, we restrict our attention to the case of online combinatorial optimization, in which S ⊂ {0, 1}^d, that is, each action is represented as a binary vector. This special case arguably contains the most important applications, such as the online shortest path problem. In this example, a fixed directed acyclic graph of d edges is given with two distinguished vertices u and w. The forecaster, at every time instant t, chooses a directed path from u to w. Such a path is represented by its binary incidence vector v ∈ {0, 1}^d. The components of the loss vector ℓ_t ∈ [0, 1]^d represent losses assigned to the d edges, and v^⊤ℓ_t is the total loss assigned to the path v. Another (nonessential) simplifying assumption is that every action v ∈ S has the same number of 1's: ‖v‖_1 = m for all v ∈ S. The value of m plays an important role in the bounds presented in the paper. A fundamental special case of the framework above is prediction with expert advice. In this setting, we have m = 1, d = N, and the learner has access to the unit vectors S = {e_i}_{i=1}^N as the decision set. Minimizing the regret in this setting is a well-studied problem (see the book of Cesa-Bianchi …
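For the expert-advice special case (m = 1), the random-walk-perturbed leader idea can be sketched as below. This is an illustrative reading of the abstract, not the paper's exact algorithm; the function name and the unit ±1 step size are assumptions.

```python
import random

def random_walk_fpl(losses, rng=random):
    """Follow the Perturbed Leader where each expert's cumulative loss is
    perturbed by an independent symmetric random walk (sketch for the
    expert setting: m = 1, S = unit vectors).

    losses: list of rounds, each a list of N losses in [0, 1].
    Returns (total loss incurred, number of times the prediction switched).
    The walk's persistence across rounds is what keeps leader changes,
    and hence switching costs, rare.
    """
    N = len(losses[0])
    cum = [0.0] * N    # cumulative true losses
    walk = [0.0] * N   # symmetric random-walk perturbations
    total, switches, prev = 0.0, 0, None
    for loss_t in losses:
        # advance each expert's walk by an independent +/-1 step
        walk = [w + rng.choice((-1.0, 1.0)) for w in walk]
        leader = min(range(N), key=lambda i: cum[i] + walk[i])
        if prev is not None and leader != prev:
            switches += 1
        prev = leader
        total += loss_t[leader]
        cum = [c + l for c, l in zip(cum, loss_t)]
    return total, switches
```

Contrast this with standard FPL, which draws fresh perturbations every round and therefore switches its prediction far more often.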
Fighting Bandits with a New Kind of Smoothness
Abstract
We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the O(√(NT)) minimax regret with a smaller constant factor. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as O(√(NT log N)), as long as the perturbation distribution has a bounded hazard function. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property and lead to near-optimal algorithms.
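The bounded-hazard condition can be checked concretely for the standard Gumbel distribution, whose density and CDF are standard closed forms; the function name below is ours.

```python
import math

def gumbel_hazard(x):
    """Hazard rate f(x) / (1 - F(x)) of the standard Gumbel distribution,
    with CDF F(x) = exp(-exp(-x)) and density f(x) = exp(-x) * F(x).
    A bounded hazard rate is the key property identified above for
    near-optimal perturbation-based bandit algorithms; for the standard
    Gumbel the hazard increases toward the bound 1 but never reaches it.
    """
    F = math.exp(-math.exp(-x))
    f = math.exp(-x) * F
    return f / (1.0 - F)
```

Distributions whose hazard rate grows without bound (for example, sub-Gaussian tails) fall outside the class covered by this guarantee.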
Online Linear Optimization via Smoothing
JMLR: Workshop and Conference Proceedings, vol. 35:1–17, 2014
Abstract
We present a new optimization-theoretic approach to analyzing Follow-the-Leader style algorithms, particularly in the setting where perturbations are used as a tool for regularization. We show that adding a strongly convex penalty function to the decision rule and adding stochastic perturbations to the data correspond to deterministic and stochastic smoothing operations, respectively. We establish an equivalence between “Follow the Regularized Leader” and “Follow the Perturbed Leader” up to the smoothness properties. This intuition leads to a new generic analysis framework that recovers and improves the previously known regret bounds for the class of algorithms commonly known as Follow the Perturbed Leader.
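One concrete instance of this equivalence is well known: Follow the Perturbed Leader with Gumbel perturbations samples exactly from the exponential-weights distribution (entropy-regularized FTRL), via the Gumbel-max trick. A small sketch comparing the two; the function names and parameters are illustrative.

```python
import math
import random

def ftrl_softmax(cum_losses, eta):
    """FTRL with an entropic regularizer plays expert i with probability
    proportional to exp(-eta * L_i), i.e., exponential weights."""
    w = [math.exp(-eta * L) for L in cum_losses]
    s = sum(w)
    return [wi / s for wi in w]

def fpl_gumbel_pick(cum_losses, eta, rng):
    """FPL: subtract i.i.d. Gumbel(0, 1/eta) noise from each cumulative
    loss and follow the leader. By the Gumbel-max trick, the chosen index
    is an exact sample from the exponential-weights distribution above."""
    def gumbel():
        # standard Gumbel via inverse transform sampling
        return -math.log(-math.log(rng.random()))
    noisy = [L - gumbel() / eta for L in cum_losses]
    return min(range(len(noisy)), key=noisy.__getitem__)
```

Empirically, the selection frequencies of `fpl_gumbel_pick` converge to the probabilities returned by `ftrl_softmax`, which is the FTRL/FPL correspondence in its sharpest form.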
Pursuit-Evasion Without Regret, with an Application to Trading
Abstract
We propose a state-based variant of the classical online learning problem of tracking the best expert. In our setting, the actions of the algorithm and experts correspond to local moves through a continuous and bounded state space. At each step, Nature chooses payoffs as a function of each player’s current position and action. Our model therefore integrates the problem of prediction with expert advice with the stateful formalisms of reinforcement learning. Traditional no-regret learning approaches no longer apply, but we propose a simple algorithm that provably achieves no-regret when the state space is any convex Euclidean region. Our algorithm combines techniques from online learning with results from the literature on pursuit-evasion games. We describe a natural quantitative trading application in which the convex region captures inventory risk constraints, and local moves limit market impact. Using historical market data, we show experimentally that our algorithm has a strong advantage over classic no-regret approaches.