Efficient Tracking of Large Classes of Experts
, 2011
Cited by 10 (1 self)
In the framework for prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of switching strategies that can segment a given sequence into several blocks, and follow the advice of a different “base” expert in each block. As usual, the performance of the algorithm is measured by the regret defined as the excess loss relative to the best switching strategy selected in hindsight for the particular sequence to be predicted. In this paper we construct prediction strategies of low computational cost for the case where the set of base experts is large. In particular we derive a family of efficient tracking algorithms that, for any prediction algorithm A designed for the base class, can be implemented with time and space complexity O(n^γ log n) times larger than that of A, where n is the time horizon and γ ≥ 0 is a parameter of the algorithm. With A properly chosen, our algorithm achieves a regret bound of optimal order for γ > 0, and only O(log n) times larger than the optimal order for γ = 0 for all typical regret bound types we examined. For example, for predicting binary sequences with switching parameters, our method achieves the optimal O(log n) regret rate with time complexity O(n^(1+γ) log n) for any γ ∈ (0, 1).
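To make the idea of competing with switching strategies concrete, here is a minimal sketch of the classical fixed-share forecaster of Herbster and Warmuth. It is a standard baseline for tracking the best expert, not the efficient algorithm of this paper, and the names `fixed_share`, `eta` (learning rate) and `alpha` (share rate) are assumptions chosen for illustration.

```python
import math

def fixed_share(losses, eta, alpha):
    """Fixed-share exponentially weighted forecaster (illustrative baseline).

    losses: list of rounds, each a list of per-expert losses in [0, 1].
    eta:    learning rate (hypothetical parameter name).
    alpha:  share rate in [0, 1]; alpha = 0 gives plain exponential weights.
    Returns the cumulative expected loss of the forecaster.
    """
    n_experts = len(losses[0])
    weights = [1.0 / n_experts] * n_experts
    total = 0.0
    for round_losses in losses:
        # Expected loss when an expert is drawn according to the weights.
        total += sum(w * l for w, l in zip(weights, round_losses))
        # Exponential-weights update, then renormalize.
        updated = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
        norm = sum(updated)
        updated = [u / norm for u in updated]
        # Share step: mix in uniform mass so another expert can take over quickly.
        weights = [(1 - alpha) * u + alpha / n_experts for u in updated]
    return total
```

On a sequence where the best expert changes once, a positive `alpha` keeps a small floor under every weight, so the forecaster recovers within a few rounds after the switch, whereas `alpha = 0` needs roughly as many rounds as the length of the first block.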
Prediction by Random-Walk Perturbation
Cited by 9 (3 self)
We propose a version of the follow-the-perturbed-leader online prediction algorithm in which the cumulative losses are perturbed by independent symmetric random walks. The forecaster is shown to achieve an expected regret of the optimal order O(√(n log N)) where n is the time horizon and N is the number of experts. More importantly, it is shown that the forecaster changes its prediction at most O(√(n log N)) times, in expectation. We also extend the analysis to online combinatorial optimization and show that even in this more general setting, the forecaster rarely switches between experts while having a regret of near-optimal order.
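The forecaster described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_walk_fpl`, the step size of ±1/2 for each walk, and the seeding are choices made here for the example.

```python
import random

def random_walk_fpl(losses, seed=0):
    """Follow the perturbed leader with symmetric random-walk perturbations.

    losses: list of rounds, each a list of N per-expert losses in [0, 1].
    Returns (forecaster_loss, best_expert_loss, number_of_switches).
    """
    rng = random.Random(seed)
    n_experts = len(losses[0])
    cum_loss = [0.0] * n_experts   # cumulative loss of each expert so far
    walk = [0.0] * n_experts       # one symmetric random walk per expert
    total, switches, prev = 0.0, 0, None
    for round_losses in losses:
        # Advance every walk by an independent step of +1/2 or -1/2
        # (the step size is an assumption of this sketch).
        for i in range(n_experts):
            walk[i] += rng.choice((-0.5, 0.5))
        # Follow the expert whose perturbed cumulative loss is smallest.
        leader = min(range(n_experts), key=lambda i: cum_loss[i] + walk[i])
        if prev is not None and leader != prev:
            switches += 1
        prev = leader
        total += round_losses[leader]
        # The losses of the round are revealed only after the choice is made.
        for i in range(n_experts):
            cum_loss[i] += round_losses[i]
    return total, min(cum_loss), switches
```

Because each walk moves by only one small step per round, the perturbed leader changes rarely, which is the intuition behind the bound on the number of switches.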
Exploiting easy data in online optimization
Cited by 2 (0 self)
We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment. The learner's objective is to minimize its cumulative regret against the best fixed decision in hindsight. Over the past few decades numerous variants have been considered, with many algorithms designed to achieve sub-linear regret in the worst case. However, this level of robustness comes at a cost. Proposed algorithms are often over-conservative, failing to adapt to the actual complexity of the loss sequence which is often far from the worst case. In this paper we introduce a general algorithm that, provided with a "safe" learning algorithm and an opportunistic "benchmark", can effectively combine good worst-case guarantees with much improved performance on "easy" data. We derive general theoretical bounds on the regret of the proposed algorithm and discuss its implementation in a wide range of applications, notably in the problem of learning with shifting experts (a recent COLT open problem). Finally, we provide numerical simulations in the setting of prediction with expert advice with comparisons to the state of the art.
Online Learning for Adversaries with Memory: Price of Past Mistakes
The framework of online learning with memory naturally captures learning problems with temporal effects, and was previously studied for the experts setting. In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret. The first algorithm applies to Lipschitz continuous loss functions, obtaining optimal regret bounds for both convex and strongly convex losses. The second algorithm attains the optimal regret bounds and applies more broadly to convex losses without requiring Lipschitz continuity, yet is more complicated to implement. We complement the theoretical results with two applications: statistical arbitrage in finance, and multi-step ahead prediction in statistics.
Random-Walk Perturbations for Online Combinatorial Optimization
, 2013
We study online combinatorial optimization problems where a learner is interested in minimizing its cumulative regret in the presence of switching costs. To solve such problems, we propose a version of the follow-the-perturbed-leader algorithm in which the cumulative losses are perturbed by independent symmetric random walks. In the general setting, our forecaster is shown to enjoy near-optimal guarantees on both quantities of interest, making it the best known efficient algorithm for the studied problem. In the special case of prediction with expert advice, we show that the forecaster achieves an expected regret of the optimal order O(√(n log N)) where n is the time horizon and N is the number of experts, while guaranteeing that the predictions are switched at most O(√(n log N)) times, in expectation.
Index Terms: Online learning, Online combinatorial optimization, Follow the Perturbed Leader, Random walk
I. PRELIMINARIES
In this paper we study the problem of online prediction with expert advice (see [...]). The prediction protocol is the following.
Parameters: set of actions S ⊆ R^d, number of rounds n.
The environment chooses the loss vector ℓ_t ∈ [0, 1]^d for all t = 1, ..., n.
For all t = 1, 2, ..., n, repeat
1) The forecaster chooses a probability distribution p_t over S.
2) The forecaster draws an action V_t randomly according to p_t.
3) The environment reveals ℓ_t.
4) The forecaster suffers loss V_t^T ℓ_t.
The usual goal for the standard prediction problem is to devise an algorithm such that the cumulative loss L_n = Σ_{t=1}^n V_t^T ℓ_t is as small as possible with high probability (where probability is with respect to the forecaster's randomization). Since we do not make any assumption on how the environment generates the losses ℓ_t, we cannot hope to minimize the above loss. Instead, a meaningful goal is to minimize the performance gap between our algorithm and the strategy that selects the best action chosen in hindsight. This performance gap is called the regret and is defined formally as R_n = L_n − L*_n, where we have also introduced the notation L*_n = min_{v∈S} v^T Σ_{t=1}^n ℓ_t.
To gain simplicity in the presentation, we restrict our attention to the case of online combinatorial optimization in which S ⊂ {0, 1}^d, that is, each action is represented as a binary vector. This special case arguably contains most important applications such as the online shortest path problem. In this example, a fixed directed acyclic graph of d edges is given with two distinguished vertices u and w. The forecaster, at every time instant t, chooses a directed path from u to w. Such a path is represented by its binary incidence vector v ∈ {0, 1}^d. The components of the loss vector ℓ_t ∈ [0, 1]^d represent losses assigned to the d edges and v^T ℓ_t is the total loss assigned to the path v. Another (non-essential) simplifying assumption is that every action v ∈ S has the same number of 1's: ‖v‖_1 = m for all v ∈ S. The value of m plays an important role in the bounds presented in the paper.
A fundamental special case of the framework above is prediction with expert advice. In this setting, we have m = 1, d = N, and the learner has access to the unit vectors S = {e_i}, i = 1, ..., N, as the decision set. Minimizing the regret in this setting is a well-studied problem (see the book of Cesa-Bianchi
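As a worked illustration of the quantities in this excerpt, the sketch below computes the cumulative loss L_n, the best fixed action's loss L*_n, and the regret, taken here to be L_n − L*_n, for a tiny decision set of binary vectors with exactly m ones. The helper names (`dot`, `regret`) and the toy losses are assumptions made for the example.

```python
from itertools import combinations

def dot(v, loss):
    """Inner product v^T loss of a binary action and a loss vector."""
    return sum(vi * li for vi, li in zip(v, loss))

def regret(played, losses, decision_set):
    """Regret L_n - L*_n of a played action sequence against the best fixed action."""
    # L_n: total loss actually suffered by the forecaster.
    L_n = sum(dot(v, l) for v, l in zip(played, losses))
    # Sum the loss vectors componentwise, then minimize over the decision set.
    d = len(losses[0])
    total_loss = [sum(l[j] for l in losses) for j in range(d)]
    L_star = min(dot(v, total_loss) for v in decision_set)
    return L_n - L_star

# Decision set: all binary vectors of length d = 3 with exactly m = 2 ones.
d, m = 3, 2
S = [tuple(1 if j in idx else 0 for j in range(d))
     for idx in combinations(range(d), m)]

# Toy loss vectors in [0, 1]^d and a toy sequence of played actions.
losses = [(0.0, 1.0, 0.5), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5)]
played = [(1, 1, 0), (1, 1, 0), (0, 1, 1)]

toy_regret = regret(played, losses, S)  # L_n = 3.5, L*_n = 2.5
```

Here the best fixed action in hindsight is (1, 0, 1), whose total loss is 2.5, so the played sequence incurs a regret of 1.0. The brute-force minimum over S is only practical for tiny decision sets.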
Technion
We study a new class of online learning problems where each of the online algorithm’s actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm’s loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the minimum or maximum functions are used, the minimax regret is Ω̃(T^(2/3)) (so called hard online learning problems), and when a linear function is used, the minimax regret is Õ(√T) (so called easy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.
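A small sketch may help fix the notion of a loss that depends on recent values. It is an illustration only: the function name `composite_losses`, the window length `memory`, and the handling of the first rounds (using whatever values are available so far) are assumptions of this example, not details taken from the paper.

```python
def composite_losses(values, combine, memory=2):
    """Per-step losses obtained by combining the most recent adversarial values.

    values:  values[t] is the adversarial value of the action played at step t.
    combine: known deterministic function of the recent values, e.g. min or max.
    memory:  number of recent values the loss depends on (hypothetical name).
    """
    out = []
    for t in range(len(values)):
        # Window of the last `memory` values, shorter during the first rounds.
        recent = values[max(0, t - memory + 1): t + 1]
        out.append(combine(recent))
    return out

def average(recent):
    """One example of a linear combination: the mean of the recent values."""
    return sum(recent) / len(recent)

values = [0.2, 0.8, 0.5]
min_losses = composite_losses(values, min)      # minimum over recent values
max_losses = composite_losses(values, max)      # maximum over recent values
lin_losses = composite_losses(values, average)  # linear combination
```

The three calls correspond to the three loss types named in the abstract; only the choice of `combine` differs between the hard (min, max) and easy (linear) cases.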