Results 1-10 of 39
Adaptive Caching by Refetching
In Advances in Neural Information Processing Systems 15, 2002
Cited by 31 (10 self)
We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49-63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using online Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting online learning problem.
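One way to read "dynamically shift between the standard policies based on their observed miss rates" is as an exponentially weighted mixture over policies. The Python sketch below illustrates that reading only; it is not the authors' algorithm, and the function name, the policy names, and the learning rate are hypothetical.

```python
import math


def update_policy_weights(weights, missed, learning_rate=0.5):
    """Down-weight every policy that missed on the current request.

    weights: dict mapping policy name -> current weight (sums to 1).
    missed:  dict mapping policy name -> True if that policy would have missed.
    Returns a new, renormalized weight dict.
    """
    updated = {
        name: weight * math.exp(-learning_rate * (1.0 if missed[name] else 0.0))
        for name, weight in weights.items()
    }
    total = sum(updated.values())
    return {name: weight / total for name, weight in updated.items()}


# Hypothetical request stream on which LRU keeps missing and FIFO keeps hitting.
weights = {"LRU": 0.5, "FIFO": 0.5}
for _ in range(5):
    weights = update_policy_weights(weights, {"LRU": True, "FIFO": False})
```

After a run of requests on which one policy misses and the other hits, the mixture leans toward the policy with the lower observed miss rate.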
Using Additive Expert Ensembles to Cope with Concept Drift
In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005
Cited by 30 (1 self)
We consider online learning where the target concept can change over time. Previous work on expert prediction algorithms has bounded the worst-case performance on any subsequence of the training data relative to the performance of the best expert. However, because these “experts” may be difficult to implement, we take a more general approach and bound performance relative to the actual performance of any online learner on this single subsequence. We present the additive expert ensemble algorithm AddExp, a new, general method for using any online learner for drifting concepts. We adapt techniques for analyzing expert prediction algorithms to prove mistake and loss bounds for a discrete and a continuous version of AddExp. Finally, we present pruning methods and empirical results for data sets with concept drift.
Learning permutations with exponential weights
In 20th Annual Conference on Learning Theory, 2007
Cited by 27 (5 self)
We give an algorithm for learning a permutation online. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic matrix. This matrix is updated by multiplying the current matrix entries by exponential factors. These factors destroy the doubly stochastic property of the matrix and an iterative procedure is needed to renormalize the rows and columns. Even though the result of the normalization procedure does not have a closed form, we can still bound the additional loss of our algorithm over the loss of the best permutation chosen in hindsight.
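The update described in this abstract (multiply entries by exponential factors, then iteratively renormalize rows and columns) can be sketched in Python as follows. This is a minimal illustration, assuming a per-entry loss matrix and a fixed number of alternating row and column normalizations; the function names, the learning rate, and the iteration count are assumptions, not details taken from the paper.

```python
import math


def sinkhorn_balance(matrix, iterations=200):
    """Alternately normalize rows and columns of a positive square matrix.

    The result is approximately doubly stochastic; no closed form is used.
    """
    balanced = [[float(value) for value in row] for row in matrix]
    size = len(balanced)
    for _ in range(iterations):
        for row in balanced:
            row_sum = sum(row)
            for j in range(size):
                row[j] /= row_sum
        for j in range(size):
            column_sum = sum(balanced[i][j] for i in range(size))
            for i in range(size):
                balanced[i][j] /= column_sum
    return balanced


def exponential_update(weights, losses, learning_rate=0.5):
    """Multiply each entry by exp(-learning_rate * loss), then rebalance."""
    scaled = [
        [w * math.exp(-learning_rate * loss) for w, loss in zip(w_row, l_row)]
        for w_row, l_row in zip(weights, losses)
    ]
    return sinkhorn_balance(scaled)


# Start from the uniform doubly stochastic matrix over 3 elements.
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
# Hypothetical losses: entry (i, j) is the loss of assigning element i to position j.
losses = [[0.0, 1.0, 1.0], [1.0, 0.5, 0.0], [1.0, 0.0, 1.0]]
updated = exponential_update(uniform, losses)
```

Entries with low loss gain weight, while the rebalancing step restores row and column sums of approximately one.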
Randomized PCA algorithms with regret bounds that are logarithmic in the dimension
In Advances in Neural Information Processing Systems 19 (NIPS 06), 2006
Cited by 19 (7 self)
We design an online algorithm for Principal Component Analysis. The instances are projected into a probabilistically chosen low dimensional subspace. The total expected quadratic approximation error equals the total quadratic approximation error of the best subspace chosen in hindsight plus some additional term that grows linearly in the dimension of the subspace but logarithmically in the dimension of the instances.
The online shortest path problem under partial monitoring
Journal of Machine Learning Research, 2007
Cited by 19 (6 self)
The online shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, with the goal of keeping the loss of the chosen path (defined as the sum of the weights of its composing edges) small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched offline to the entire sequence of the edge weights, by a quantity that is proportional to 1/√n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. This result improves on earlier bandit algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a slower rate than O(1/√n). An extension to the so-called label-efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ɛ < 1. Applications to routing in packet-switched networks along with simulation results are also presented.
Multitask learning with expert advice
2007
Cited by 15 (4 self)
We consider the problem of prediction with expert advice in the setting where a forecaster is presented with several online prediction tasks. Instead of competing against the best expert separately on each task, we assume the tasks are related, and thus we expect that a few experts will perform well on the entire set of tasks. That is, our forecaster would like, on each task, to compete against the best expert chosen from a small set of experts. While we describe the “ideal” algorithm and its performance bound, we show that the computation required for this algorithm is as hard as computation of a matrix permanent. We present an efficient algorithm based on mixing priors, and prove a bound that is nearly as good for the sequential task presentation case. We also consider a harder case where the task may change arbitrarily from round to round, and we develop an efficient approximate randomized algorithm based on Markov chain Monte Carlo techniques.
Tracking the best of many experts
In Proceedings of the 18th Annual Conference on Learning Theory, COLT 2005, 2005
Cited by 14 (10 self)
An algorithm is presented for online prediction that makes it possible to track the best expert efficiently even if the number of experts is exponentially large, provided that the set of experts has a certain structure allowing efficient implementations of the exponentially weighted average predictor. As an example we work out the case where each expert is represented by a path in a directed graph and the loss of each expert is the sum of the weights over the edges in the path.
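To see why path structure can make exponential weighting tractable, note that when an expert's loss is a sum over edges, its exponential weight factors into a product over edges, so the total weight of all paths can be accumulated by dynamic programming. The Python sketch below illustrates only that observation, assuming an acyclic graph with cumulative edge losses; it does not implement the tracking algorithm of the paper, and the function name and graph encoding are hypothetical.

```python
import math


def total_path_weight(edges, source, sink, learning_rate=1.0):
    """Sum exp(-learning_rate * path_loss) over all source-to-sink paths.

    edges: dict mapping node -> list of (next_node, cumulative_edge_loss).
    The graph is assumed to be acyclic. Each node is expanded once, so the
    cost is linear in the number of edges even when paths are exponentially many.
    """
    memo = {}

    def weight_from(node):
        if node == sink:
            return 1.0
        if node not in memo:
            memo[node] = sum(
                math.exp(-learning_rate * loss) * weight_from(next_node)
                for next_node, loss in edges.get(node, [])
            )
        return memo[node]

    return weight_from(source)


# Hypothetical graph with three paths from "s" to "t":
# s-a-t (loss 1.5), s-b-t (loss 2.0), and s-a-b-t (loss 2.0).
graph = {
    "s": [("a", 1.0), ("b", 2.0)],
    "a": [("t", 0.5), ("b", 1.0)],
    "b": [("t", 0.0)],
}
normalizer = total_path_weight(graph, "s", "t")
```

The returned value is the normalizing constant of the exponentially weighted distribution over paths, obtained without enumerating the paths.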
Hedging structured concepts
In COLT, 2010
Cited by 14 (3 self)
We develop an online algorithm called Component Hedge for learning structured concept classes when the loss of a structured concept sums over its components. Example classes include paths through a graph (composed of edges) and partial permutations (composed of assignments). The algorithm maintains a parameter vector with one nonnegative weight per component, which always lies in the convex hull of the structured concept class. The algorithm predicts by decomposing the current parameter vector into a convex combination of concepts and choosing one of those concepts at random. The parameters are updated by first performing a multiplicative update and then projecting back into the convex hull. We show that Component Hedge has optimal regret bounds for a large variety of structured concept classes.
Universal switching linear least squares prediction
In Proc. of the 2006 Information Theory and its Applications Workshop. La Jolla, CA: UCSD, 2006
Cited by 13 (4 self)
In this paper we consider sequential regression of individual sequences under the square-error loss. We focus on the class of switching linear predictors that can segment a given individual sequence into an arbitrary number of blocks within each of which a fixed linear regressor is applied. Using a competitive algorithm framework, we construct sequential algorithms that are competitive with the best linear regression algorithms for any segmenting of the data as well as the best partitioning of the data into any fixed number of segments, where both the segmenting of the data and the linear predictors within each segment can be tuned to the underlying individual sequence. The algorithms do not require knowledge of the data length or the number of piecewise linear segments used by the members of the competing class, yet can achieve the performance of the best member that can choose both the partitioning of the sequence as well as the best regressor within each segment. We use a transition diagram [1] to compete with an exponential number of algorithms in the class, using complexity that is linear in the data length. The regret with respect to the best member is O(ln(n)) per transition for not knowing the best transition times and O(ln(n)) for not knowing the best regressor within each segment, where n is the data length. We construct lower bounds on the performance of any sequential algorithm, demonstrating a form of min-max optimality under certain settings. We also consider the case where the members are restricted to choose the best algorithm in each segment from a finite collection of candidate algorithms. Performance results on synthetic and real data are given along with a Matlab implementation of the universal switching linear predictor.