Results 1 - 10
of
27
Tracking the best expert
- In Proceedings of the 12th International Conference on Machine Learning
, 1995
"... Abstract. We generalize the recent relative loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound th ..."
Abstract
-
Cited by 157 (17 self)
- Add to MetaCart
Abstract. We generalize the recent relative loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single segment case, the additional loss is proportional to log n, where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k +1and the sequence is of length ℓ, we can bound the additional loss of our algorithm over the best partition by O(k log n + k log(ℓ/k)). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes O(k log n + k log(L/k)), where L is the loss of the best partition with k +1segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk’s original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert, and spend O(1) time per weight in each trial.
Adaptive and Self-Confident On-Line Learning Algorithms
, 2000
"... We study on-line learning in the linear regression framework. Most of the performance bounds for on-line algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the wh ..."
Abstract
-
Cited by 50 (4 self)
- Add to MetaCart
We study on-line learning in the linear regression framework. Most of the performance bounds for on-line algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the whole sequence of examples and thus it is not available to any strictly on-line algorithm. We introduce new techniques for adaptively tuning the learning rate as the data sequence is progressively revealed. Our techniques allow us to prove essentially the same bounds as if we knew the optimal learning rate in advance. Moreover, such techniques apply to a wide class of on-line algorithms, including p-norm algorithms for generalized linear regression and Weighted Majority for linear regression with absolute loss. Our adaptive tunings are radically dierent from previous techniques, such as the so-called doubling trick. Whereas the doubling trick restarts the on-line algorithm several ti...
Tracking a Small Set of Experts by Mixing Past Posteriors
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2002
"... In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partit ..."
Abstract
-
Cited by 49 (8 self)
- Add to MetaCart
In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into k + 1 sections and then choosing the best expert for each section. We build on methods developed by Herbster and Warmuth and consider an open problem posed by Freund where the experts in the best partition are from a small pool of size m. Since k >> m, the best expert shifts back and forth between the experts of the small pool. We propose algorithms that solve this open problem by mixing the past posteriors maintained by the master algorithm. We relate the number of bits needed for encoding the best partition to the loss bounds of the algorithms. Instead of paying log n for choosing the best expert in each section we first pay log bits in the bounds for identifying the pool of m experts and then log m bits per new section. In the bounds we also pay twice for encoding the boundaries of the sections.
Competitive on-line statistics
- International Statistical Review
, 1999
"... A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive on-line algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential sta ..."
Abstract
-
Cited by 39 (7 self)
- Add to MetaCart
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive on-line algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive on-line statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive on-line statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive on-line algorithms, linear regression, prequential statistics, worst-case analysis.
Online learning of non-stationary sequences
- In Advances in Neural Information Processing Systems
, 2003
"... We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. The performance of each expert may change over time in a manner unknown to the learner. We formulate a class of universal learning algorithms for this problem by expressing them ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. The performance of each expert may change over time in a manner unknown to the learner. We formulate a class of universal learning algorithms for this problem by expressing them as simple Bayesian algorithms operating on models analogous to Hidden Markov Models (HMMs). We derive a new performance bound for such algorithms which is considerably simpler than existing bounds. The bound provides the basis for learning the rate at which the identity of the optimal expert switches over time. We find an analytic expression for the aprioriresolution at which we need to learn the rate parameter. We extend our scalar switching-rate result to models of the switching-rate that aregoverned by a matrix of parameters, i.e. arbitrary homogeneous HMMs. We apply and examine our algorithm in the context of the problem of energy management in
Tracking the best of many experts
- in Proceedings of the 18th Annual Conference on Learning Theory, COLT 2005
, 2005
"... Abstract. An algorithm is presented for online prediction that allows to track the best expert efficiently even if the number of experts is exponentially large, provided that the set of experts has a certain structure allowing efficient implementations of the exponentially weighted average predictor ..."
Abstract
-
Cited by 11 (8 self)
- Add to MetaCart
Abstract. An algorithm is presented for online prediction that allows to track the best expert efficiently even if the number of experts is exponentially large, provided that the set of experts has a certain structure allowing efficient implementations of the exponentially weighted average predictor. As an example we work out the case where each expert is represented by a path in a directed graph and the loss of each expert is the sum of the weights over the edges in the path. 1
The on-line shortest path problem under partial monitoring
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
"... The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way such that the loss of the ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
The on-line shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1 / √n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. This result improves earlier bandit-algorithms which have performance bounds that either depend exponentially on the number of edges or converge to zero at a slower rate than O(1 / √n). An extension to the so-called label efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ɛ < 1. Applications to routing in packet switched networks along with simulation results are also presented.
Tracking the best quantizer
, 2008
"... An algorithm is presented for online prediction that allows to track the best expert efficiently even when the number of experts is exponentially large, provided that the set of experts has a certain additive structure. As an example, we work out the case where each expert is represented by a path ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
An algorithm is presented for online prediction that allows to track the best expert efficiently even when the number of experts is exponentially large, provided that the set of experts has a certain additive structure. As an example, we work out the case where each expert is represented by a path in a directed graph and the loss of each expert is the sum of the weights over the edges in the path. These results are then used to construct universal limiteddelay schemes for lossy coding of individual sequences. In particular, we consider the problem of tracking the best scalar quantizer that is adaptively matched to the source sequence with piecewise different behavior. A randomized algorithm is presented which can perform, on any source sequence, asymptotically as well as the best scalar quantization algorithm that is matched to the sequence and is allowed to change the employed quantizer for a given number of times. The complexity of the algorithm is quadratic in the sequence length, but at the price of some deterioration in performance, the complexity can be made linear. Analogous results are obtained for sequential multiresolution and multiple description scalar quantization of individual sequences.
Universal switching linear least squares prediction
- in Proc. of the 2006 Information Theory and its Applications Workshop. La Jolla, CA: UCSD
, 2006
"... In this paper we consider sequential regression of individual sequences under the square-error loss. We focus on the class of switching linear predictors that can segment a given individual sequence into an arbitrary number of blocks within each of which a fixed linear regressor is applied. Using a ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
In this paper we consider sequential regression of individual sequences under the square-error loss. We focus on the class of switching linear predictors that can segment a given individual sequence into an arbitrary number of blocks within each of which a fixed linear regressor is applied. Using a competitive algorithm framework, we construct sequential algorithms that are competitive with the best linear regression algorithms for any segmenting of the data as well as the best partitioning of the data into any fixed number of segments, where both the segmenting of the data and the linear predictors within each segment can be tuned to the underlying individual sequence. The algorithms do not require knowledge of the data length or the number of piecewise linear segments used by the members of the competing class, yet can achieve the performance of the best member that can choose both the partitioning of the sequence as well as the best regressor within each segment. We use a transition diagram [1] to compete with an exponential number of algorithms in the class, using complexity that is linear in the data length. The regret with respect to the best member is O(ln(n)) per transition for not knowing the best transition times and O(ln(n)) for not knowing the best regressor within each segment, where n is the data length. We construct lower bounds on the performance of any sequential algorithm, demonstrating a form of min-max optimality under certain settings. We also consider the case where the members are restricted to choose the best algorithm in each segment from a finite collection of candidate algorithms. Performance on synthetic and real data are given along with a Matlab implementation of the universal switching linear predictor.
Learning with online constraints: shifting concepts and active learning
- PHD THESIS. MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LAB
, 2006
"... Many practical problems such as forecasting, real-time decision making, streaming data applications, and resource-constrained learning, can be modeled as learning with online constraints. This thesis is concerned with analyzing and designing algorithms for learning under the following online constra ..."
Abstract
-
Cited by 7 (5 self)
- Add to MetaCart
Many practical problems such as forecasting, real-time decision making, streaming data applications, and resource-constrained learning, can be modeled as learning with online constraints. This thesis is concerned with analyzing and designing algorithms for learning under the following online constraints: i) The algorithm has only sequential, or one-at-time, access to data. ii) The time and space complexity of the algorithm must not scale with the number of observations. We analyze learning with online constraints in a variety of settings, including active learning. The active learning model is applicable to any domain in which unlabeled data is easy to come by and there exists a (potentially difficult or expensive) mechanism by which to attain labels. First, we

