Results 1-7 of 7
Sequential prediction of individual sequences under general loss functions
IEEE Trans. Inform. Theory, 1998
Tight Worst-Case Loss Bounds for Predicting With Expert Advice
1994
Abstract

Cited by 51 (10 self)
... this paper is somewhat different from the one just described. Assume that there are $N$ experts $E_i$, $i = 1, \ldots, N$, each trying to predict the outcomes $y_t$ as best they can. Let $x_{t,i}$ be the prediction of the $i$th expert $E_i$ about the ...
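To make the expert setting above concrete, here is a minimal Python sketch of one standard way to combine the experts' predictions $x_{t,i}$: the exponentially weighted average forecaster. It is not necessarily the algorithm analyzed in this paper. It assumes square loss, outcomes and predictions in $[0, 1]$, and a learning rate `eta`; the function name `exp_weights_forecast` and the default `eta = 0.5` are illustrative choices. For square loss on $[0, 1]$ and $\eta \le 1/2$, the standard exp-concavity argument bounds the forecaster's total loss by the best expert's total loss plus $\ln(N)/\eta$.

```python
import math
from typing import List, Sequence, Tuple

def exp_weights_forecast(expert_preds: List[Sequence[float]],
                         outcomes: List[float],
                         eta: float = 0.5) -> Tuple[List[float], float]:
    """Exponentially weighted average forecaster under square loss (a sketch).

    expert_preds[t][i] plays the role of x_{t,i}, the prediction of expert E_i
    at trial t; outcomes[t] plays the role of y_t. eta is an assumed learning rate.
    Returns the forecaster's predictions and its total square loss.
    """
    n = len(expert_preds[0])
    cum_loss = [0.0] * n  # cumulative square loss of each expert so far
    preds: List[float] = []
    total = 0.0
    for x_t, y_t in zip(expert_preds, outcomes):
        # Weights proportional to exp(-eta * cumulative loss);
        # subtracting the minimum loss only improves numerical stability.
        m = min(cum_loss)
        w = [math.exp(-eta * (loss - m)) for loss in cum_loss]
        s = sum(w)
        y_hat = sum(wi * xi for wi, xi in zip(w, x_t)) / s  # weighted average
        preds.append(y_hat)
        total += (y_hat - y_t) ** 2
        # Reveal y_t and charge every expert its own square loss
        for i in range(n):
            cum_loss[i] += (x_t[i] - y_t) ** 2
    return preds, total
```

For example, with two experts that always predict 1 and 0 and outcomes that are always 1, the first prediction is 0.5 and later predictions move toward 1, so the total loss stays below $2 \ln 2$.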
Tracking the Best Linear Predictor
Journal of Machine Learning Research, 2001
Abstract

Cited by 51 (10 self)
In most online learning research, the total online loss of the algorithm is compared to the total loss of the best offline predictor $u$ from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently, some work has been done where the predictor $u_t$ at each trial $t$ is allowed to change with time, and the total online loss of the algorithm is compared to the sum of the losses of $u_t$ at each trial plus the total "cost" for shifting to successive predictors. This is to model situations in which the examples change over time and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors. Naturally, shifting bounds are much harder to prove. The only known bounds are for the case when the comparison class consists of sequences of experts or Boolean disjunctions. In this paper we develop the methodology for lifting known static bounds to the shifting case. In particular, we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial into a suitably chosen convex region. This keeps the hypothesis of the algorithm well-behaved, so that the static bounds can be converted to shifting bounds.
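The end-of-trial projection step can be illustrated with a small Python sketch. As assumptions for illustration only, the static algorithm is plain online gradient descent on square loss, and the convex region is a Euclidean ball of radius `radius`; the paper's actual algorithms and regions may differ, and the names `project_l2_ball` and `projected_gd` are hypothetical. Projecting after each trial keeps the weight vector bounded, which is one reading of the "well-behaved" property mentioned in the abstract.

```python
import math
from typing import List, Sequence, Tuple

def project_l2_ball(w: Sequence[float], radius: float) -> List[float]:
    """Euclidean projection of w onto the convex set {v : ||v||_2 <= radius}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm <= radius:
        return list(w)  # already inside the region: unchanged
    scale = radius / norm
    return [wi * scale for wi in w]  # rescale onto the boundary of the ball

def projected_gd(examples: List[Tuple[Sequence[float], float]],
                 eta: float = 0.1,
                 radius: float = 1.0) -> Tuple[List[float], float]:
    """Online gradient descent on square loss, projecting after every trial.

    Both the gradient-descent update and the L2-ball region are assumed
    stand-ins for "the static algorithm" and "a suitably chosen convex region".
    """
    w = [0.0] * len(examples[0][0])
    total = 0.0
    for x, y in examples:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        total += (y_hat - y) ** 2
        # Static update: gradient step on (y_hat - y)^2 (factor 2 absorbed into eta)
        w = [wi - eta * (y_hat - y) * xi for wi, xi in zip(w, x)]
        # End-of-trial projection into the convex region
        w = project_l2_ball(w, radius)
    return w, total
```

For instance, `project_l2_ball([3.0, 4.0], 1.0)` rescales the vector to roughly `[0.6, 0.8]`, and running `projected_gd` on a target the ball cannot reach leaves the weights on the boundary of the ball instead of letting them grow without limit.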
On-Line Learning of Linear Functions
Computational Complexity, 1991
Abstract

Cited by 41 (18 self)
... this paper, we present near-optimal strategies for combining opinions in situations like this. In more abstract terms, we study the online learning of linear functions. We assume that learning proceeds in a sequence of trials. At trial number $t$ the learning algorithm (the advisor) is presented with an instance $\vec{x}_t \in [0, 1]$ ...
Worst-case Quadratic Loss Bounds for Online Prediction of Linear Functions by Gradient Descent
IEEE Transactions on Neural Networks, 1993
Abstract

Cited by 31 (12 self)
... this paper we study the performance of gradient descent when applied to the problem of online linear prediction in arbitrary inner product spaces. We show worst-case bounds on the sum of the squared prediction errors under various assumptions concerning the amount of a priori information about the sequence to predict. The algorithms we use are variants and extensions of online gradient descent. Whereas our algorithms always predict using linear functions as hypotheses, none of our results requires the data to be linearly related. In fact, the bounds proved on the total prediction loss are typically expressed as a function of the total loss of the best fixed linear predictor with bounded norm. All the upper bounds are tight to within constants. Matching lower bounds are provided in some cases. Finally, we apply our results to the problem of online prediction for classes of smooth functions. Keywords: prediction, Widrow-Hoff algorithm, gradient descent, smoothing, inner product spaces, computational learning theory, online learning, linear systems.
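The Widrow-Hoff (LMS) rule named in the keywords is online gradient descent on the squared error: with prediction $\hat{y}_t = w_t \cdot x_t$, it updates $w_{t+1} = w_t - \eta (\hat{y}_t - y_t) x_t$, the factor 2 from the gradient being absorbed into $\eta$. The Python sketch below uses $\mathbb{R}^n$ with the ordinary dot product as the inner product space and a fixed step size `eta`; both are assumptions, and the tuned variants and extensions analyzed in the paper are not reproduced. The helper `fixed_predictor_loss` (a hypothetical name) computes the total loss of a fixed linear predictor $u$, the quantity the bounds are stated against.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (instance x_t, outcome y_t)

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary dot product, standing in for a general inner product."""
    return sum(ai * bi for ai, bi in zip(a, b))

def widrow_hoff(examples: List[Example], eta: float = 0.5) -> Tuple[List[float], float]:
    """Run the Widrow-Hoff (LMS) rule; return final weights and total square loss."""
    w = [0.0] * len(examples[0][0])  # start from the zero hypothesis
    total = 0.0
    for x, y in examples:
        y_hat = dot(w, x)          # predict with the current linear hypothesis
        total += (y_hat - y) ** 2  # incur square loss once y_t is revealed
        # Gradient step on (y_hat - y)^2, with the factor 2 absorbed into eta
        w = [wi - eta * (y_hat - y) * xi for wi, xi in zip(w, x)]
    return w, total

def fixed_predictor_loss(examples: List[Example], u: Sequence[float]) -> float:
    """Total square loss of the fixed linear predictor u on the same sequence."""
    return sum((dot(u, x) - y) ** 2 for x, y in examples)
```

On the toy sequence `[([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)] * 30`, the weights approach $u = (2, -1)$, whose total loss is zero, and the algorithm's total loss stays below 7. This only shows the linearly realizable case; the abstract notes that the paper's results do not require the data to be linearly related.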
Worst-case Quadratic Loss Bounds for Prediction Using Linear Functions and Gradient Descent
1996
Abstract

Cited by 27 (4 self)
In this paper we study the performance of gradient descent when applied to the problem of online linear prediction in arbitrary inner product spaces. We show worst-case bounds on the sum of the squared prediction errors under various assumptions concerning the amount of a priori information about the sequence to predict. The algorithms we use are variants and extensions of online gradient descent. Whereas our algorithms always predict using linear functions as hypotheses, none of our results requires the data to be linearly related. In fact, the bounds proved on the total prediction loss are typically expressed as a function of the total loss of the best fixed linear predictor with bounded norm. All the upper bounds are tight to within constants. Matching lower bounds are provided in some cases. Finally, we apply our results to the problem of online prediction for classes of smooth functions.
On the complexity of function learning
In Proceedings, Sixth Annual ACM Conference on Computational Learning Theory, 1993