Results 1-3 of 3
Worst-Case Quadratic Loss Bounds for Online Prediction of Linear Functions by Gradient Descent
IEEE Transactions on Neural Networks, 1993
Cited by 31 (12 self)
In this paper we study the performance of gradient descent when applied to the problem of online linear prediction in arbitrary inner product spaces. We show worst-case bounds on the sum of the squared prediction errors under various assumptions concerning the amount of a priori information about the sequence to predict. The algorithms we use are variants and extensions of online gradient descent. Whereas our algorithms always predict using linear functions as hypotheses, none of our results requires the data to be linearly related. In fact, the bounds proved on the total prediction loss are typically expressed as a function of the total loss of the best fixed linear predictor with bounded norm. All the upper bounds are tight to within constants. Matching lower bounds are provided in some cases. Finally, we apply our results to the problem of online prediction for classes of smooth functions.
Keywords: prediction, Widrow-Hoff algorithm, gradient descent, smoothing, inner product spaces, computational learning theory, online learning, linear systems.
Worst-Case Quadratic Loss Bounds for Prediction Using Linear Functions and Gradient Descent
1996
Cited by 27 (4 self)
In this paper we study the performance of gradient descent when applied to the problem of online linear prediction in arbitrary inner product spaces. We show worst-case bounds on the sum of the squared prediction errors under various assumptions concerning the amount of a priori information about the sequence to predict. The algorithms we use are variants and extensions of online gradient descent. Whereas our algorithms always predict using linear functions as hypotheses, none of our results requires the data to be linearly related. In fact, the bounds proved on the total prediction loss are typically expressed as a function of the total loss of the best fixed linear predictor with bounded norm. All the upper bounds are tight to within constants. Matching lower bounds are provided in some cases. Finally, we apply our results to the problem of online prediction for classes of smooth functions.
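As a rough illustration of the setting this abstract describes, the following is a minimal sketch of online gradient descent on squared loss (a Widrow-Hoff style update) in plain Python. It is not the paper's algorithm or analysis: the fixed learning rate `eta`, the synthetic data generator `make_examples`, and the reference vector used for comparison are all assumptions made for this example. The sketch only shows the two quantities the abstract talks about, the learner's total squared prediction error and the total loss of a fixed linear predictor.

```python
import random


def online_gradient_descent(examples, dim, eta):
    """Online gradient descent on squared loss with a fixed learning rate.

    examples: sequence of (x, y) pairs, x a list of `dim` floats.
    eta: fixed learning rate (an assumption of this sketch, not tuned
         as in the paper).
    Returns (final weight vector, total squared prediction error).
    """
    w = [0.0] * dim
    total_loss = 0.0
    for x, y in examples:
        # Predict with the current hypothesis before the true y is used.
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        err = y_hat - y
        total_loss += err ** 2
        # The gradient of (y_hat - y)^2 with respect to w is 2 * err * x.
        w = [wi - eta * 2.0 * err * xi for wi, xi in zip(w, x)]
    return w, total_loss


def total_loss_of(u, examples):
    """Total squared loss of a fixed linear predictor u on the sequence."""
    return sum(
        (sum(ui * xi for ui, xi in zip(u, x)) - y) ** 2 for x, y in examples
    )


def make_examples(n, seed=0):
    """Hypothetical synthetic data: a noisy linear target in two dimensions."""
    rng = random.Random(seed)
    target = [0.5, -0.25]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        noise = rng.uniform(-0.05, 0.05)
        y = sum(t * xi for t, xi in zip(target, x)) + noise
        data.append((x, y))
    return data


examples = make_examples(200)
w, learner_loss = online_gradient_descent(examples, dim=2, eta=0.1)
reference_loss = total_loss_of([0.5, -0.25], examples)
print("learner total loss:", learner_loss)
print("fixed predictor total loss:", reference_loss)
```

The bounds in the paper relate the first printed quantity to the second for the best fixed predictor of bounded norm. Here the comparison vector is simply the one used to generate the data, which need not be the best one for the noisy sequence.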
On the Complexity of Function Learning
In Proc. 6th Annu. Workshop on Comput. Learning Theory, 1994
Cited by 4 (2 self)
The majority of results in computational learning theory are concerned with concept learning, i.e. with the special case of function learning for classes of functions with range {0, 1}. Much less is known about the theory of learning functions with a larger range such as ℕ or ℝ. In particular, relatively few results exist about the general structure of common models for function learning, and there are only very few nontrivial function classes for which positive learning results have been exhibited in any of these models. We introduce in this paper the notion of a binary branching adversary tree for function learning, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any "learning" model). Another general structural result of this paper relates the cost for learning a union of function classes to the learning costs for the individ...