Results 1–10 of 16
Sequential procedures for aggregating arbitrary estimators of a conditional mean
, 2005
"... In this paper we describe and analyze a sequential procedure for aggregating linear combinations of a finite family of regression estimates, with particular attention to linear combinations having coefficients in the generalized simplex. The procedure is based on exponential weighting, and has a com ..."
Abstract

Cited by 28 (1 self)
In this paper we describe and analyze a sequential procedure for aggregating linear combinations of a finite family of regression estimates, with particular attention to linear combinations having coefficients in the generalized simplex. The procedure is based on exponential weighting, and has a computationally tractable approximation. Analysis of the procedure is based in part on techniques from the sequential prediction of nonrandom sequences. Here these techniques are applied in a stochastic setting to obtain cumulative loss bounds for the aggregation procedure. From the cumulative loss bounds we derive an oracle inequality for the aggregate estimator for an unbounded response having a suitable moment generating function. The inequality shows that the risk of the aggregate estimator is less than the risk of the best candidate linear combination in the generalized simplex, plus a complexity term that depends on the size of the coefficient set. The inequality readily yields convergence rates for aggregation over the unit simplex that are within logarithmic factors of known minimax bounds. Some preliminary results on model selection are also presented.
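The core exponential-weighting step this abstract describes can be sketched as follows; the toy data, the candidate family (frozen polynomial fits standing in for arbitrary estimators), and the temperature eta are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: y = f(x) + noise on n sample points.
n = 200
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# A finite family of candidate regression estimates (here: frozen fits of
# different polynomial degrees, standing in for the estimators to aggregate).
degrees = [1, 3, 5, 9]
candidates = [np.polyval(np.polyfit(x, y, d), x) for d in degrees]

# Exponential weighting: weight each candidate by exp(-eta * empirical loss),
# then form the aggregate as a convex (unit-simplex) combination.
eta = 0.5  # temperature; an assumed illustrative value, not the paper's choice
losses = np.array([np.sum((y - f) ** 2) for f in candidates])
w = np.exp(-eta * (losses - losses.min()))
w /= w.sum()
aggregate = sum(wk * f for wk, f in zip(w, candidates))

agg_loss = float(np.sum((y - aggregate) ** 2))
print(w.round(3), round(agg_loss, 3))
```

By convexity of the squared loss, the aggregate's empirical loss is never worse than the weighted average of the candidates' losses, which is the flavor of guarantee the paper's oracle inequality makes precise.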
Strategies for Sequential Prediction of Stationary Time Series
, 2001
"... We present simple procedures for the prediction of a real valued sequence. The algorithms are based on a combination of several simple predictors. We show that if the sequence is a realization of a bounded stationary and ergodic random process then the average of squared errors converges, almost sur ..."
Abstract

Cited by 19 (7 self)
We present simple procedures for the prediction of a real-valued sequence. The algorithms are based on a combination of several simple predictors. We show that if the sequence is a realization of a bounded stationary and ergodic random process, then the average of squared errors converges, almost surely, to that of the optimum, given by the Bayes predictor. We offer an analogous result for the prediction of stationary Gaussian processes.
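A minimal sketch of this kind of combination — two deliberately simple predictors mixed by an exponentially weighted average forecaster under squared loss — is below. The clipped AR(1) test sequence, the choice of experts, and the learning rate are illustrative assumptions, not the procedures analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# A realization of a bounded stationary ergodic process (a clipped AR(1)).
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = np.clip(0.8 * x[t-1] + 0.2 * rng.standard_normal(), -1, 1)

# Two simple expert predictors (illustrative choices):
#   expert 0: repeat the previous value; expert 1: predict the running mean.
def experts(t):
    return np.array([x[t-1], x[:t].mean()])

# Exponentially weighted average forecaster under squared loss.
eta = 2.0                      # learning rate, an assumed value
cum = np.zeros(2)              # cumulative squared loss of each expert
comb_loss = 0.0
for t in range(1, n):
    f = experts(t)
    w = np.exp(-eta * (cum - cum.min()))   # weights use only past losses
    w /= w.sum()
    pred = w @ f               # prediction made before x[t] is revealed
    comb_loss += (x[t] - pred) ** 2
    cum += (x[t] - f) ** 2

print(round(comb_loss, 2), cum.round(2))
```

By Jensen's inequality the combined forecaster's per-step loss is at most the weighted average of the experts' losses, which is what drives the almost-sure convergence to the best achievable average loss.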
Universal switching linear least squares prediction
 in Proc. of the 2006 Information Theory and its Applications Workshop. La Jolla, CA: UCSD
, 2006
"... In this paper we consider sequential regression of individual sequences under the squareerror loss. We focus on the class of switching linear predictors that can segment a given individual sequence into an arbitrary number of blocks within each of which a fixed linear regressor is applied. Using a ..."
Abstract

Cited by 13 (4 self)
In this paper we consider sequential regression of individual sequences under the square-error loss. We focus on the class of switching linear predictors that can segment a given individual sequence into an arbitrary number of blocks, within each of which a fixed linear regressor is applied. Using a competitive algorithm framework, we construct sequential algorithms that are competitive with the best linear regression algorithms for any segmenting of the data, as well as the best partitioning of the data into any fixed number of segments, where both the segmenting of the data and the linear predictors within each segment can be tuned to the underlying individual sequence. The algorithms do not require knowledge of the data length or the number of piecewise linear segments used by the members of the competing class, yet can achieve the performance of the best member that can choose both the partitioning of the sequence and the best regressor within each segment. We use a transition diagram [1] to compete with an exponential number of algorithms in the class, using complexity that is linear in the data length. The regret with respect to the best member is O(ln(n)) per transition for not knowing the best transition times and O(ln(n)) for not knowing the best regressor within each segment, where n is the data length. We construct lower bounds on the performance of any sequential algorithm, demonstrating a form of min-max optimality under certain settings. We also consider the case where the members are restricted to choose the best algorithm in each segment from a finite collection of candidate algorithms. Performance on synthetic and real data is given, along with a MATLAB implementation of the universal switching linear predictor.
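The paper's transition-diagram algorithm is more refined than what fits here, but the classical fixed-share forecaster over a small pool of fixed scalar regressors conveys the switching idea: weights are exponentially updated by squared loss and then partially redistributed, so the mixture can track a sequence whose best regressor changes between segments. The pool, the test sequence, and the rates eta and alpha are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)

# Piecewise-stationary sequence: the AR(1) coefficient switches mid-stream.
n = 1200
x = np.zeros(n)
for t in range(1, n):
    a = 0.9 if t < n // 2 else -0.7
    x[t] = np.clip(a * x[t-1] + 0.1 * rng.standard_normal(), -1, 1)

# Finite pool of fixed first-order regressors: xhat = w * x[t-1].
pool = np.linspace(-0.9, 0.9, 7)
eta, alpha = 4.0, 0.01          # learning and switching rates (assumed)
w = np.full(len(pool), 1.0 / len(pool))
mix_loss = 0.0
pool_loss = np.zeros(len(pool))
for t in range(1, n):
    preds = pool * x[t-1]
    pred = w @ preds            # mixture prediction before x[t] is revealed
    mix_loss += (x[t] - pred) ** 2
    losses = (x[t] - preds) ** 2
    pool_loss += losses
    # Exponential update, then fixed-share redistribution of weight mass.
    w = w * np.exp(-eta * losses)
    w /= w.sum()
    w = (1 - alpha) * w + alpha / len(pool)

print(round(mix_loss, 2), round(pool_loss.min(), 2))
```

The redistribution step (alpha) is what lets the mixture recover after a switch; with alpha = 0 the forecaster can only track a single best regressor for the whole sequence.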
Universal piecewise linear prediction via context trees
 IEEE Transactions on Signal Processing, accepted for publication
, 2006
"... Abstract—This paper considers the problem of piecewise linear prediction from a competitive algorithm approach. In prior work, prediction algorithms have been developed that are “universal” with respect to the class of all linear predictors, such that they perform nearly as well, in terms of total s ..."
Abstract

Cited by 6 (4 self)
Abstract—This paper considers the problem of piecewise linear prediction from a competitive algorithm approach. In prior work, prediction algorithms have been developed that are “universal” with respect to the class of all linear predictors, such that they perform nearly as well, in terms of total squared prediction error, as the best linear predictor that is able to observe the entire sequence in advance. In this paper, we introduce the use of a “context tree” to compete against a doubly exponential number of piecewise linear (affine) models. We use the context tree to achieve the total squared prediction error performance of the best piecewise linear model that can choose both its partitioning of the regressor space and its real-valued prediction parameters within each region of the partition, based on observing the entire sequence in advance, uniformly, for every bounded individual sequence. This performance is achieved with a prediction algorithm whose complexity is only linear in the depth of the context tree per prediction. Upper bounds on the regret with respect to the best piecewise linear predictor are given for both the scalar and higher-order cases, and lower bounds on the regret are given for the scalar case. An explicit algorithmic description and examples demonstrating the performance of the algorithm are given. Index Terms—Context tree, piecewise linear, prediction, universal.
Universal piecewise linear least squares prediction
 in Proceedings of ISIT
, 2004
"... Abstract — We consider the problem of sequential prediction of realvalued sequences using piecewise linear models under the squareerror loss function. In this context, we demonstrate a sequential algorithm for prediction whose accumulated squared error for every bounded sequence is asymptotically ..."
Abstract

Cited by 2 (2 self)
Abstract — We consider the problem of sequential prediction of real-valued sequences using piecewise linear models under the square-error loss function. In this context, we demonstrate a sequential algorithm for prediction whose accumulated squared error for every bounded sequence is asymptotically as small as that of the best fixed predictor for that sequence taken from the class of piecewise linear predictors. We also show that this predictor is optimal in certain settings in a particular min-max sense. This approach can also be applied to the class of piecewise constant predictors, for which a similar universal sequential algorithm can be derived with corresponding min-max optimality. I. Summary. In this paper, we consider the problem of predicting a sequence x^n = {x[t]}_{t=1}^n as well as the best piecewise linear predictor out of a large, continuous class of piecewise linear predictors. The real-valued sequence x^n is assumed to be bounded, i.e., |x[t]| ≤ A for some A < ∞, for all t. Rather than assuming a statistical ensemble of sequences, and attempting to achieve optimal performance according to some statistical criterion, our goal is to predict any sequence x^n as well as the best predictor out of a large class of predictors. We first consider the class of fixed scalar piecewise linear predictors as our competition class. For a scalar piecewise linear predictor, the past observation space x[t−1] ∈ [−A, A] is parsed into K disjoint regions R_j, where ∪_{j=1}^K R_j = [−A, A]. At each time t, the competing predictor forms its prediction as x̂_{w_j}[t] = w_j x[t−1], w_j ∈ R, when x[t−1] ∈ R_j. We assume that the number of regions and the region boundaries are known. Here, we seek to minimize the following regret: sup_{x^n} [ Σ_{t=1}^n (x[t] − x̂_q[t])^2 − inf …
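The scalar competition class described in this abstract — K regions on [−A, A], with prediction w_j · x[t−1] inside region j — can be sketched directly, with the sequential per-region coefficients fit by running least squares. The test sequence and the small regularizer are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
A, K, n = 1.0, 4, 1500

# A bounded test sequence (a clipped noisy nonlinear recursion, illustrative).
x = np.zeros(n)
for t in range(1, n):
    x[t] = np.clip(np.sin(2.5 * x[t-1]) + 0.2 * rng.standard_normal(), -A, A)

# Parse [-A, A] into K equal regions R_j.  Within region j the prediction is
# xhat[t] = w_j * x[t-1]; each w_j is fit sequentially by least squares on the
# samples that fell in R_j so far.
edges = np.linspace(-A, A, K + 1)

def region(v):
    return min(int(np.searchsorted(edges, v, side="right")) - 1, K - 1)

num = np.zeros(K)        # running sum of x[t-1] * x[t] per region
den = np.zeros(K)        # running sum of x[t-1]**2 per region
seq_loss = 0.0
for t in range(1, n):
    j = region(x[t-1])
    w_j = num[j] / (den[j] + 1e-6)      # small regularizer avoids 0/0
    seq_loss += (x[t] - w_j * x[t-1]) ** 2
    num[j] += x[t-1] * x[t]
    den[j] += x[t-1] ** 2

# The best fixed piecewise linear predictor chooses each w_j in hindsight.
best_loss = 0.0
for j in range(K):
    idx = np.array([t for t in range(1, n) if region(x[t-1]) == j])
    if idx.size == 0:
        continue
    u, yj = x[idx - 1], x[idx]
    wj = (u @ yj) / (u @ u + 1e-12)
    best_loss += float(np.sum((yj - wj * u) ** 2))

print(round(seq_loss - best_loss, 3))
```

The printed difference is the empirical counterpart of the regret in the abstract: the sequential algorithm's excess loss over the best fixed piecewise linear predictor chosen with the whole sequence in view.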
Universal Context Tree Least Squares Prediction
"... Abstract — We investigate the problem of sequential prediction of individual sequences using a competitive algorithm approach. We have previously developed prediction algorithms that are universal with respect to the class of all linear predictors, such that the prediction algorithm competes against ..."
Abstract

Cited by 2 (2 self)
Abstract — We investigate the problem of sequential prediction of individual sequences using a competitive algorithm approach. We have previously developed prediction algorithms that are universal with respect to the class of all linear predictors, such that the prediction algorithm competes against a continuous class of prediction algorithms, under the square-error loss. In this paper, we introduce the use of a “context tree” to compete against a doubly exponential number of piecewise linear models. We use the context tree to achieve the performance of the best piecewise linear model that can choose its partition of the real line and real-valued prediction parameters, based on observing the entire sequence in advance, for the square-error loss, uniformly, for any individual sequence. This performance is achieved with a prediction algorithm whose complexity is only linear in the depth of the context tree.
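For intuition about the class being competed against, the prunings of a context tree can be enumerated directly when the depth is tiny: each pruning of a depth-2 binary tree over [−1, 1] yields one piecewise linear model, and exponential weighting mixes their sequential predictions. The real context-tree algorithm achieves this performance without enumeration, at cost linear in the depth; the partitions, test data, and eta below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1200
x = np.zeros(n)
for t in range(1, n):
    x[t] = np.clip(np.sign(x[t-1]) * 0.7 - 0.5 * x[t-1]
                   + 0.2 * rng.standard_normal(), -1, 1)

# All five prunings of a depth-2 binary context tree on [-1, 1]; each pruning
# is one piecewise linear model (intervals given as (lo, hi) pairs).
prunings = [
    [(-1, 1)],
    [(-1, 0), (0, 1)],
    [(-1, -0.5), (-0.5, 0), (0, 1)],
    [(-1, 0), (0, 0.5), (0.5, 1)],
    [(-1, -0.5), (-0.5, 0), (0, 0.5), (0.5, 1)],
]

class PiecewisePredictor:
    """Sequential per-interval least squares: xhat = w_j * x[t-1] on interval j."""
    def __init__(self, intervals):
        self.intervals = intervals
        self.num = np.zeros(len(intervals))
        self.den = np.zeros(len(intervals))
        self.loss = 0.0
    def _idx(self, v):
        for j, (lo, hi) in enumerate(self.intervals):
            if lo <= v <= hi:
                return j
        return 0
    def step(self, prev, cur):
        j = self._idx(prev)
        pred = self.num[j] / (self.den[j] + 1e-6) * prev
        self.loss += (cur - pred) ** 2
        self.num[j] += prev * cur
        self.den[j] += prev ** 2
        return pred

models = [PiecewisePredictor(p) for p in prunings]
eta = 0.5
mix_loss = 0.0
for t in range(1, n):
    cum = np.array([m.loss for m in models])   # past losses only
    w = np.exp(-eta * (cum - cum.min()))
    w /= w.sum()
    preds = np.array([m.step(x[t-1], x[t]) for m in models])
    mix_loss += (x[t] - w @ preds) ** 2

losses = np.array([m.loss for m in models])
print(round(mix_loss, 2), losses.round(2))
```

At depth D the number of such prunings grows doubly exponentially, which is exactly why the enumeration above is only a toy and the context-tree machinery is needed.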
A Lower bound on the Performance of Sequential Prediction
"... Abstract We consider the problem of sequential linear prediction of realvalued sequences under the squareerror loss function. For this problem, a prediction algorithm has been demonstrated [l][2] whose accumulated squared prediction error, for every bounded sequence, is asymptotically as small as ..."
Abstract
Abstract — We consider the problem of sequential linear prediction of real-valued sequences under the square-error loss function. For this problem, a prediction algorithm has been demonstrated [1], [2] whose accumulated squared prediction error, for every bounded sequence, is asymptotically as small as that of the best fixed linear predictor for that sequence, taken from the class of all linear predictors of a given order p. The redundancy, or excess prediction error above that of the best predictor for that sequence, is upper-bounded by A^2 p ln(n)/n, where n is the data length and the sequence is assumed to be bounded by some A. In this paper, we show that this predictor is optimal in a min-max sense, by deriving a corresponding lower bound, such that no sequential predictor can ever do better than a redundancy of A^2 p ln(n)/n.
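The order-p sequential predictor behind this kind of bound is, in essence, recursive least squares, and its regret can be measured empirically against the best fixed linear predictor chosen in hindsight. The test sequence, the regularizer delta, and the constants below are illustrative assumptions; the printed regret is only an empirical counterpart of the A^2 p ln(n)/n bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, A = 2000, 2, 1.0

# Bounded test sequence: a clipped AR(2) realization (illustrative choice).
x = np.zeros(n)
for t in range(2, n):
    x[t] = np.clip(0.6 * x[t-1] - 0.2 * x[t-2]
                   + 0.3 * rng.standard_normal(), -A, A)

# Sequential RLS predictor of order p: predict x[t] from (x[t-1], ..., x[t-p]).
delta = 1.0                      # ridge regularizer (an assumed value)
P = np.eye(p) / delta            # inverse correlation estimate
w = np.zeros(p)
seq_loss = 0.0
U, y = [], []
for t in range(p, n):
    u = x[t-p:t][::-1]           # regressor vector (x[t-1], ..., x[t-p])
    pred = w @ u
    seq_loss += (x[t] - pred) ** 2
    # Standard recursive least squares update.
    k = P @ u / (1.0 + u @ P @ u)
    w = w + k * (x[t] - pred)
    P = P - np.outer(k, u @ P)
    U.append(u); y.append(x[t])

# Best fixed order-p linear predictor chosen in hindsight.
U, y = np.array(U), np.array(y)
w_best, *_ = np.linalg.lstsq(U, y, rcond=None)
best_loss = float(np.sum((y - U @ w_best) ** 2))

regret = seq_loss - best_loss
print(round(regret, 3))
```

The regret grows only logarithmically with the data length, which is what makes the per-sample redundancy A^2 p ln(n)/n vanish as n grows.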
Universal Linear Least Squares Prediction:
"... Abstract—We consider the problem of sequential linear prediction of realvalued sequences under the squareerror loss function. For this problem, a prediction algorithm has been demonstrated [1]–[3] whose accumulated squared prediction error, for every bounded sequence, is asymptotically as small as ..."
Abstract
Abstract—We consider the problem of sequential linear prediction of real-valued sequences under the square-error loss function. For this problem, a prediction algorithm has been demonstrated [1]–[3] whose accumulated squared prediction error, for every bounded sequence, is asymptotically as small as that of the best fixed linear predictor for that sequence, taken from the class of all linear predictors of a given order p. The redundancy, or excess prediction error above that of the best predictor for that sequence, is upper-bounded by A^2 p ln(n)/n, where n is the data length and the sequence is assumed to be bounded by some A. In this correspondence, we provide an alternative proof of this result by connecting it with universal probability assignment. We then show that this predictor is optimal in a min-max sense, by deriving a corresponding lower bound, such that no sequential predictor can ever do better than a redundancy of A^2 p ln(n)/n. Index Terms—Min-max, prediction, sequential probability assignment, universal algorithms.
Competitive Prediction Under Additive Noise
"... Abstract—In this correspondence, we consider sequential prediction of a realvalued individual signal from its past noisy samples, under square error loss. We refrain from making any stochastic assumptions on the generation of the underlying desired signal and try to achieve uniformly good performan ..."
Abstract
Abstract—In this correspondence, we consider sequential prediction of a real-valued individual signal from its past noisy samples, under square-error loss. We refrain from making any stochastic assumptions on the generation of the underlying desired signal and try to achieve uniformly good performance for any deterministic and arbitrary individual signal. We investigate this problem in a competitive framework, where we construct algorithms that perform as well as the best algorithm in a competing class of algorithms for each desired signal. Here, the best algorithm in the competition class can be tuned to the underlying desired clean signal even before processing any of the data. Three different frameworks under additive noise are considered: the class of a finite number of algorithms; the class of all pth-order linear predictors (for some fixed order p); and finally the class of all switching pth-order linear predictors. Index Terms—Additive noise, competitive, real-valued, sequential decisions, universal prediction.