How to Use Expert Advice
 Journal of the Association for Computing Machinery, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Abstract

Cited by 323 (66 self)
 Add to MetaCart
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
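As a rough illustration of the kind of scheme this abstract analyzes, the following sketch implements randomized prediction with exponential weights over a fixed set of experts. The learning rate eta and all names are illustrative choices of mine, not the paper's tuned constants.

```python
import math

def predict_with_experts(bits, expert_preds, eta=0.5):
    """Randomized exponential-weights prediction of a binary sequence.

    bits[t] is the bit to predict at time t; expert_preds[i][t] is expert
    i's prediction for it.  Returns (expected mistakes of the algorithm,
    mistakes of the best expert in hindsight).
    """
    weights = [1.0] * len(expert_preds)
    alg_loss = 0.0
    for t, bit in enumerate(bits):
        total = sum(weights)
        # Probability of predicting 1 = weighted vote for 1.
        p_one = sum(w for w, e in zip(weights, expert_preds) if e[t] == 1) / total
        alg_loss += (1.0 - p_one) if bit == 1 else p_one  # expected mistake
        # Multiplicatively penalize experts that erred on this bit.
        weights = [w * (1.0 if e[t] == bit else math.exp(-eta))
                   for w, e in zip(weights, expert_preds)]
    best = min(sum(e[t] != bit for t, bit in enumerate(bits))
               for e in expert_preds)
    return alg_loss, best
```

On a sequence the best expert predicts perfectly, the algorithm's expected mistake count stays bounded while the weight mass concentrates on that expert, in the spirit of the regret bounds above.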
Universal prediction of individual sequences
 IEEE Transactions on Information Theory, 1992
"... AbstructThe problem of predicting the next outcome of an individual binary sequence using finite memory, is considered. The finitestate predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finitestate (FS) predictor. It is proved t ..."
Abstract

Cited by 161 (13 self)
 Add to MetaCart
The problem of predicting the next outcome of an individual binary sequence using finite memory is considered. The finite-state predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finite-state (FS) predictor. It is proved that this FS predictability can be attained by universal sequential prediction schemes. Specifically, an efficient prediction procedure based on the incremental parsing procedure of the Lempel-Ziv data compression algorithm is shown to achieve asymptotically the FS predictability. Finally, some relations between compressibility and predictability are pointed out, and the predictability is proposed as an additional measure of the complexity of a sequence. Index Terms: Predictability, compressibility, complexity, finite-state machines, Lempel-Ziv algorithm.
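The LZ-based predictor itself is more involved; as a toy stand-in, here is a simple order-k Markov counting predictor, one concrete example of a finite-state predictor in the sense above. The function name and the tie-breaking rule are my own choices.

```python
from collections import defaultdict

def markov_predict(bits, k=2):
    """Toy order-k Markov counting predictor (a finite-state predictor):
    predict the bit seen most often so far after the current length-k
    context.  Returns the fraction of prediction errors."""
    counts = defaultdict(lambda: [0, 0])  # context -> [count of 0, count of 1]
    errors = 0
    for t, bit in enumerate(bits):
        ctx = tuple(bits[max(0, t - k):t])  # shorter contexts near the start
        c0, c1 = counts[ctx]
        pred = 1 if c1 > c0 else 0          # break ties toward 0
        errors += (pred != bit)
        counts[ctx][bit] += 1
    return errors / len(bits)
```

On a periodic sequence such as 0101..., the predictor makes only a bounded number of early mistakes before the counts lock in, so its error fraction tends to zero, illustrating the "minimum fraction of prediction errors" notion.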
Universal prediction
 IEEE Transactions on Information Theory, 1998
"... This paper consists of an overview on universal prediction from an informationtheoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. Both the probabili ..."
Abstract

Cited by 137 (11 self)
 Add to MetaCart
This paper gives an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described, with emphasis on the analogy and the differences between results in the two settings.
Universal Portfolios with Side Information
 IEEE Transactions on Information Theory, 1996
"... We present a sequential investment algorithm, the ¯weighted universal portfolio with sideinformation, which achieves, to first order in the exponent, the same wealth as the best sideinformation dependent investment strategy (the best stateconstant rebalanced portfolio) determined in hindsight fr ..."
Abstract

Cited by 94 (4 self)
 Add to MetaCart
(Show Context)
We present a sequential investment algorithm, the µ-weighted universal portfolio with side information, which achieves, to first order in the exponent, the same wealth as the best side-information-dependent investment strategy (the best state-constant rebalanced portfolio) determined in hindsight from observed market and side-information outcomes. This is an individual-sequence result which shows that the difference between the exponential growth rates of wealth of the best state-constant rebalanced portfolio and the universal portfolio with side information is uniformly less than (d/(2n)) log(n + 1) + (k/n) log 2 for every stock market and side-information sequence and for all time n. Here d = k(m - 1) is the number of degrees of freedom in the state-constant rebalanced portfolio with k states of side information and m stocks. The proof of this result establishes a close connection between universal investment and universal data compression. Keywords: Universal investment, univ...
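A minimal sketch of the wealth-weighting idea, under the assumption that the continuum of portfolios is replaced by a small finite grid of constant rebalanced portfolios; the paper instead integrates against a µ prior and runs one mixture per side-information state. All names here are illustrative.

```python
def crp_wealth(portfolio, price_relatives):
    """Wealth of a constant rebalanced portfolio b over a sequence of
    price-relative vectors x_t: the product of b . x_t over t."""
    wealth = 1.0
    for x in price_relatives:
        wealth *= sum(b * xi for b, xi in zip(portfolio, x))
    return wealth

def universal_portfolio(price_relatives, grid):
    """Wealth-weighted mixture over a finite grid of CRPs: a discretized
    stand-in for the integral defining the universal portfolio."""
    wealth = 1.0
    past = []
    for x in price_relatives:
        # Weight each candidate CRP by the wealth it has achieved so far.
        weights = [crp_wealth(b, past) for b in grid]
        total = sum(weights)
        mix = [sum(w * b[i] for w, b in zip(weights, grid)) / total
               for i in range(len(x))]
        wealth *= sum(bi * xi for bi, xi in zip(mix, x))
        past.append(x)
    return wealth
```

Because each mixed portfolio still sums to one, a flat market (all price relatives equal to 1) leaves the wealth unchanged; in a volatile market the mixture tracks the hindsight-best CRP on the grid up to the polynomial factor in the bound above.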
Fat-shattering and the learnability of real-valued functions
 Journal of Computer and System Sciences, 1996
"... We consider the problem of learning realvalued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a realvalued function class in terms of a generalization of th ..."
Abstract

Cited by 62 (10 self)
 Add to MetaCart
(Show Context)
We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. Under mild conditions on independent observation noise, we provide characterizations of the learnability of a real-valued function class in terms of a generalization of the Vapnik-Chervonenkis dimension, the fat-shattering function, introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fat-shattering function is finite. With different (also quite mild) restrictions, satisfied for example by Gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fat-shattering function grows polynomially. We prove analogous results in an agnostic setting, where there is no assumption of an underlying function class.
Universal linear prediction by model order weighting
 IEEE Transactions on Signal Processing, 1999
"... Abstract—A common problem that arises in adaptive filtering, autoregressive modeling, or linear prediction is the selection of an appropriate order for the underlying linear parametric model. We address this problem for linear prediction, but instead of fixing a specific model order, we develop a se ..."
Abstract

Cited by 39 (18 self)
 Add to MetaCart
(Show Context)
A common problem that arises in adaptive filtering, autoregressive modeling, or linear prediction is the selection of an appropriate order for the underlying linear parametric model. We address this problem for linear prediction, but instead of fixing a specific model order, we develop a sequential prediction algorithm whose sequentially accumulated average squared prediction error for any bounded individual sequence is as good as the performance attainable by the best sequential linear predictor of order less than some w. This predictor is found by transforming linear prediction into a problem analogous to the sequential probability assignment problem from universal coding theory. The resulting universal predictor uses essentially a performance-weighted average of all predictors for model orders less than w. Efficient lattice filters are used to generate the predictions of all the models recursively, resulting in a complexity of the universal algorithm that is no larger than that of the largest model order. Examples of prediction performance are provided for autoregressive and speech data as well as an example of adaptive data equalization. Index Terms: Adaptive filters, Bayes procedures, learning systems, least squares methods, model order, prediction methods.
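The order-weighting idea can be sketched as follows, assuming we refit each order's least-squares predictor from scratch at every step rather than using the paper's efficient lattice-filter recursions; the function and parameter names are mine.

```python
import numpy as np

def universal_order_weighted_predict(x, max_order=4, c=1.0):
    """Sketch of model-order weighting: run one sequential least-squares
    linear predictor per order m <= max_order and average their
    predictions, weighted by exp(-cumulative squared error / c)."""
    n = len(x)
    losses = np.zeros(max_order)  # cumulative squared error per order
    preds = np.zeros(n)
    for t in range(n):
        order_preds = np.zeros(max_order)
        for m in range(1, max_order + 1):
            if t > m:  # need some history to fit an order-m model
                # Regression: x[i] as a linear function of the previous m samples.
                rows = np.array([x[i - m:i] for i in range(m, t)])
                targets = np.array(x[m:t])
                a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
                order_preds[m - 1] = float(np.dot(a, x[t - m:t]))
        w = np.exp(-losses / c)               # performance weights
        preds[t] = float(np.dot(w, order_preds) / w.sum())
        losses += (order_preds - x[t]) ** 2   # update each order's loss
    return preds
```

On a sinusoid, which satisfies an exact order-2 linear recurrence, the weights concentrate on the orders that fit, so late predictions become accurate without the model order being fixed in advance.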
Learning Changing Concepts by Exploiting the Structure of Change
 1996
"... This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must provide an accurate estimate of the target function sequence. We consider a variety of restrictions on how ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
(Show Context)
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must provide an accurate estimate of the target function sequence. We consider a variety of restrictions on how the target function is allowed to change, including infrequent but arbitrary changes, sequences that correspond to slow walks on a graph whose nodes are functions, and changes that are small on average, as measured by the probability of disagreements between consecutive functions. We first study estimation, in which the learner sees a batch of examples and is then required to give an accurate estimate of the function sequence. Our results provide bounds on the sample complexity and allowable drift rate for these problems. We also study prediction, in which the learner must produce online a hypothesis after each labelled example and the average misclassification probability over this hypothes...
The cost of achieving the best portfolio in hindsight
 Math. Oper. Res., 1998
"... For a market withm assets consider the minimum over all possible sequences of asset prices through time n of the ratio of the nal wealth of a nonanticipating investment strategy to the wealth obtained by the best constant rebalanced portfolio computed in hindsight for that price sequence. We show t ..."
Abstract

Cited by 21 (1 self)
 Add to MetaCart
For a market with m assets, consider the minimum over all possible sequences of asset prices through time n of the ratio of the final wealth of a nonanticipating investment strategy to the wealth obtained by the best constant rebalanced portfolio computed in hindsight for that price sequence. We show that the maximum value of this ratio over all nonanticipating investment strategies is V_n
Universal Linear Least Squares Prediction: Upper and Lower Bounds
 IEEE Trans. Inf. Theory, 2002
"... Universal linear least squares prediction of realvalued bounded individual sequences in the presence of additive bounded noise is considered. It is shown that there is a sequential predictor observing noisy samples of the sequence to be predicted only, whose loss in terms of the noisefree sequence ..."
Abstract

Cited by 18 (12 self)
 Add to MetaCart
(Show Context)
Universal linear least squares prediction of real-valued bounded individual sequences in the presence of additive bounded noise is considered. It is shown that there is a sequential predictor, observing only noisy samples of the sequence to be predicted, whose loss in terms of the noise-free sequence is asymptotically as small as that of the best batch predictor out of the class of all linear predictors with knowledge of the entire noisy sequence in advance. Index Terms: Prediction, least squares, linear, noise.
On Sequential Strategies for Loss Functions with Memory
"... The problem of optimal sequential decision for individual sequences, relative to a class of competing oline reference strategies, is studied for general loss functions with memory. This problem is motivated by applications in which actions may have \long term" eects, or there is a cost for swi ..."
Abstract

Cited by 14 (3 self)
 Add to MetaCart
The problem of optimal sequential decision for individual sequences, relative to a class of competing off-line reference strategies, is studied for general loss functions with memory. This problem is motivated by applications in which actions may have "long term" effects, or there is a cost for switching from one action to another. As a first step, we consider the case in which the reference strategies are taken from a finite set of generic "experts." We then focus on finite-state reference strategies, assuming finite action and observation spaces. We show that key properties that hold for finite-state strategies in the context of memoryless loss functions do not carry over to the case of loss functions with memory. As a result, an infinite family of randomized finite-state strategies is seen to be the most appropriate reference class for this case, and the problem is basically different from its memoryless counterpart. Based on Vovk's exponential weighting technique, infinite-horizon online decision schemes are devised. For an arbitrary sequence of observations of length n, the excess normalized loss of these schemes relative to the best expert in a corresponding reference class is shown to be upper-bounded by an O(n^(-1/3)) term in the case of a finite class, or an O(((ln n)/n)^(1/3)) term for the class of randomized finite-state strategies. These results parallel the O(n^(-1/2)) bounds attained by previous schemes for memoryless loss functions. By letting the number of states in the reference class grow, the notion of finite-state predictability is also extended. Index Terms: Sequential decision, online algorithms, general loss functions, prediction, expert advice, randomized expert. Parts of this paper were presented at the 2000 International Symposium on Information Theo...