## Switching between Predictors with an Application in Density Estimation (2007)

### BibTeX

@MISC{Erven07switchingbetween,

author = {Tim Van Erven and Steven De Rooij and Peter Grünwald},

title = {Switching between Predictors with an Application in Density Estimation},

year = {2007}

}

### Abstract

Universal coding is the standard technique for combining multiple predictors. This technique is explicitly used in minimum description length modeling, and implicitly in Bayesian modeling. Using universal coding, one can predict nearly as well as the best single predictor. When the predictors are themselves universal codes for models (sets of predictors) with varying numbers of parameters, however, we may often achieve smaller loss by switching between predictors in a different manner, which takes the local relative behaviour of the predictors into account. In this paper we present the switch-code, which implements this idea. It can be applied to coding, model selection, prediction and density estimation problems. As a proof of concept we give a particular application to histogram density estimation. We show that the switch-code achieves smaller redundancy, $O(n^{1/3} \log\log n)$, than standard universal coding, which achieves $O(n^{1/3} (\log n)^{2/3})$.
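
The claim that universal coding predicts nearly as well as the best single predictor can be illustrated with a Bayesian mixture over a finite set of predictors. The sketch below is only an illustration under assumptions: the function name `mixture_code_length`, the uniform prior, and the example code lengths are hypothetical and not taken from the paper. With K predictors and a uniform prior, the mixture code length exceeds the best single code length by at most log2(K) bits.

```python
import math

def mixture_code_length(code_lengths_bits, prior=None):
    """Code length in bits of a Bayesian mixture over K predictors.

    code_lengths_bits[k] is -log2 P_k(x^n), the accumulated log loss of
    predictor k on the whole sequence. The prior defaults to uniform
    (an assumption made for this sketch).
    """
    k = len(code_lengths_bits)
    if prior is None:
        prior = [1.0 / k] * k
    # log2 of each term w_k * P_k(x^n), combined with a stable log-sum-exp.
    terms = [math.log2(w) - length for w, length in zip(prior, code_lengths_bits)]
    m = max(terms)
    return -(m + math.log2(sum(2.0 ** (t - m) for t in terms)))

# Hypothetical accumulated losses (bits) of three predictors on the same data.
lengths = [1200.0, 1150.0, 1400.0]
mix = mixture_code_length(lengths)
overhead = mix - min(lengths)  # bounded above by log2(3) bits
```

The overhead is a constant in the sequence length, which is why such a mixture cannot exploit the fact that different predictors may be best on different stretches of the data.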

### Citations

189 |
Statistical theory: the prequential approach
- Dawid
- 1984
Citation Context ...ategies (predictors): functions from finite sequences over a sample space $X$ to probability distributions on the next outcome. Such predictors are sometimes also called prequential forecasting systems [3]. We write $P(x_{n+1} \mid x_1, \ldots, x_n)$ for the probability of $x_{n+1} \in X$ given previous observations $x^n = (x_1, \ldots, x_n) \in X^n$, and we abbreviate $P(x^n \mid x^m) := \prod_{i=m+1}^{n} P(x_i \mid x^{i-1})$ and write $P(x^n)$ when $m = 0$. The p...

168 | The context-tree weighting method: Basic properties
- Willems, Shtarkov, et al.
- 1995
Citation Context ...Bayesian and MDL approaches to statistical model selection, prediction and density estimation problems [4]; and (b) in some state-of-the-art data compressors such as the context-tree weighting method [12]. In this preliminary paper, we merely highlight a single application where the use of the switch-code improves over other, existing methods: nonparametric density estimation based on histograms with ...

150 |
The Minimum Description Length Principle
- Grünwald
- 2007
Citation Context ...nd M2. This setting is sometimes called twice-universal prediction [9]. It is encountered in (a) Bayesian and MDL approaches to statistical model selection, prediction and density estimation problems [4]; and (b) in some state-of-the-art data compressors such as the context-tree weighting method [12]. In this preliminary paper, we merely highlight a single application where the use of the switch-code...

135 |
The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ...ed than M1. We provide an example in Figure 1, which shows the difference in accumulated loss for two predictors P1 and P2 on “The War of the Worlds” by H.G. Wells. P1 is the Krichevsky-Trofimov (KT) [5] predictor for first-order Markov chains, P2 is the KT predictor for second-order Markov chains. Clearly P1 is the best predictor for about the first 50 000 outcomes, after which it is overtaken by P2...

61 | Tracking a small set of experts by mixing past posteriors
- Bousquet, Warmuth
- 2002
Citation Context ...m Density Estimation. Rissanen, Speed and Yu [8] consider density estimation based on histogram models with equal-width bins relative to a restricted set $T$ of ‘true’ densities on the unit interval $X = [0, 1]$. The restriction on $T$ is that there should exist constants $0 < c_0 < 1 < c_1$ such that for every $f \in T$, for all $x \in X$, $c_0 \le f(x) \le c_1$ and $|f'(x)| \le cf$, where $f'$ denotes the first derivative of $f$ an...

50 |
Density estimation by stochastic complexity
- Rissanen, Speed, et al.
- 1992
Citation Context ... improves over other, existing methods: nonparametric density estimation based on histograms with bins of equal width. Density estimation with histograms is considered by Rissanen, Speed and Yu (RSY) [8, 13], who show that in estimating a differentiable density that is bounded away from zero and infinity, it is asymptotically optimal in expectation to let the number of histogram bins increase as $\lceil n^{1/3} \rceil$...

46 |
Stochastic complexity in statistical inquiry, volume 15
- Rissanen
- 1989
Citation Context ...itive integers. $L^*(n) = c + \log n + \log\log n + \ldots$ for $n \in \mathbb{Z}^+$, where the sequence of nested logarithms includes all positive terms, $\mathbb{Z}^+$ denotes the positive integers and $c \approx 1.5$ is a positive constant [7]. 5 Discussion. Related Work. The idea of universal coding and prediction is to construct a “universal” predictor that performs nearly as well as the best single predictor in some given comparison class...

21 | A comparison of scientific and engineering criteria for bayesian model selection
- Heckerman, Chickering
- 1997
Citation Context ...not just the past behaviour, but also the future behaviour of each predictor. The fact that the two are related, but different, is investigated from a Bayesian perspective by Chickering and Heckerman [2]. In this context, we should also point out a note by MacKay [6] on the relation between Bayesian model comparison and leave-one-out cross-validation, where he predicts that “cross-validation would be...

15 |
Twice-universal coding
- Ryabko
- 1984
Citation Context ... universal predictors (like e.g. Pmix) whenever P1 and P2 are themselves universal predictors relative to some underlying models M1 and M2. This setting is sometimes called twice-universal prediction [9]. It is encountered in (a) Bayesian and MDL approaches to statistical model selection, prediction and density estimation problems [4]; and (b) in some state-of-the-art data compressors such as the con...

14 |
Weighting techniques in data compression: Theory and algorithms
- Volf
- 2002
Citation Context ...ors, rather than the best single predictor, is not at all new. It was considered earlier, by, for example, Bousquet and Warmuth [1]; in a data compression context, a similar idea was explored by Volf [11]. The main difference from these earlier works is that the switch-code has been specifically designed for settings where we would normally consider twice-universal coding. Then it commonly happens tha...

7 |
Data compression and histograms
- Yu, Speed
- 1992
Citation Context ... improves over other, existing methods: nonparametric density estimation based on histograms with bins of equal width. Density estimation with histograms is considered by Rissanen, Speed and Yu (RSY) [8, 13], who show that in estimating a differentiable density that is bounded away from zero and infinity, it is asymptotically optimal in expectation to let the number of histogram bins increase as $\lceil n^{1/3} \rceil$...

2 |
Bayesian methods for neural networks - FAQ, 2004. www.inference.phy.cam.ac.uk/mackay/Bayes FAQ.html
- MacKay
Citation Context ...ch predictor. The fact that the two are related, but different, is investigated from a Bayesian perspective by Chickering and Heckerman [2]. In this context, we should also point out a note by MacKay [6] on the relation between Bayesian model comparison and leave-one-out cross-validation, where he predicts that “cross-validation would be the better method for predicting generalisation error”. Future ...

1 |
The momentum problem in MDL and Bayesian prediction
- Erven
- 2006
Citation Context ...mulates less loss on those outcomes! The Switch-Code. For such cases, we have developed an alternative method to combine predictors P1 and P2 into a single predictor Psw, which we call the switch-code [10]. Given a switch-point $s$ at which to switch from P1 to P2, it predicts according to $$P_{\mathrm{sw}}(x_{n+1} \mid x^n, s) := \begin{cases} P_1(x_{n+1} \mid x^n) & \text{if } n < s, \\ P_2(x_{n+1} \mid x^n) & \text{otherwise.} \end{cases} \qquad (2)$$ The optimal switch-point, however, will typ...
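
The fixed-switch-point rule quoted in the last citation context (predict with P1 while fewer than s outcomes have been observed, and with P2 afterwards) can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the helper names, the two toy binary predictors (a fixed uniform predictor standing in for P1 and a Krichevsky-Trofimov estimator for a Bernoulli source standing in for P2), and the data are all hypothetical. The sketch also leaves out how an unknown switch-point is handled, which the truncated context does not show.

```python
import math

def uniform(history):
    """Toy stand-in for P1: always predicts 0 and 1 with probability 1/2."""
    return {0: 0.5, 1: 0.5}

def kt(history):
    """Toy stand-in for P2: KT estimator, P(1 | history) = (ones + 1/2) / (n + 1)."""
    n = len(history)
    p_one = (sum(history) + 0.5) / (n + 1)
    return {0: 1.0 - p_one, 1: p_one}

def switch_predict(p1, p2, history, s):
    """Predict with p1 if fewer than s outcomes have been seen, else with p2."""
    return p1(history) if len(history) < s else p2(history)

def switch_code_length(p1, p2, xs, s):
    """Accumulated log loss in bits of the fixed-switch-point predictor on xs."""
    total = 0.0
    for n, x in enumerate(xs):
        dist = switch_predict(p1, p2, xs[:n], s)
        total += -math.log2(dist[x])
    return total

xs = [1] * 20                                          # hypothetical data
all_p1 = switch_code_length(uniform, kt, xs, s=len(xs))  # never switches
all_p2 = switch_code_length(uniform, kt, xs, s=0)        # switches immediately
half = switch_code_length(uniform, kt, xs, s=10)         # switches halfway
```

Note that after the switch P2 still conditions on the full history, including the outcomes that were predicted by P1, which matches the conditioning on $x^n$ in the rule above.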