## The Context Tree Weighting Method: Basic Properties (1995)

Venue: IEEE Transactions on Information Theory

Citations: 79 (1 self)

### BibTeX

@ARTICLE{Willems95thecontext,

author = {Frans M. J. Willems and Yuri M. Shtarkov and Tjalling J. Tjalkens},

title = {The Context Tree Weighting Method: Basic Properties},

journal = {IEEE Transactions on Information Theory},

year = {1995},

volume = {41},

pages = {653--664}

}

### Abstract

We describe a sequential universal data compression procedure for binary tree sources that performs the "double mixture". Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. We derive a natural upper bound on the cumulative redundancy of our method for individual sequences. The three terms in this bound can be identified as coding, parameter and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. Our upper bound on the redundancy shows that the proposed context tree weighting procedure is optimal in the sense that i...
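The recursive weighting mentioned in the abstract can be illustrated with a small batch computation. This is a minimal sketch, not the authors' implementation: it assumes the usual formulation in which each context-tree node holds a Krichevsky-Trofimov (KT) estimate of its zero/one counts, a node at maximum depth uses that estimate directly, and an internal node averages its own estimate with the product of its children's weighted probabilities. The function names `kt` and `ctw_probability` are hypothetical, and exact rationals are used instead of the finite-precision arithmetic a real coder would need.

```python
from fractions import Fraction


def kt(a, b):
    """Exact KT estimate for a block with `a` zeros and `b` ones."""
    p = Fraction(1)
    for i in range(a):                      # zeros first: (i + 1/2) / (i + 1)
        p *= Fraction(2 * i + 1, 2 * (i + 1))
    for j in range(b):                      # then ones: (j + 1/2) / (a + j + 1)
        p *= Fraction(2 * j + 1, 2 * (a + j + 1))
    return p


def ctw_probability(past, seq, depth):
    """Weighted probability of `seq`, given at least `depth` past symbols."""
    if len(past) < depth:
        raise ValueError("need at least `depth` past symbols")
    counts = {}                             # context (most recent symbol first) -> [zeros, ones]
    history = list(past)
    for x in seq:
        context = tuple(reversed(history[len(history) - depth:]))
        for d in range(depth + 1):          # update every node on the context path
            counts.setdefault(context[:d], [0, 0])[x] += 1
        history.append(x)

    def weighted(node):
        a, b = counts.get(node, (0, 0))
        if a + b == 0:
            return Fraction(1)              # unvisited subtree: every factor equals 1
        if len(node) == depth:
            return kt(a, b)                 # maximum depth: estimate only
        children = weighted(node + (0,)) * weighted(node + (1,))
        return (kt(a, b) + children) / 2    # half own estimate, half product of children

    return weighted(())
```

Only nodes on visited context paths are stored, so the table grows by at most `depth + 1` entries per source symbol, which is in line with the linear storage claim in the abstract. A sequential coder would update the weighted probabilities along the context path after each symbol rather than recomputing from the counts.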

### Citations

8565 |
Elements of information theory
- Cover, Thomas
- 1991
Citation Context ... this functional relationship we write $c^L(x_1^T \mid x_{1-D}^0)$. The length of the codeword, in binary digits, is denoted as $L(x_1^T \mid x_{1-D}^0)$. We restrict ourselves to prefix codes here (see [3], chapter 5). These codes are not only uniquely decodable but also instantaneous or self-punctuating which implies that you can immediately recognize a codeword when you see it. The set of codewords t...

498 |
Stochastic Complexity
- Rissanen
- 1989
Citation Context ... $\cdot \frac{1}{\sqrt{a+b}} \left(\frac{a}{a+b}\right)^a \left(\frac{b}{a+b}\right)^b$. (10) The sequential behavior of the KT-estimator was studied by Shtarkov [17]. Another estimator, the Laplace estimator, is investigated by Rissanen [10], [11]. This estimator can be obtained by weighting $(1-\theta)^a \theta^b$ with $\theta$ uniform over $[0, 1]$. For the KT-estimator the parameter redundancy can be uniformly bounded, using the lower bound (see lemma 1) ...
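The uniform bound on the parameter redundancy that this excerpt alludes to follows from inequality (10) in one step. The derivation below is a sketch that assumes base-2 logarithms and uses the fact that the maximum-likelihood value $a/(a+b)$ for the probability of a zero maximizes $(1-\theta)^a\theta^b$ over $\theta \in [0, 1]$:

$$
\log \frac{(1-\theta)^a \theta^b}{P_e(a,b)}
\;\le\; \log \frac{\left(\frac{a}{a+b}\right)^a \left(\frac{b}{a+b}\right)^b}{\frac{1}{2} \cdot \frac{1}{\sqrt{a+b}} \left(\frac{a}{a+b}\right)^a \left(\frac{b}{a+b}\right)^b}
\;=\; \log\left(2\sqrt{a+b}\right)
\;=\; \frac{1}{2}\log(a+b) + 1,
\qquad a + b \ge 1 .
$$

The bound depends only on the block length $a+b$, not on $\theta$, which is what "uniformly bounded" means here.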

285 |
Universal coding, information, prediction, and estimation
- Rissanen
- 1984
Citation Context ... the current state of the tree source. These authors were also able to demonstrate that their coding procedure achieves asymptotically the lower bound on the average redundancy, as stated by Rissanen ([9], theorem 1, or [10], theorem 1). Recently Weinberger, Rissanen and Feder [21] could prove the optimality, in the sense of achieving Rissanen's lower bound on the redundancy, of an algorithm similar to...

160 |
A universal data compression system
- Rissanen
- 1983
Citation Context ... string (context) the number of zeros and the number of ones that have followed this context, in the source sequence seen so far. The standard approach (see e.g. Rissanen and Langdon [12], Rissanen [8], [10], and Weinberger, Lempel and Ziv [19]) is that, given the past source symbols, one uses this context tree to estimate the actual 'state' of the finite memory tree source. Subsequently this state i...

149 |
Information Theory and Coding
- Abramson
- 1963
Citation Context ... of a semi-infinite sequence determine its suffix in $\mathcal{S}$. To each suffix $s$ in $\mathcal{S}$ there corresponds a parameter $\theta_s$. Each parameter (i.e. the probability of a source symbol being one) assumes a value in $[0, 1]$ and specifies a probability distribution over $\{0, 1\}$. Together, all parameters form the parameter vector $\Theta_{\mathcal{S}} \triangleq \{\theta_s : s \in \mathcal{S}\}$. If the tree source has emitted the semi-infinite sequence x t...

128 |
The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ...sequence with $a$ zeros and $b$ ones is $(1-\theta)^a \theta^b$. If we weight this probability over all $\theta$ with a $(\frac{1}{2}, \frac{1}{2})$-Dirichlet distribution we obtain the so-called Krichevsky-Trofimov estimate (see [5]). Definition 4: The Krichevsky-Trofimov (KT) estimated probability for a sequence containing $a \ge 0$ zeros and $b \ge 0$ ones is defined as $P_e(a, b) \triangleq \int_0^1 \frac{1}{\pi\sqrt{(1-\theta)\theta}} (1-\theta)^a \theta^b \, d\theta$. (8...
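The integral in Definition 4 can be evaluated without numerical integration. The sketch below assumes two standard facts that are not spelled out in the excerpt: the integral equals $\Gamma(a+\frac{1}{2})\,\Gamma(b+\frac{1}{2}) / (\pi\,\Gamma(a+b+1))$, and the same value is obtained symbol by symbol by assigning probability $(a+\frac{1}{2})/(a+b+1)$ to a next zero after $a$ zeros and $b$ ones. The function names are hypothetical.

```python
from fractions import Fraction
from math import gamma, isclose, pi


def kt_sequential(a, b):
    """Exact KT probability built symbol by symbol (zeros first, then ones)."""
    p = Fraction(1)
    zeros = ones = 0
    for _ in range(a):
        p *= Fraction(2 * zeros + 1, 2 * (zeros + ones + 1))   # P(next = 0)
        zeros += 1
    for _ in range(b):
        p *= Fraction(2 * ones + 1, 2 * (zeros + ones + 1))    # P(next = 1)
        ones += 1
    return p


def kt_closed_form(a, b):
    """Floating-point value of the Dirichlet(1/2, 1/2) mixture integral."""
    return gamma(a + 0.5) * gamma(b + 0.5) / (pi * gamma(a + b + 1))


if __name__ == "__main__":
    for a, b in [(0, 0), (1, 0), (1, 1), (3, 2)]:
        print(a, b, kt_sequential(a, b), kt_closed_form(a, b))
```

The order in which the symbols are fed to `kt_sequential` does not change the result, since the estimate depends only on the counts.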

125 |
Universal Sequential Coding of Single Messages
- Shtar'kov
- 1987
Citation Context ... $P_e(a, b)$; (9) 2. satisfies, for $a + b \ge 1$, the following inequality $P_e(a, b) \ge \frac{1}{2} \cdot \frac{1}{\sqrt{a+b}} \left(\frac{a}{a+b}\right)^a \left(\frac{b}{a+b}\right)^b$. (10) The sequential behavior of the KT-estimator was studied by Shtarkov [17]. Another estimator, the Laplace estimator, is investigated by Rissanen [10], [11]. This estimator can be obtained by weighting $(1-\theta)^a \theta^b$ with $\theta$ uniform over $[0, 1]$. For the KT-estimator the p...

108 |
Generalized Kraft inequality and arithmetic coding
- Rissanen
Citation Context ... are based on the Elias algorithm (unpublished, but described by Abramson [1] and Jelinek [4]) or on enumeration (e.g. Schalkwijk [15] and Cover [2]). Arithmetic coding became feasible only after Rissanen [7] and Pasco [6] had solved the accuracy issues that were involved. We will not discuss such issues here. Instead we will assume that all computations are carried out with infinite precision. 1 The basis of the log(...

104 |
Universal modeling and coding
- Rissanen, Langdon
- 1984
Citation Context ... string (context) the number of zeros and the number of ones that have followed this context, in the source sequence seen so far. The standard approach (see e.g. Rissanen and Langdon [12], Rissanen [8], [10], and Weinberger, Lempel and Ziv [19]) is that, given the past source symbols, one uses this context tree to estimate the actual 'state' of the finite memory tree source. Subsequently...

67 |
"A mathematical theory of communication," Bell Sys
- Shannon
- 1948
Citation Context ...owledged here. A. Elias algorithm. The first idea behind the Elias algorithm is that to each source sequence $x_1^T$ there corresponds a subinterval of $[0, 1)$. This principle can be traced back to Shannon [16]. Definition 9: The interval $I(x_1^t)$ corresponding to $x_1^t \in \{0, 1\}^t$, $t = 0, 1, \cdots, T$, is defined as $I(x_1^t) \triangleq \left[ B(x_1^t),\; B(x_1^t) + P_c(x_1^t) \right)$ (33) w...
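The interval construction in Definition 9 can be sketched in a few lines. The example below rests on several assumptions not stated in the truncated excerpt: $B(x_1^t)$ is taken to be the total coding probability of all length-$t$ sequences that precede $x_1^t$ lexicographically, the coding distribution $P_c$ is replaced by a memoryless KT predictor purely for illustration, and the codeword is chosen as the first multiple of $2^{-L}$ at or above the lower endpoint with $L = \lceil \log_2 (1/P_c) \rceil + 1$. All function names are hypothetical, and exact rationals stand in for the infinite precision the paper assumes.

```python
from fractions import Fraction
from math import ceil


def kt_next_zero(a, b):
    """Stand-in predictor: KT probability that the next symbol is 0."""
    return Fraction(2 * a + 1, 2 * (a + b + 1))


def elias_interval(seq):
    """Return (B, P): lower endpoint and width of the interval for `seq`."""
    low, width = Fraction(0), Fraction(1)
    a = b = 0
    for x in seq:
        p0 = kt_next_zero(a, b)
        if x == 0:
            width *= p0                 # lower part of the current interval
            a += 1
        else:
            low += width * p0           # skip past the sub-interval for 0
            width *= 1 - p0
            b += 1
    return low, width


def elias_codeword(seq):
    """Binary codeword whose dyadic interval lies inside the interval of `seq`."""
    low, width = elias_interval(seq)
    length = 1
    while Fraction(1, 2 ** (length - 1)) > width:   # length = ceil(log2(1/width)) + 1
        length += 1
    point = ceil(low * 2 ** length)     # first multiple of 2**-length >= low
    return format(point, "0{}b".format(length))
```

Because the intervals of distinct sequences of the same length are disjoint and each codeword's dyadic interval sits inside its sequence's interval, the resulting code is prefix-free, and its length exceeds $\log_2 (1/P_c)$ by less than two bits.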

65 |
Prediction of random sequences and universal coding
- Ryabko
- 1988
Citation Context ... only on the average but for each individual sequence. Model weighting (twice-universal coding) is not new. It was first suggested by Ryabko [13] for the class of finite order Markov sources (see also [14] for a similar approach to prediction). The known literature on model weighting resulted however in probability assignments that require complicated sequential updating procedures. Instead of finding ...

58 |
Enumerative source encoding
- Cover
- 1973
Citation Context ...o reduce the redundancy per source symbol. Arithmetic codes are based on the Elias algorithm (unpublished, but described by Abramson [1] and Jelinek [4]) or on enumeration (e.g. Schalkwijk [15] and Cover [2]). Arithmetic coding became feasible only after Rissanen [7] and Pasco [6] had solved the accuracy issues that were involved. We will not discuss such issues here. Instead we will assume that all comp...

52 |
Source coding algorithms for fast data compression
- Pasco
- 1976
Citation Context ...the Elias algorithm (unpublished, but described by Abramson [1] and Jelinek [4]) or on enumeration (e.g. Schalkwijk [15] and Cover [2]). Arithmetic coding became feasible only after Rissanen [7] and Pasco [6] had solved the accuracy issues that were involved. We will not discuss such issues here. Instead we will assume that all computations are carried out with infinite precision. 1 The basis of the log(...

52 |
Complexity of strings in the class of Markov sources
- Rissanen
- 1986
Citation Context ... string (context) the number of zeros and the number of ones that have followed this context, in the source sequence seen so far. The standard approach (see e.g. Rissanen and Langdon [12], Rissanen [8], [10], and Weinberger, Lempel and Ziv [19]) is that, given the past source symbols, one uses this context tree to estimate the actual 'state' of the finite memory tree source. Subsequently this state is use...

45 |
Probabilistic Information Theory
- Jelinek
- 1968
Citation Context ...uences with a large length $T$. This is often needed to reduce the redundancy per source symbol. Arithmetic codes are based on the Elias algorithm (unpublished, but described by Abramson [1] and Jelinek [4]) or on enumeration (e.g. Schalkwijk [15] and Cover [2]). Arithmetic coding became feasible only after Rissanen [7] and Pasco [6] had solved the accuracy issues that were involved. We will not discuss s...

39 |
Optimal sequential probability assignment for individual sequences
- Weinberger, Merhav, et al.
- 1994
Citation Context ... $(\frac{1}{2}, \frac{1}{2})$-Dirichlet distributed. Therefore we may say that $P_c(x_1^T \mid x_{1-D}^0)$ is a weighting over all models $\mathcal{S} \in \mathcal{C}_D$ and all parameter vectors $\Theta_{\mathcal{S}}$, also called a "double mixture" (see [20]). We should stress that the context tree weighting method induces a certain weighting over all models (see lemma 2), which can be changed as e.g. in section 7 in order to achieve specific model redun...

33 |
"A sequential algorithm for the universal coding of finite memory sources," submitted to
- Weinberger, Lempel, et al.
Citation Context ...s and the number of ones that have followed this context, in the source sequence seen so far. The standard approach (see e.g. Rissanen and Langdon [12], Rissanen [8], [10], and Weinberger, Lempel and Ziv [19]) is that, given the past source symbols, one uses this context tree to estimate the actual 'state' of the finite memory tree source. Subsequently this state is used to estimate the distribution that ...

30 |
An algorithm for source coding
- Schalkwijk
- 1972
Citation Context ...often needed to reduce the redundancy per source symbol. Arithmetic codes are based on the Elias algorithm (unpublished, but described by Abramson [1] and Jelinek [4]) or on enumeration (e.g. Schalkwijk [15] and Cover [2]). Arithmetic coding became feasible only after Rissanen [7] and Pasco [6] had solved the accuracy issues that were involved. We will not discuss such issues here. Instead we will assume ...

9 |
Context Weighting: General Finite Context Sources
- Willems, Shtarkov, et al.
Citation Context ...ork for all source sequences, i.e. does not achieve the lower bound. Finite accuracy implementations of the context tree weighting method in combination with arithmetic coding are studied in [24]. In [23] context weighting methods are described that perform on more general model classes than the one that we have studied here. These model classes are still bounded memory, and the proposed schemes for t...

7 |
Context tree weighting: a sequential universal source coding procedure for FSMX sources
- Willems, Shtarkov, et al.
- 1993
Citation Context ...t the terms that tell us about the model redundancy. The context tree weighting procedure was presented first at the 1993 IEEE International Symposium on Information Theory in San Antonio, Texas (see [22]). At the same time Weinberger, Rissanen and Feder [21] studied finite memory tree sources and proposed a method that is based on state estimation. Again an (artificial) constant C and a function g(t) ...

4 |
"Twice-universal coding," Problems of Inform
- Ryabko
- 1984
Citation Context ...antage of weighting procedures is that they perform well not only on the average but for each individual sequence. Model weighting (twice-universal coding) is not new. It was first suggested by Ryabko [13] for the class of finite order Markov sources (see also [14] for a similar approach to prediction). The known literature on model weighting resulted however in probability assignments that require com...

3 |
Sequential Weighting Algorithms for Multi-Alphabet Sources
- Tjalkens, Shtarkov, et al.
- 1993
Citation Context ...g method that is described here. Although we have considered only binary sources here, there exist straightforward generalizations of the context tree weighting method to non-binary sources (see e.g. [18]). Acknowledgement: This research was carried out in May 1992 while the second author visited the Information Theory Group at Eindhoven University. The authors thank the Eindhovens Universiteitsfonds f...

2 |
"The context tree weighting method: Truncated updating," submitted to IEEE Trans. Inform. Theor
- Willems
- 1994
Citation Context ...oes not work for all source sequences, i.e. does not achieve the lower bound. Finite accuracy implementations of the context tree weighting method in combination with arithmetic coding are studied in [24]. In [23] context weighting methods are described that perform on more general model classes than the one that we have studied here. These model classes are still bounded memory, and the proposed sche...

2 |
Extensions to the context tree weighting method
- Willems
- 1994
Citation Context ...d, which is described here, has $D$ as a parameter to be specified in advance, making the method work only for models $\mathcal{S} \in \mathcal{C}_D$, i.e. for models with memory not larger than $D$. It is however possible (see [25]) to modify the algorithm such that there is no constraint on the maximum memory depth $D$ involved. (Moreover it was demonstrated there that it is not necessary to have access to $x_{1-D}^0$.) This im...

1 |
"A Universal Finite Memory Source," submitted for publication
- Weinberger, Rissanen, et al.
- 1992
Citation Context ...trate that their coding procedure achieves asymptotically the lower bound on the average redundancy, as stated by Rissanen ([9], theorem 1, or [10], theorem 1). Recently Weinberger, Rissanen and Feder [21] could prove the optimality, in the sense of achieving Rissanen's lower bound on the redundancy, of an algorithm similar to that of Rissanen in [8]. An unpleasant fact about the standard approach is t...