## Context tree estimation for not necessarily finite memory processes, via BIC and MDL (2006)

Venue: IEEE Trans. Inf. Theory

Citations: 25 (1 self)

### BibTeX

@ARTICLE{Csiszár06contexttree,
    author = {Imre Csiszár and Zsolt Talata},
    title = {Context tree estimation for not necessarily finite memory processes, via BIC and MDL},
    journal = {IEEE Trans. Inf. Theory},
    year = {2006},
    volume = {52},
    pages = {1007--1016}
}

### Abstract

The concept of context tree, usually defined for finite memory processes, is extended to arbitrary stationary ergodic processes (with finite alphabet). These context trees are not necessarily complete, and may be of infinite depth. The familiar BIC and MDL principles are shown to provide strongly consistent estimators of the context tree, via optimization of a criterion for hypothetical context trees of finite depth, allowed to grow with the sample size n as o(log n). Algorithms are provided to compute these estimators in O(n) time, and to compute them on-line for all i ≤ n in o(n log n) time.
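
The optimization described in the abstract can be illustrated with a small sketch. It is not the paper's O(n) algorithm: it assumes a fixed maximum depth `depth` (the paper lets the depth grow as o(log n)), uses natural logarithms, and counts contexts naively. The function names `count_contexts`, `neg_log_ml`, and `bic_context_tree` are hypothetical. Each context costs a penalty of ((|A| − 1)/2) log n plus the negative maximized log-likelihood of the symbols that follow it, and a bottom-up recursion in the spirit of CTM keeps a node as a leaf unless splitting it into its observed children is strictly cheaper.

```python
import math
from collections import Counter, defaultdict


def count_contexts(x, depth):
    """counts[s][a] = number of times symbol a follows context s, for len(s) <= depth."""
    counts = defaultdict(Counter)
    # Start at index `depth` so every counted position has a full-length past.
    for i in range(depth, len(x)):
        for k in range(depth + 1):
            counts[x[i - k:i]][x[i]] += 1
    return counts


def neg_log_ml(counter):
    """Negative maximized log-likelihood of the symbols observed after one context."""
    total = sum(counter.values())
    return -sum(c * math.log(c / total) for c in counter.values())


def bic_context_tree(x, alphabet, depth):
    """Return (BIC cost, contexts) minimizing the criterion over trees of depth <= `depth`."""
    counts = count_contexts(x, depth)
    # Per-context penalty ((|A| - 1) / 2) * log n; assumed form, natural log.
    penalty = 0.5 * (len(alphabet) - 1) * math.log(len(x))

    def best(s):
        leaf_cost = penalty + neg_log_ml(counts[s])
        if len(s) == depth:
            return leaf_cost, [s]
        # Only extensions that occur in the sample become children,
        # so the resulting tree need not be complete.
        results = [best(a + s) for a in alphabet if a + s in counts]
        split_cost = sum(cost for cost, _ in results)
        if split_cost < leaf_cost:
            return split_cost, [leaf for _, leaves in results for leaf in leaves]
        return leaf_cost, [s]  # ties favor the smaller tree

    return best("")
```

For the alternating sample `"01" * 200` with `depth=2`, the sketch selects the two contexts `"0"` and `"1"`: one past symbol already determines the next one, so deeper contexts only add penalty.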

### Citations

2304 | Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ... a known prior bound on the depth of the context tree, but using a bound allowed to grow with n. They asserted that standard statistical methods as the Bayesian information criterion (BIC) of Schwarz [12] and the minimum description length (MDL) principle of Rissanen [11], [2] were inappropriate for context tree estimation, due to computational infeasibility of comparing a very large number of hypothe...

305 | The minimum description length principle in coding and modeling
- Barron, Rissanen, et al.
Citation Context ...llowed to grow with n. They asserted that standard statistical methods as the Bayesian information criterion (BIC) of Schwarz [12] and the minimum description length (MDL) principle of Rissanen [11], [2] were inappropriate for context tree estimation, due to computational infeasibility of comparing a very large number of hypothetical models. Still, Willems, Shtarkov, and Tjalkens [15], ...

160 | A universal data compression system
- Rissanen
- 1983
Citation Context ...ing the conditional probabilities (referred to as contexts) are of variable length, sometimes substantially shorter than the order k. Models of this kind and the term context tree date back to Rissanen [10]. These models are also called finite memory sources or tree sources [13], [14], [16] or variable length Markov chains [3]. We note that the terms context and context tree appear in the literature in ...

158 | The context-tree weighting method: basic properties
- Willems, Shtarkov, et al.
- 1995
Citation Context ...f Rissanen [11], [2] were inappropriate for context tree estimation, due to computational infeasibility of comparing a very large number of hypothetical models. Still, Willems, Shtarkov, and Tjalkens [15], [17] showed that time-consuming comparisons can be avoided by clever use of tree techniques. Their context tree maximizing (CTM) ...

128 | The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ... of tree techniques. Their context tree maximizing (CTM) algorithm computes in linear time the context tree estimator obtained by the version of MDL that uses the Krichevsky–Trofimov (KT) code length [7], and this estimator is consistent, as they proved assuming a known upper bound on the depth of the context tree. Similar results were obtained also by Nohre [9]. Recent results on consistent context ...

85 | Variable length Markov chains
- Bühlmann, Wyner
- 1999
Citation Context ...order k. Models of this kind and the term context tree date back to Rissanen [10]. These models are also called finite memory sources or tree sources [13], [14], [16] or variable length Markov chains [3]. We note that the terms context and context tree appear in the literature in various senses. Here, the context tree of a finite memory process means, in effect, the minimal tree admitting a tree sour...

69 | A universal finite memory source
- Weinberger, Rissanen, et al.
- 1995
Citation Context ...h, sometimes substantially shorter than the order k. Models of this kind and the term context tree date back to Rissanen [10]. These models are also called finite memory sources or tree sources [13], [14], [16] or variable length Markov chains [3]. We note that the terms context and context tree appear in the literature in various senses. Here, the context tree of a finite memory process means, in eff...

55 | The consistency of the BIC Markov order estimator
- Csiszár, Shields
- 2000
Citation Context ...e of prior results on context tree estimation via BIC. While BIC is commonly regarded as an approximate version of MDL, this is justified only when a finite number of model classes is considered, see [4]. We note that much of the literature of context tree models is motivated by universal source coding. In particular, CTM is a modification of the celebrated Context Tree Weighting data compression alg...

35 | The context-tree weighting method: Extensions
- Willems
- 1998

33 | A sequential algorithm for the universal coding of finite memory sources, submitted to
- Weinberger, Lempel, et al.
Citation Context ... length, sometimes substantially shorter than the order k. Models of this kind and the term context tree date back to Rissanen [10]. These models are also called finite memory sources or tree sources [13], [14], [16] or variable length Markov chains [3]. We note that the terms context and context tree appear in the literature in various senses. Here, the context tree of a finite memory process means, ...

22 | Large-scale typicality of Markov sample paths and consistency of MDL order estimators. Special issue on Shannon theory: perspective, trends, and applications
- Csiszár
Citation Context ...stimation with a stopping rule based on “stabilizing” of the estimator. The NML version of MDL was not considered for the context tree estimation problem (though it was for Markov order estimation in [5]), because the structure of the NML criterion, unlike BIC and KT, appears unsuitable for CTM implementation. Finally we note that in the definition of BIC (Definition 2.4), the factor (|A| − 1)|T|/2 ...

19 | Stochastic complexity in statistical inquiry. Singapore: World Scientific
- Rissanen
- 1989
Citation Context ...ound allowed to grow with n. They asserted that standard statistical methods as the Bayesian information criterion (BIC) of Schwarz [12] and the minimum description length (MDL) principle of Rissanen [11], [2] were inappropriate for context tree estimation, due to computational infeasibility of comparing a very large number of hypothetical models. Still, Willems, Shtarkov, and Tjalkens [15], ...

13 | Linear time universal coding and time reversal of tree sources via FSM closure
- Martin, Seroussi, et al.
- 2004
Citation Context ...ssigned to an observed sequence, with an “indeterminate symbol” ε such that infinitely many ε’s may precede a finite number of symbols of the true alphabet. A concept of generalized context tree, see [8] and references there, admits edges labeled by strings rather than single symbols. That concept is not used here, but similarly to [8] we drop the completeness requirement, often made in the literatur...

11 | Some Topics in Descriptive Complexity
- Nohre
- 1994
Citation Context ...e Krichevsky–Trofimov (KT) code length [7], and this estimator is consistent, as they proved assuming a known upper bound on the depth of the context tree. Similar results were obtained also by Nohre [9]. Recent results on consistent context tree estimation in linear time, assuming finite depth but no known upper bound on it, appear in [1], [8]. These references use tools as the Burrows–Wheeler trans...

9 | Estimation of the order of a finite Markov chain
- Finesso
- 1992
Citation Context ...h the KT and the normalized maximum likelihood (NML) code length, are strongly consistent when the number of candidate model classes is finite, that is, when there is a known upper bound on the order [6]. The consistency of the BIC order estimator without such prior bound has been proved by Csiszár and Shields [4]. That paper also contains a counterexample to the consistency of the KT and NML version...

8 | An O(N) semi-predictive universal encoder via the BWT
- Baron, Bresler
- 2004
Citation Context ...the context tree. Similar results were obtained also by Nohre [9]. Recent results on consistent context tree estimation in linear time, assuming finite depth but no known upper bound on it, appear in [1], [8]. These references use tools as the Burrows–Wheeler transform or generalized context trees. We are not aware of prior results on context tree estimation via BIC. While BIC is commonly regarded as...

6 | Context-tree maximizing
- Willems, Shtarkov, et al.
- 2000
Citation Context ... computational infeasibility of comparing a very large number of hypothetical models. Still, Willems, Shtarkov, and Tjalkens [15], [17] showed that time-consuming comparisons can be avoided by clever use of tree techniques. Their context tree maximizing (CTM) algorithm computes in linear time the context tree estimator obtained by th...