## Two new Markov order estimators (2005)

Venue: arXiv preprint math/0506080

Citations: 4 (0 self)

### BibTeX

```bibtex
@MISC{Peres05twonew,
  author = {Yuval Peres and Paul Shields},
  title  = {Two new Markov order estimators},
  year   = {2005},
  note   = {arXiv preprint math/0506080}
}
```

### Abstract

We present two new methods for estimating the order (memory depth) of a finite alphabet Markov chain from observation of a sample path. One method is based on entropy estimation via recurrence times of patterns, and the other relies on a comparison of empirical conditional probabilities. The key to both methods is a qualitative change that occurs when a parameter (a candidate for the order) passes the true order. We also present extensions to order estimation for Markov random fields.
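The "qualitative change" the abstract describes can be illustrated with a small simulation: below, empirical conditional probabilities at context depths $k$ and $k+1$ are compared, and the maximal discrepancy collapses once $k$ reaches the true order. This is an illustrative sketch only; the order-2 chain and its transition probabilities are arbitrary choices for the demo, not from the paper.

```python
# Illustrative sketch (not the authors' exact estimator): sample an
# order-2 Markov chain over {0, 1} and watch how much the empirical
# conditional probabilities change as the candidate order k grows.
import random
from collections import defaultdict

def sample_path(n, seed=0):
    """Order-2 binary chain: P(next = 1) depends on the last two symbols."""
    rng = random.Random(seed)
    p1 = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.7, (1, 1): 0.1}
    x = [0, 1]
    for _ in range(n - 2):
        x.append(1 if rng.random() < p1[(x[-2], x[-1])] else 0)
    return x

def max_fluctuation(x, k):
    """Largest change in empirical P(a | context) when the context is
    extended from k to k+1 symbols."""
    counts_k = defaultdict(lambda: defaultdict(int))
    counts_k1 = defaultdict(lambda: defaultdict(int))
    for i in range(k + 1, len(x)):
        counts_k[tuple(x[i - k:i])][x[i]] += 1
        counts_k1[tuple(x[i - k - 1:i])][x[i]] += 1
    worst = 0.0
    for ctx, nxt in counts_k1.items():
        tot1 = sum(nxt.values())
        short = counts_k[ctx[1:]]
        tot = sum(short.values())
        for a in nxt:
            worst = max(worst, abs(nxt[a] / tot1 - short[a] / tot))
    return worst

x = sample_path(200_000)
for k in range(4):
    print(k, round(max_fluctuation(x, k), 3))
# The fluctuation stays large for k < 2 and drops to sampling noise once
# k >= 2, the true order: the "qualitative change" the abstract refers to.
```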

### Citations

9359 |
Elements of information theory
- Cover, Thomas
- 1991

Citation Context: ...deviations theory shows that for any $\epsilon > 0$, we have $\phi_M(x_1^n) = o(n^{1/2+\epsilon})$, a.s., so $M_n^\# \le M$ eventually a.s. **2. The entropy estimator method.** We first review some elementary facts about entropy; see [3] or [17] for details. The conditional entropy of the next symbol given $k$ previous symbols is defined by $H_k = H(X_{k+1} \mid X_1^k) \stackrel{\mathrm{def}}{=} -\sum_{a_1^{k+1}} P(a_1^{k+1}) \log P(a_{k+1} \mid a_1^k)$. The sequence $\{H_k\}$ is nonincreasing...
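The conditional entropy $H_k$ defined in the excerpt has a natural plug-in estimate from empirical block frequencies. A minimal sketch, assuming a binary alphabet and natural logarithms; unseen-context smoothing is ignored, and this is not the authors' code:

```python
# Plug-in estimate of H_k = H(X_{k+1} | X_1^k) from a single sample path,
# using empirical (k+1)-block frequencies. Minimal sketch; real estimators
# need care with contexts that never occur.
import math, random
from collections import Counter

def empirical_conditional_entropy(x, k):
    """hat H_k = -sum_b P_hat(b) log P_hat(b[k] | b[:k]) over (k+1)-blocks b."""
    blocks = Counter(tuple(x[i:i + k + 1]) for i in range(len(x) - k))
    total = sum(blocks.values())
    contexts = Counter()
    for b, c in blocks.items():
        contexts[b[:k]] += c          # count of each length-k context
    h = 0.0
    for b, c in blocks.items():
        h -= (c / total) * math.log(c / contexts[b[:k]])
    return h

rng = random.Random(1)
x = [rng.randrange(2) for _ in range(100_000)]   # i.i.d. fair coin
for k in range(4):
    print(k, round(empirical_conditional_entropy(x, k), 4))
# For an i.i.d. uniform binary source, each hat H_k is close to
# log 2 ~= 0.6931 nats, as the nonincreasing sequence {H_k} is constant.
```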

2831 |
Estimating the dimension of a model
- Schwarz
- 1978

Citation Context: ...and the MDL focus on selecting the correct class from a nested sequence of parametric model classes, $M_0 \subset M_1 \subset M_2 \subset \cdots$, based on a sample path drawn from some $P \in \cup_k M_k$. The BIC, introduced by Schwarz [16], is based on Bayesian principles and leads to the model estimator $M^*_{\mathrm{BIC}}(x_1^n) \stackrel{\mathrm{def}}{=} \arg\min_k \bigl( -\log P_{\mathrm{ML}(k)}(x_1^n) + \tfrac{\phi(k)}{2} \log n \bigr)$, where $P_{\mathrm{ML}(k)}(x_1^n)$ is the $k$-th order maximum likelihood, i.e...
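The quoted BIC estimator can be implemented directly, taking $\phi(k) = |A|^k(|A|-1)$, the number of free parameters of an order-$k$ chain, as the penalty dimension (consistent with the formula appearing elsewhere on this page). A sketch, not the authors' code; ties and unseen contexts are handled naively:

```python
# BIC Markov order selection:
#   score(k) = -log P_ML(k)(x) + (phi(k)/2) log n,  phi(k) = |A|^k (|A|-1).
import math, random
from collections import Counter

def neg_log_ml(x, k):
    """-log of the order-k maximum likelihood of the path (the first k
    symbols are treated as a fixed initial context)."""
    blocks = Counter(tuple(x[i:i + k + 1]) for i in range(len(x) - k))
    contexts = Counter()
    for b, c in blocks.items():
        contexts[b[:k]] += c
    return -sum(c * math.log(c / contexts[b[:k]]) for b, c in blocks.items())

def bic_order(x, alphabet_size, k_max):
    n = len(x)
    def score(k):
        phi = alphabet_size ** k * (alphabet_size - 1)
        return neg_log_ml(x, k) + 0.5 * phi * math.log(n)
    return min(range(k_max + 1), key=score)

# Demo on an order-1 binary chain that flips state with probability 0.9.
rng = random.Random(2)
x, s = [], 0
for _ in range(50_000):
    s = 1 - s if rng.random() < 0.9 else s
    x.append(s)
print(bic_order(x, 2, 4))   # expected to select order 1
```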

1584 |
Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963

Citation Context: ...$a_1^{k-1}) P(a_k \mid a_{k-M}^{k-1})$, and direct calculation shows that $E(\Delta_k(j) \mid X_1^{j-1}) = 0$ and $\|\Delta_k(j)\|_\infty \le 1$ for $j > k$. From the Hoeffding-Azuma large deviations bound for martingales with bounded differences, [11, 1], the probability that $|Z_n| \ge n^{3/4}$ is at most $2\exp(-n^{1/2}/2)$. A similar argument also shows that for $Z_k^*(n) \stackrel{\mathrm{def}}{=} N_n(a_{k-M}^k \mid x_1^n) - N_{n-1}(a_{k-M}^{k-1} \mid x_1^{n-1}) P(a_k \mid a_{k-M}^{k-1})$, $n \ge k$, the probabi...
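The Azuma-Hoeffding step quoted above can be sanity-checked numerically: for a martingale with increments bounded by 1, $P(|Z_n| \ge n^{3/4}) \le 2\exp(-n^{1/2}/2)$. The Monte Carlo below uses a plain $\pm 1$ random walk as a stand-in for the paper's specific martingale:

```python
# Monte Carlo illustration of the Azuma-Hoeffding bound invoked above:
# increments bounded by 1 give P(|Z_n| >= n^(3/4)) <= 2 exp(-sqrt(n)/2).
import math, random

n, trials = 900, 2000
rng = random.Random(5)
threshold = n ** 0.75                      # 164.3... for n = 900
bound = 2 * math.exp(-math.sqrt(n) / 2)    # 2 e^{-15}, about 6.1e-7

exceed = sum(
    abs(sum(rng.choice((-1, 1)) for _ in range(n))) >= threshold
    for _ in range(trials)
)
print(exceed / trials, "<=", bound)  # observed frequency vs. Azuma bound
```

With $n = 900$ the threshold is about $5.5$ standard deviations out, so no exceedances are observed, comfortably within the bound.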

929 |
Information theory: Coding theorems for discrete memoryless systems (Probability and Mathematical Statistics)
- Csiszár, Körner
- 1981

314 |
The minimum description length principle in coding and modeling
- Barron, Rissanen, et al.
- 1998

Citation Context: ...and related methods. Two important and related methods, the Bayesian Information Criterion (BIC) and the Minimum Description Length (MDL) principle, are the basis for many model selection methods; see [2, 4, 6] for discussion and references to these and other methods. Both the BIC and the MDL focus on selecting the correct class from a nested sequence of parametric model classes, $M_0 \subset M_1 \subset M_2 \subset \cdots$, based o...

246 |
Weighted sums of certain dependent random variables
- Azuma
- 1967

Citation Context: ...$a_1^{k-1}) P(a_k \mid a_{k-M}^{k-1})$, and direct calculation shows that $E(\Delta_k(j) \mid X_1^{j-1}) = 0$ and $\|\Delta_k(j)\|_\infty \le 1$ for $j > k$. From the Hoeffding-Azuma large deviations bound for martingales with bounded differences, [11, 1], the probability that $|Z_n| \ge n^{3/4}$ is at most $2\exp(-n^{1/2}/2)$. A similar argument also shows that for $Z_k^*(n) \stackrel{\mathrm{def}}{=} N_n(a_{k-M}^k \mid x_1^n) - N_{n-1}(a_{k-M}^{k-1} \mid x_1^{n-1}) P(a_k \mid a_{k-M}^{k-1})$, $n \ge k$, the probabi...

72 |
The Ergodic Theory of Discrete Sample Paths
- Shields
- 1996

Citation Context: ...deviations theory shows that for any $\epsilon > 0$, we have $\phi_M(x_1^n) = o(n^{1/2+\epsilon})$, a.s., so $M_n^\# \le M$ eventually a.s. **2. The entropy estimator method.** We first review some elementary facts about entropy; see [3] or [17] for details. The conditional entropy of the next symbol given $k$ previous symbols is defined by $H_k = H(X_{k+1} \mid X_1^k) \stackrel{\mathrm{def}}{=} -\sum_{a_1^{k+1}} P(a_1^{k+1}) \log P(a_{k+1} \mid a_1^k)$. The sequence $\{H_k\}$ is nonincreasing ...

65 |
Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression (IEEE Trans. Information Theory)
- Wyner, Ziv
- 1989

Citation Context: ...alphabet process $X$, the time until the opening $n$-block occurs again, $R_n(x) \stackrel{\mathrm{def}}{=} \min\{r \ge n : x_{r+1}^{r+n} = x_1^n\}$, grows like $e^{nH(X)}$, that is, $(1/n)\log R_n(x) \to H(X)$ a.s. (Earlier, Wyner and Ziv, [19], established convergence in probability for a related recurrence idea.) In our setting $\ell(n) = \max\{k : R_k \le n\}$ and the Ornstein-Weiss recurrence theorem gives $\lim_{n\to\infty} \frac{1}{\ell(n)} \log R_{\ell(n)}(x) = H(X)$, a.s. ...
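The recurrence-time result in the excerpt, $(1/n)\log R_n(x) \to H(X)$ a.s., is easy to probe empirically. A brute-force sketch on a fair-coin source, where $H = \log 2 \approx 0.693$ nats; a real implementation would use suffix structures rather than a linear scan:

```python
# Entropy via recurrence: R_n(x) = min{ r >= n : x_{r+1..r+n} = x_{1..n} },
# and (1/n) log R_n(x) converges to H(X) a.s.
import math, random

def recurrence_time(x, n):
    """Smallest r >= n with x[r:r+n] == x[:n], or None if no repeat."""
    opening = x[:n]
    for r in range(n, len(x) - n + 1):
        if x[r:r + n] == opening:
            return r
    return None

rng = random.Random(3)
x = [rng.randrange(2) for _ in range(500_000)]   # i.i.d. fair coin
for n in (6, 10, 14):
    r = recurrence_time(x, n)
    print(n, r, round(math.log(r) / n, 3))
# The ratio (1/n) log R_n hovers around log 2 ~= 0.693, with noticeable
# fluctuation since R_n is roughly geometrically distributed.
```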

64 |
Asymptotic behavior of the Lempel-Ziv parsing scheme and in digital search trees
- Jacquet, Szpankowski
- 1995

Citation Context: ...e for which an $o(1)$ underestimation bound is not known is the Lempel-Ziv entropy estimator, [20]. An $O((1/n)\log n)$ underestimation bound for the class $M_0$ of i.i.d. processes has been established, see [8], a result we suspect can be extended to the class $M$. **4.2 The "flat spot" problem.** For the Markov order estimation problem, it is tempting to take as order estimator the first $k$ for which $\hat h_k(n) - $...

57 |
The consistency of the BIC Markov order estimator
- Csiszár, Shields
- 2000

Citation Context: ...0. It is easy to see that this implies $\hat h_{f(n)}(n) \to H$, a.s., for the case $f(n) = \frac{\log\log n}{\log |A|}$. This proves (a). To establish part (b), suppose $X \in \mathcal{M}$ has order $M$. The BIC consistency theorem, see [6], implies that $\frac{|A|^{f(n)}(|A|-1)}{2}\log n + n\hat h_{f(n)}(n) > \frac{|A|^M(|A|-1)}{2}\log n + n\hat h_M(n)$, eventually a.s. Using the relation $|A|^{f(n)} = \log n$ and the bound $n\hat h_M(n) \ge nH - c\log\log n$, which hol...

41 |
Coding theorems for individual sequences
- Ziv
- 1978

Citation Context: ...pect there may be a more direct proof of Proposition 2(b) than the one we gave. Remark 5: An important example for which an $o(1)$ underestimation bound is not known is the Lempel-Ziv entropy estimator, [20]. An $O((1/n)\log n)$ underestimation bound for the class $M_0$ of i.i.d. processes has been established, see [8], a result we suspect can be extended to the class $M$. **4.2 The "flat spot" problem.** For the...

25 |
Asymptotic recurrence and waiting times for stationary processes
- Kontoyiannis
- 1998

24 |
Large-scale typicality of Markov sample paths and consistency of MDL order estimators
- Csiszár
- 2002

Citation Context: ...and related methods. Two important and related methods, the Bayesian Information Criterion (BIC) and the Minimum Description Length (MDL) principle, are the basis for many model selection methods; see [2, 4, 6] for discussion and references to these and other methods. Both the BIC and the MDL focus on selecting the correct class from a nested sequence of parametric model classes, $M_0 \subset M_1 \subset M_2 \subset \cdots$, based o...

18 |
A topological criterion for hypothesis testing
- Peres
- 1994

17 |
Entropy and recurrence rates for stationary random fields
- Ornstein, Weiss
- 2002

Citation Context: ...es to 0 very slowly, however, which suggests that its associated order estimator $M_n^*(x_1^n)$ converges slowly to $M$. Furthermore, though the recurrence idea does generalize to higher dimensions, see [15], a useful rate theory for it has not been established. In Section 4.1, we present another entropy estimator that has a more rapidly convergent underestimation bound and is extendable to higher dimens...

9 |
Estimation of the order of a finite Markov chain
- Finesso
- 1992

Citation Context: ...$|A|^k(|A|-1)$. Schwarz [16] proved consistency if the model classes are i.i.d. exponential families and a bound on the number of models is assumed, a result later extended to the Markov case by Finesso [10]. The first consistency proofs for the Markov case without an order bound assumption are given in [6]. The proofs are surprisingly complicated, though they have been simplified somewhat in [4], which ...

7 |
Entropy and data compression
- Ornstein, Weiss
- 1993

Citation Context: ...d, which we call the entropy estimator method, compares $\hat h_k(n)$ with the entropy estimator $[\ell(n)]^{-1} \log n$, where $\ell(n)$ denotes the length of the longest initial block of $x_1^n$ that repeats in $x_1^n$ (see [14] and Section 2 below). **Theorem 1** $M_n^*(x_1^n) \stackrel{\mathrm{def}}{=} \min\{k : \hat h_k(n) \le [\ell(n)]^{-1}\log n + 2(\log n)^{-1/4}\}$ is a consistent Markov order estimator. Our second method, which we call the maximal fluctuation ...
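Theorem 1's rule can be coded directly. The sketch below computes $\ell(n)$ by brute force and applies the threshold $[\ell(n)]^{-1}\log n + 2(\log n)^{-1/4}$. Note that the slack term shrinks extremely slowly in $n$, which matches the slow-convergence remark quoted elsewhere on this page; at practical sample sizes the rule is permissive, so the demo uses an i.i.d. source, whose true order 0 it does recover:

```python
# Sketch of the entropy-estimator order rule of Theorem 1:
#   M*(x) = min{ k : hat h_k(n) <= log(n)/ell(n) + 2 (log n)^(-1/4) },
# where ell(n) is the length of the longest initial block of x that
# reappears within x. Brute force throughout; natural logarithms assumed.
import math, random
from collections import Counter

def empirical_conditional_entropy(x, k):
    blocks = Counter(tuple(x[i:i + k + 1]) for i in range(len(x) - k))
    contexts = Counter()
    for b, c in blocks.items():
        contexts[b[:k]] += c
    return -sum((c / (len(x) - k)) * math.log(c / contexts[b[:k]])
                for b, c in blocks.items())

def ell(x):
    """Longest L such that x[:L] occurs again starting at some index >= 1."""
    s = ''.join(map(str, x))
    L = 0
    while L < len(x) - 1 and s.find(s[:L + 1], 1) != -1:
        L += 1
    return L

def order_estimate(x, k_max=10):
    n = len(x)
    threshold = math.log(n) / ell(x) + 2 * math.log(n) ** -0.25
    for k in range(k_max + 1):
        if empirical_conditional_entropy(x, k) <= threshold:
            return k
    return None

rng = random.Random(4)
coin = [rng.randrange(2) for _ in range(50_000)]
print(order_estimate(coin))   # i.i.d. source, so the true order is 0
```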

4 |
How sampling reveals a process (Ann. Probab.)
- Ornstein, Weiss
- 1990

Citation Context: ...oposition 2: There is a positive constant $C$ such that for any $X \in \mathcal{M}$, (a) $\hat h_{f(n)}(n) \to H(X)$, a.s.; (b) $\hat h_{f(n)}(n) \ge H(X) - C \frac{\log^2 n}{n}$, eventually a.s. Proof. By the Ornstein-Weiss entropy estimation theorem, [13], the per-symbol empirical block entropy $\frac{1}{f(n)} H(\hat P_{f(n)}(\cdot)) \to H$, a.s. as $f(n) \to \infty$, provided only that $f(n) \le \frac{\log n}{H+\epsilon}$, for some $\epsilon > 0$. It is easy to see that this implies $\hat h_{f(n)}(n) \to H$, a.s., for...

3 |
Consistent estimation of the basic neighborhood of Markov random fields (preprint)
- Talata
- 2004

Citation Context: ...the discussion we focused on squares rather than diamonds, which are more natural in Ising models. Our concepts and results can easily be converted to the latter setting. Remark 3: Csiszár and Talata, [7], have recently shown the existence of a consistent range estimator for a restricted class of Markov random fields, namely, those for which, conditioned on any boundary, probabilities in a square are ...

1 |
Recurrence revisited (lecture at the Ornsteinfest)
- Shields
- 2004

Citation Context: ...ences $a_1^k$, it follows from (7) and an application of Borel-Cantelli that, eventually a.s., $\phi_M(x_1^n) \le n^{3/4}$. This completes the proof of Theorem 2. Remark 1: After one of us lectured on these results, [18], B. Weiss noted that in recent joint work with G. Morvai, they independently developed the estimator $M_n^\#$ discussed in Theorem 2. **3.1 Markov Random Fields.** The method of maximum fluctuations ...