## Nonparametric Entropy Estimation for Stationary Processes and Random Fields, with Applications to English Text (1998)

Citations: 17 (6 self)

### BibTeX

```bibtex
@MISC{Kontoyiannis98nonparametricentropy,
  author = {I. Kontoyiannis and P. H. Algoet and Yu. M. Suhov and A. J. Wyner},
  title  = {Nonparametric Entropy Estimation for Stationary Processes and Random Fields, with Applications to English Text},
  year   = {1998}
}
```

### Abstract

We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesàro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker. We provide examples of their performance on English text, and we generalize our results to countable alphabet processes and to random fields.

### Citations

9231 | Elements of Information Theory - Cover, Thomas - 1991 |

1221 | A universal algorithm for sequential data compression
- Ziv, Lempel
- 1977
Citation Context ...ate upper and lower bounds for the process entropy. 4. Interpretation. The match-lengths Λ_i^n can be thought of as the length of the next phrase to be encoded by the sliding-window Lempel-Ziv algorithm [30] when the window size is n. In fact, the entropy estimator in (a) above is a special case of the sliding-window estimator [8] defined by Ĥ_{k,n} = (1/k) Σ_{i=1}^{k} Λ_i^n / log n (2), where k (as well as n) are ...
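The interpretation in the excerpt above — that the match-length at a position is essentially the length of the next sliding-window Lempel-Ziv phrase for window size n — can be illustrated directly. This is a hedged sketch: the phrase convention used here (longest substring match inside the window, extended by one literal symbol) is one common LZ77 variant, it ignores matches that overlap into the lookahead, and `next_phrase_len` is our name.

```python
def next_phrase_len(window, lookahead):
    """Length of the next phrase a sliding-window Lempel-Ziv coder
    would emit: the longest prefix of `lookahead` occurring as a
    substring of `window`, plus one literal symbol."""
    l = 0
    while l < len(lookahead) and lookahead[:l + 1] in window:
        l += 1
    return min(l + 1, len(lookahead))
```

For example, with window "abracad" and lookahead "abraq", the coder matches "abra" inside the window and emits a phrase of length 5.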

779 | Compression of individual sequences via variable-rate coding
- Ziv, Lempel
- 1978
(Show Context)
Citation Context ... bound on the per-symbol description length when the process is losslessly encoded, and several universal data compression algorithms are known that actually achieve it. In particular, the Lempel-Ziv =-=[31]-=- algorithm attains the entropy lower bound when it is applied to almost every realization of a stationary ergodic source. A straightforward approach for estimating the entropy rate of an unknown sourc... |

148 |
Information and information stability of random variables and processes
- Pinsker
- 1960
Citation Context ...X_u^(m) := X_u if 0 ≤ X_u < m, and := m if X_u ≥ m. The random field {X_u^(m)} is also stationary ergodic, and its entropy rate H^(m) increases to the entropy rate H of the random field {X_u} as m → ∞ (see Chapter 7 of Pinsker [18] for a general discussion). Let R_k^(m) be defined in terms of {X_u^(m)} in the same way as R_k was defined in terms of {X_u}. Then R_k ≥ R_k^(m), so lim inf_k (1/k^d) log R_k^d ≥ lim_k (1/k^d) log R_k^(m)...

71 |
Entropy and data compression schemes
- Ornstein, Weiss
- 1993
Citation Context ...= max{l : 0 ≤ l ≤ n; X_0^{l−1} = X_{−j}^{−j+l−1} for some l ≤ j ≤ n}. Wyner and Ziv [29] showed that, for every ergodic process, L_n grows like (log n)/H in probability, and Ornstein and Weiss [15] later refined this to pointwise convergence, L_n / log n → 1/H a.s. (1), where H is the entropy rate of {X_i} (logarithms are to base 2 throughout this correspondence). At about the same time, Grassber...

64 |
Some asymptotic properties of the entropy of stationary ergodic data source with applications to data compression
- Wyner, Ziv
- 1989
Citation Context ...he specific application at hand, are often employed. After all, estimating the entropy is a simpler task, at least in principle, than compressing an unknown source to the entropy limit. Wyner and Ziv [29], motivated in part by the problem of providing a pointwise asymptotic analysis of the Lempel-Ziv algorithm, revealed some deep connections between the entropy rate of a stationary ergodic process and...

60 |
Asymptotic growth of a class of random trees
- Pittel
- 1985
Citation Context ...e used in Sections III and IV. 6. A word of caution. Theorem 1 says that the Cesàro averages of the quantities Λ_i^n / log n and of the quantities Λ_i^i / log i converge with probability one, but Pittel [19] and Szpankowski [24] have shown that the quantities Λ_n^n / log n themselves keep fluctuating. Interpreting Λ_n^n as the length of a feasible path in a suffix tree they identify two natural constants H_1...

51 | Modelling English Text
- Teahan
- 1998
Citation Context ...g of the data. (There is extensive literature that deals with language- and text-modeling, and several special-purpose algorithms that provide very good estimates; see, for example, Teahan and Cleary [25] and the references therein.) The best results to date are those reported by Teahan and Cleary [26], using PPM-related methods. They obtain an estimate of 1.603 bpc for the complete works of Jane Aust...

48 |
The individual ergodic theorem of information theory
- Breiman
Citation Context ...to Maker [14] and conclude that the Cesàro averages in Theorem 1 are pointwise consistent estimates for 1/H. Maker's theorem includes, as a special case, Breiman's ergodic theorem, which was used in [3] to prove the Shannon-McMillan-Breiman theorem. In the Appendix we present a simplified proof of Maker's generalized ergodic theorem, and some extensions that are used in Sections III and IV. 6. A wor...

42 | Asymptotic properties of data compression and suffix trees
- Szpankowski
- 1993
Citation Context ...I and IV. 6. A word of caution. Theorem 1 says that the Cesàro averages of the quantities Λ_i^n / log n and of the quantities Λ_i^i / log i converge with probability one, but Pittel [19] and Szpankowski [24] have shown that the quantities Λ_n^n / log n themselves keep fluctuating. Interpreting Λ_n^n as the length of a feasible path in a suffix tree they identify two natural constants H_1 and H_2 with H_1 > H...

40 |
The strong ergodic theorem of densities; Generalized
- Barron
Citation Context ...ompletes the proof. Appendix. Breiman [3] developed a generalized ergodic theorem and used it to prove pointwise convergence in what is now called the Shannon-McMillan-Breiman theorem. See also Barron [2] for a one-sided version and Algoet [1] for other applications. It turns out that Breiman's generalization is a special case of an older and more general ergodic theorem due to Maker [14]. We prove th...

37 | The entropy of English using PPMbased models
- Teahan, Cleary
- 1996
Citation Context ...ral special-purpose algorithms that provide very good estimates; see, for example, Teahan and Cleary [25] and the references therein.) The best results to date are those reported by Teahan and Cleary [26], using PPM-related methods. They obtain an estimate of 1.603 bpc for the complete works of Jane Austen (over 4,000,000 characters), and then use training in conjunction with an alphabet enlargement t...

31 |
The strong law of large numbers for sequential decisions under uncertainty
- Algoet
- 1994
Citation Context ...] developed a generalized ergodic theorem and used it to prove pointwise convergence in what is now called the Shannon-McMillan-Breiman theorem. See also Barron [2] for a one-sided version and Algoet [1] for other applications. It turns out that Breiman's generalization is a special case of an older and more general ergodic theorem due to Maker [14]. We prove the one-sided version and then generalize ...

27 | On the entropy of DNA: Algorithms and measurements based on memory and rapid convergence
- Farach, Noordewier, et al.
- 1995
Citation Context ...Without (DC), 1/H is still an asymptotic lower bound for the estimates in (a) and (b). Remarks: 1. Applications. Entropy estimators similar to the one in (a) have already appeared in the literature [8][4][5][27][10]. They were applied to experimental data in order to determine the entropy rate of the underlying process, and were demonstrated to be very efficient, even when fed with very limited amo...

27 |
The Shannon-McMillan-Breiman theorem for a class of amenable groups
- Ornstein, Weiss
- 1983
Citation Context ...log R_k^d ≤ H a.s. (18). Their argument [16] is also valid in the infinite-alphabet case, provided the Shannon-McMillan-Breiman theorem holds for the random field {X_u}. According to Ornstein and Weiss [17], this is indeed true if E{−log P(X_0)} is finite. Combining (17) and (18) completes the proof. Appendix. Breiman [3] developed a generalized ergodic theorem and used it to prove pointwise con...

24 |
Universal Redundancy Rates Do Not Exist
- Shields
- 1993
Citation Context ...the algorithm to converge, then the compression ratio is a good estimate for the source entropy. But as for the ergodic theorem, so for data compression: there is no universal rate of convergence [21][23]. Moreover, few of the known universal coding algorithms have been shown to achieve the entropy limit in the pointwise sense, and of those, not all are feasible to implement. In practice, it is of...

23 |
Estimating the information content of symbol sequences and efficient codes
- Grassberger
- 1989
Citation Context ...er refined this to pointwise convergence, L_n / log n → 1/H a.s. (1), where H is the entropy rate of {X_i} (logarithms are to base 2 throughout this correspondence). At about the same time, Grassberger [9] suggested an interesting entropy estimator based on average match-lengths. Shields [22] proved the consistency of Grassberger's estimator for independent identically distributed (i.i.d.) processes an...

19 | The Redundancy and Distribution of the Phrase Lengths of the Fixed-Database Lempel-Ziv Algorithm
- Wyner
- 1997
Citation Context ...ence of the ergodic theorem that Ĥ_{k,n} is almost surely consistent if we first let k → ∞ and then n → ∞, provided that E[L_n/(log n)] = 1/H + o(1). This is true for stationary ergodic Markov sources [28] and, by (6) of Theorem 1′ below combined with (1), it is also true for all stationary ergodic processes satisfying (DC). Similarly, Λ_i^i is the length of the phrase that would be encoded next by the L...

17 | Using difficulty of prediction to decrease computation: fast sort, priority queue and convex hull on entropy bounded inputs
- Chen, Reif
- 1993
Citation Context ...ithout (DC), 1/H is still an asymptotic lower bound for the estimates in (a) and (b). Remarks: 1. Applications. Entropy estimators similar to the one in (a) have already appeared in the literature [8][4][5][27][10]. They were applied to experimental data in order to determine the entropy rate of the underlying process, and were demonstrated to be very efficient, even when fed with very limited amount...

17 |
Entropy and recurrence rates for stationary random fields
- Ornstein, Weiss
- 2002
Citation Context ...1 : X_{−C(k)} = X_{−u−C(k)} for some u ∈ C(n), u ≠ 0}. Notice that R_k and L_n are related by the following relationship: L_n < k iff R_k > n (10). Applying a result of Ornstein and Weiss [16] to the reflected field {X_{−u}}, we see that (log R_k^d)/k^d → H a.s. (11), and from the duality relationship (10) it follows immediately that L_n^d / log n^d → 1/H a.s. (12). Note that k^d, n^d, R^d...
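The duality (10) quoted above — a match of length at least k exists within the window exactly when the recurrence time of the k-block is at most the window size — is easy to check numerically in the one-dimensional case. This is a sketch with hypothetical helper names: `L(x, t, n)` is the longest match of the sequence starting at position t into the preceding window of size n, and `R(x, t, k)` is the waiting time until the first k symbols from t recur in the past.

```python
def L(x, t, n):
    """Longest l with x[t:t+l] == x[t-j:t-j+l] for some 1 <= j <= n
    (overlapping matches allowed, as in the match-length definition)."""
    best = 0
    for j in range(1, min(n, t) + 1):
        l = 0
        while t + l < len(x) and x[t - j + l] == x[t + l]:
            l += 1
        best = max(best, l)
    return best

def R(x, t, k):
    """Smallest j >= 1 with x[t:t+k] == x[t-j:t-j+k]; infinity if
    the k-block never recurs in the available past."""
    if t + k > len(x):
        return float("inf")
    for j in range(1, t + 1):
        if all(x[t - j + m] == x[t + m] for m in range(k)):
            return j
    return float("inf")

# Duality check: L_n < k holds exactly when R_k > n.
x, t = "abracadabra", 7
assert all((L(x, t, n) < k) == (R(x, t, k) > n)
           for k in range(1, 5) for n in range(1, 8))
```

With x = "abracadabra" and t = 7, for instance, L(x, 7, 7) = 4 (the 4-symbol match "abra" at offset 7) and R(x, 7, 4) = 7, consistent with the duality.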

15 |
Entropy and prefixes
- Shields
- 1992
Citation Context ...py rate of {X_i} (logarithms are to base 2 throughout this correspondence). At about the same time, Grassberger [9] suggested an interesting entropy estimator based on average match-lengths. Shields [22] proved the consistency of Grassberger's estimator for independent identically distributed (i.i.d.) processes and mixing Markov chains. Kontoyiannis and Suhov [11] extended this to a wider class of st...

11 |
What can we do with small corpora? Document categorization via crossentropy
- Juola
- 1997
Citation Context ...), 1/H is still an asymptotic lower bound for the estimates in (a) and (b). Remarks: 1. Applications. Entropy estimators similar to the one in (a) have already appeared in the literature [8][4][5][27][10]. They were applied to experimental data in order to determine the entropy rate of the underlying process, and were demonstrated to be very efficient, even when fed with very limited amounts of data. ...

10 |
Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples (Thèse de doctorat ès Sciences Mathématiques)
- Doeblin
- 1938
Citation Context ...motivation is to provide a more general and precise analysis of these practical algorithms. 2. The Doeblin condition. The Doeblin condition was originally introduced in the analysis of Markov chains [7]. In the context of this paper (DC) was first introduced by Kontoyiannis and Suhov [11], where its properties are discussed in greater detail. Here we note that (DC) holds for i.i.d. processes, for er...

9 | Prefixes and the entropy rate for long-range sources
- Kontoyiannis, Suhov
- 1994
Citation Context ...ed on average match-lengths. Shields [22] proved the consistency of Grassberger's estimator for independent identically distributed (i.i.d.) processes and mixing Markov chains. Kontoyiannis and Suhov [11] extended this to a wider class of stationary processes, and recently Quas [20] extended it further to certain processes with infinite alphabets and to random fields. In this paper we introduce three ...

6 |
The ergodic theorem for a sequence of functions
- Maker
- 1940
Citation Context ...o converge to 1/H, with probability one. 5. Maker's generalized ergodic theorem. The proof of Theorem 1 is based on the fact that, under (DC), we can invoke a generalized ergodic theorem due to Maker [14] and conclude that the Cesàro averages in Theorem 1 are pointwise consistent estimates for 1/H. Maker's theorem includes, as a special case, Breiman's ergodic theorem, which was used in [3] to prove ...

5 |
An entropy estimator for a class of infinite alphabet processes
- Quas
- 1995
Citation Context ...'s estimator for independent identically distributed (i.i.d.) processes and mixing Markov chains. Kontoyiannis and Suhov [11] extended this to a wider class of stationary processes, and recently Quas [20] extended it further to certain processes with infinite alphabets and to random fields. In this paper we introduce three entropy estimators, (a), (b) and (c) below, that are formally similar to the on...

3 |
Universal redundancy rates for the class of B-processes do not exist
- Shields, Weiss
- 1995
Citation Context ...algorithm to converge, then the compression ratio is a good estimate for the source entropy. But as for the ergodic theorem, so for data compression: there is no universal rate of convergence [21][23]. Moreover, few of the known universal coding algorithms have been shown to achieve the entropy limit in the pointwise sense, and of those, not all are feasible to implement. In practice, it is often ...

2 | Stationary entropy estimation via string matching - Kontoyiannis, Suhov - 1996 |

2 |
Entropy and patterns
- Wyner
- 1996
Citation Context ...(DC), 1/H is still an asymptotic lower bound for the estimates in (a) and (b). Remarks: 1. Applications. Entropy estimators similar to the one in (a) have already appeared in the literature [8][4][5][27][10]. They were applied to experimental data in order to determine the entropy rate of the underlying process, and were demonstrated to be very efficient, even when fed with very limited amounts of da...

1 | Fast pattern matching for entropy bounded text
- Chen, Reif
- 1995
Citation Context ...out (DC), 1/H is still an asymptotic lower bound for the estimates in (a) and (b). Remarks: 1. Applications. Entropy estimators similar to the one in (a) have already appeared in the literature [8][4][5][27][10]. They were applied to experimental data in order to determine the entropy rate of the underlying process, and were demonstrated to be very efficient, even when fed with very limited amounts o...

1 |
Ergodic Theorems. Berlin: de Gruyter
- Krengel
- 1985
Citation Context ...ation is a special case of an older and more general ergodic theorem due to Maker [14]. We prove the one-sided version and then generalize it to random fields. See also Theorem 7.5 on p. 66 of Krengel [13]. Theorem 4 (Maker). Let T be a measure-preserving transformation of a probability space (X, B, P) and let I denote the σ-field of invariant events. Let {g_{n,i}}_{n,i≥1} be a two-dimensional array of ...