## Inequalities between Entropy and Index of Coincidence derived from Information Diagrams (2001)

Venue: IEEE Trans. Inform. Theory

Citations: 19 (11 self)

### BibTeX

```bibtex
@ARTICLE{Harremoës01inequalitiesbetween,
  author  = {Peter Harremoës and Flemming Topsøe},
  title   = {Inequalities between Entropy and Index of Coincidence derived from Information Diagrams},
  journal = {IEEE Trans. Inform. Theory},
  year    = {2001},
  volume  = {47},
  pages   = {2944--2960}
}
```

### Abstract

To any discrete probability distribution P we can associate its entropy H(P) = −∑ pᵢ ln pᵢ and its index of coincidence IC(P) = ∑ pᵢ². The main result of the paper is the determination of the precise range of the map P ↦ (IC(P), H(P)). The range looks much like that of the map P ↦ (Pmax, H(P)), where Pmax is the maximal point probability, cf. research from 1965 (Kovalevskij [18]) to 1994 (Feder and Merhav [7]). The earlier results, which actually focus on the probability of error 1 − Pmax rather than Pmax, can be conceived as limiting cases of results obtained by the methods presented here. Ranges of maps such as those indicated are called Information Diagrams. The main result gives rise to precise lower as well as upper bounds for the entropy function. Some of these bounds are essential for the exact solution of certain problems of universal coding and prediction for Bernoulli sources. Other applications concern Shannon theory (relations between various measures of divergence), statistical decision theory and rate distortion theory. Two methods are developed. One is topological; the other involves convex analysis and is based on a “lemma of replacement” which is of independent interest in relation to problems of optimization of mixed type (concave/convex optimization).
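
The two quantities in the abstract are straightforward to compute. A minimal Python sketch (illustrative, not from the paper; natural logarithms throughout, as in the paper):

```python
import math

def entropy(p):
    """Shannon entropy H(P) = -sum p_i ln p_i (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def index_of_coincidence(p):
    """Index of coincidence IC(P) = sum p_i^2."""
    return sum(x * x for x in p)

# The point (IC(P), H(P)) in the information diagram for two distributions.
uniform = [0.25] * 4            # U_4: IC = 0.25, H = ln 4 ≈ 1.386
skewed  = [0.7, 0.1, 0.1, 0.1]  # higher IC, lower H
for p in (uniform, skewed):
    print(index_of_coincidence(p), entropy(p))
```

The uniform distribution Uₙ sits at one extreme of the diagram (minimal IC = 1/n, maximal H = ln n); a point mass sits at the other (IC = 1, H = 0).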

### Citations

8557 | Elements of Information Theory
- Cover, Thomas
- 1991

Citation Context: ...shown in Figure 2. Fig. 2. The MR/D-diagram for n = 5. Another equivalent form of Theorem 2.1 is obtained by replacing IC(P) by the Rényi entropy H2(P) of order 2, cf. Rényi [22], Cover and Thomas [3] or Csiszár and J. Körner [5], for example. As H2(P) = −ln IC(P), it is a simple matter to transform the IC/H-diagram ∆n into the equivalent H2/H-diagram. The result of this transformation, again f...
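
The identity H2(P) = −ln IC(P) quoted in this context can be checked numerically; a small sketch (helper names are ours), using the general order-α Rényi formula Hα(P) = ln(∑ pᵢ^α)/(1 − α):

```python
import math

def ic(p):
    """Index of coincidence IC(P) = sum p_i^2."""
    return sum(x * x for x in p)

def renyi(p, alpha):
    """Renyi entropy of order alpha (alpha != 1), natural log."""
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
# The order-2 Renyi entropy coincides with -ln IC, which is what makes the
# IC/H-diagram and the H2/H-diagram equivalent up to this transformation.
assert abs(renyi(p, 2) - (-math.log(ic(p)))) < 1e-12
```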

1482 | Information Theory and Reliable Communication
- Gallager
- 1968

Citation Context: ...ng that the inequality is anchored in the Uk+1- and in the Uk-type distributions. It is pretty clear that inequalities as the above are of relevance for error probability analysis, cf. e.g. Gallager [10]. There we also find an inequality which is closely related to (19) in the case k = 1, viz. the inequality H ≥ 1 − IC (in our notation), cf. Exercise 4.7 of [10]. Note that (19) contains the follow...
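
The inequality H ≥ 1 − IC mentioned in this context actually holds termwise, since −p ln p ≥ p(1 − p) for p ∈ (0, 1] (equivalently ln p ≤ p − 1). A quick randomized spot-check, as our own sketch:

```python
import math, random

def H(p):  return -sum(x * math.log(x) for x in p if x > 0)
def IC(p): return sum(x * x for x in p)

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 8)
    w = [random.random() + 1e-9 for _ in range(n)]
    p = [x / sum(w) for x in w]
    # H >= 1 - IC, with a tolerance for floating-point rounding.
    assert H(p) >= 1 - IC(p) - 1e-12
```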

887 | Information Theory: Coding Theorems for Discrete Memoryless Systems
- Csiszár, Körner
- 1981

Citation Context: ...s of other distributions or to other quantities of interest. One of the most well-known and useful inequalities of this type is Pinsker's inequality (D ≥ (1/2)V²), cf. Pinsker [21], Csiszár and Körner [5], Problem I.3.17, or Fedotov, Harremoës and Topsøe [8]. This inequality allows one to conclude convergence of distributions from smallness of divergence. Some of our inequalities allow for the same con...
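
Pinsker's inequality D ≥ (1/2)V², with D the information divergence (natural log) and V = ∑|pᵢ − qᵢ| the total variation, can be probed numerically; a sketch with our own helper names:

```python
import math, random

def D(p, q):
    """Information divergence D(P||Q) = sum p_i ln(p_i/q_i), natural log."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def V(p, q):
    """Total variation distance V = sum |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def rand_dist(n):
    w = [random.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(1)
for _ in range(500):
    p, q = rand_dist(5), rand_dist(5)
    # Pinsker: smallness of divergence forces smallness of total variation.
    assert D(p, q) >= 0.5 * V(p, q) ** 2 - 1e-12
```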

410 | Divergence measures based on the Shannon entropy
- Lin
- 1991

Citation Context: ...1/ln 4 is stronger than the result you get when using the exponent 2/3. As far as the authors know, no previous instances of the power-type inequality have occurred before for general n. However, Lin [19] proved a partial result which amounts to the inequality τ2 ≥ 1/2. If one plots the upper bound in the power-inequality for n = 2 against the entropy function, one will not be able to tell the differ...

335 | On Measures of Entropy and Information
- Rényi
- 1961

Citation Context: ...the MRn/D-diagram is shown in Figure 2. Fig. 2. The MR/D-diagram for n = 5. Another equivalent form of Theorem 2.1 is obtained by replacing IC(P) by the Rényi entropy H2(P) of order 2, cf. Rényi [22], Cover and Thomas [3] or Csiszár and J. Körner [5], for example. As H2(P) = −ln IC(P), it is a simple matter to transform the IC/H-diagram ∆n into the equivalent H2/H-diagram. The result of this t...

203 | Information-type measures of difference of probability distributions and indirect observation
- Csiszár
- 1967

Citation Context: ...ent. However, in order that the arguments run smoothly, it is most natural to extend the reasoning behind the lemma of replacement so that it applies to a generalization of f-divergences, cf. Csiszár [4], to cases with a function f of mixed type (concave/convex or more general). We hope to return to this in a subsequent publication (announced proofs of certain results from Topsøe [30] which were plan...

141 | Information and Information Stability of Random Variables and Processes. San Francisco: Holden-Day
- Pinsker
- 1964

Citation Context: ...divergence) to entropies of other distributions or to other quantities of interest. One of the most well-known and useful inequalities of this type is Pinsker's inequality (D ≥ (1/2)V²), cf. Pinsker [21], Csiszár and Körner [5], Problem I.3.17, or Fedotov, Harremoës and Topsøe [8]. This inequality allows one to conclude convergence of distributions from smallness of divergence. Some of our inequalitie...

115 | Topology and Geometry
- Bredon
- 1993

Citation Context: ...methods from topology can be invoked. Therefore, our proof will combine general topological facts with specific computations. We first remind the reader of some basic topological notions, cf. Bredon [2] or Greenberg and Harper [12] or, for recent new proofs, Thomassen [27]. For a subset A of a topological space, int(A) = A° denotes the interior of A and ∂(A) the boundary of A. We shall also work wi...

76 | Some inequalities for information divergence and related measures of discrimination
- Topsøe

Citation Context: ...arp further bounds are the following: 1 ≤ H(P) + IC(P) ≤ ln n + 1/n. (25) The left-hand inequality follows from (12) and the right-hand inequality – previously announced as equation (30) of Topsøe [30] – is obtained by noting that as τn ≥ τ2 = (ln 4)⁻¹, τn ln n ≥ 1 − 1/n, and then the inequality follows from (8) and (23). Note that the lower bound in (25) is the sum of the minimum of H and the maxi...
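
The two-sided bound 1 ≤ H(P) + IC(P) ≤ ln n + 1/n from this context is easy to probe numerically; equality holds on the right at the uniform distribution Uₙ and on the left at a point mass. An illustrative sketch (ours):

```python
import math, random

def H(p):  return -sum(x * math.log(x) for x in p if x > 0)
def IC(p): return sum(x * x for x in p)

# Equality at the uniform distribution: H + IC = ln n + 1/n.
n = 6
u = [1 / n] * n
assert abs(H(u) + IC(u) - (math.log(n) + 1 / n)) < 1e-12

# Both bounds on random n-point distributions.
random.seed(2)
for _ in range(1000):
    n = random.randint(2, 10)
    w = [random.random() + 1e-9 for _ in range(n)]
    p = [x / sum(w) for x in w]
    s = H(p) + IC(p)
    assert 1 - 1e-12 <= s <= math.log(n) + 1 / n + 1e-12
```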

72 | Algebraic Topology
- Greenberg, Harper
- 1981

Citation Context: ...e invoked. Therefore, our proof will combine general topological facts with specific computations. We first remind the reader of some basic topological notions, cf. Bredon [2] or Greenberg and Harper [12] or, for recent new proofs, Thomassen [27]. For a subset A of a topological space, int(A) = A° denotes the interior of A and ∂(A) the boundary of A. We shall also work with connectedness components, ...

55 | Asymptotic minimax regret for data compression, gambling, and prediction
- Xie, Barron
- 1996

Citation Context: ...zation plays an important role. A starting point is Topsøe [28]. Several other authors have also stressed the importance of the game theoretical view, cf. for example Haussler [16] and Xie and Barron [34]. In [15] we will collect the main theoretical results, [14] will contain specific results of universal coding and prediction and in the present paper we develop special techniques which are needed in...

35 | Relations between entropy and error probability
- Feder, Merhav
- 1994

Citation Context: ...e map P ↦ (IC(P), H(P)). The range looks much like that of the map P ↦ (Pmax, H(P)), where Pmax is the maximal point probability, cf. research from 1965 (Kovalevskij [18]) to 1994 (Feder and Merhav [7]). The earlier results, which actually focus on the probability of error 1 − Pmax rather than Pmax, can be conceived as limiting cases of results obtained by methods here presented. Ranges of maps as ...

35 | A general minimax result for relative entropy
- Haussler
- 1997

Citation Context: ...theory for which optimization plays an important role. A starting point is Topsøe [28]. Several other authors have also stressed the importance of the game theoretical view, cf. for example Haussler [16] and Xie and Barron [34]. In [15] we will collect the main theoretical results, [14] will contain specific results of universal coding and prediction and in the present paper we develop special techni...

31 | Information theoretical optimization techniques
- Topsøe
- 1979

Citation Context: ...goal is to consolidate and further develop a game theoretical viewpoint underlying certain basic parts of information theory for which optimization plays an important role. A starting point is Topsøe [28]. Several other authors have also stressed the importance of the game theoretical view, cf. for example Haussler [16] and Xie and Barron [34]. In [15] we will collect the main theoretical results, [14...

21 | The Jordan-Schönflies Theorem and the classification of surfaces
- Thomassen

Citation Context: ...ne general topological facts with specific computations. We first remind the reader of some basic topological notions, cf. Bredon [2] or Greenberg and Harper [12] or, for recent new proofs, Thomassen [27]. For a subset A of a topological space, int(A) = A° denotes the interior of A and ∂(A) the boundary of A. We shall also work with connectedness components, just called components, but only need this...

19 | Optimal entropy-constrained scalar quantization of a uniform source
- György, Linder
- 2000

Citation Context: ...Corollary 2.2. The proof of (ii) is similar, referring this time to Corollary 2.3. The lower bound (17) (and certain extensions of these bounds) have been obtained recently also by György and Linder [13] who applied the results to the study of problems of quantization and rate distortion theory (for general background, see Cover and Thomas [3] and the recent survey paper by Gray and Neuhoff [11])....

19 | Cryptography: Theory and Practice (Boca
- Stinson
- 1995

Citation Context: ...pᵢ². (1) This quantity, which is the probability of getting “two of a kind” in two independent trials governed by the distribution P, is of significance in cryptanalysis, cf. Friedman [9], Stinson [25] and Menezes et al. [20]. Simple transformations of the index of coincidence occur elsewhere in the literature as we shall comment on later. Note the trivial inequality IC(P) ≤ maxᵢ pᵢ, (2)...
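
The trivial inequality (2) follows from ∑ pᵢ² ≤ (maxᵢ pᵢ)·∑ pᵢ = maxᵢ pᵢ; by Cauchy-Schwarz, IC(P) ≥ 1/n also holds on an n-point alphabet, so the two bounds bracket IC. A quick sketch (ours):

```python
import random

def IC(p):
    """Index of coincidence IC(P) = sum p_i^2."""
    return sum(x * x for x in p)

random.seed(3)
for _ in range(1000):
    n = random.randint(1, 10)
    w = [random.random() + 1e-9 for _ in range(n)]
    p = [x / sum(w) for x in w]
    assert IC(p) <= max(p) + 1e-12       # inequality (2)
    assert IC(p) >= 1 / n - 1e-12        # Cauchy-Schwarz lower bound
```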

16 | Bounds for entropy and divergence for distributions over a two-element set
- Topsøe
- 2001

Citation Context: ...d limn→∞ τn = 1. limn→∞ hn(x)/ln n = 1 − x, 1/ln 4 = τ2 < τ3 < ··· Proof. In view of Lemma 6.3, (i), the inequalities τ2 < τ3 < ··· follow readily. The determination of τ2 can be found in Topsøe [31]. Clearly, g∞ is decreasing in ]0, 1] and thus assumes its minimal value 1 for x = 1. Choose n0 such that gn0(1/2) > 1. Then, for n ≥ n0, gn assumes its minimal value (τn) in [1/2, 1]. As gn conve...

9 | Uncertainty and the probability of error
- Tebbe, Dwyer
- 1968

Citation Context: ...ties (due to the form of inequalities considered it is in fact αk + βk and βk that appear in previous research). To be precise, we refer to equations (12) in Kovalevskij [18], (6) in Tebbe and Dwyer [26], (29) in Ben-Bassat [1] and, finally, to equation (14) in Feder and Merhav [7]. An explanation for this phenomenon is given in the discussion. As indicated in the announcement [29], the inequality (1...

8 | Handbook of Applied Cryptography, Boca
- Menezes, van Oorschot, et al.
- 1997

Citation Context: ...which is the probability of getting “two of a kind” in two independent trials governed by the distribution P, is of significance in cryptanalysis, cf. Friedman [9], Stinson [25] and Menezes et al. [20]. Simple transformations of the index of coincidence occur elsewhere in the literature as we shall comment on later. Note the trivial inequality IC(P) ≤ maxᵢ pᵢ, (2), in particular IC(P) ≤ ...

6 | f-entropies, probability of error, and feature selection
- Ben-Bassat

Citation Context: ...Neuhoff [11]). That this type of research also leads to certain diagrams as the IC/H-diagrams can be seen as follows (brief indication): Consider a random variable X which is uniformly distributed on [0, 1]. If Q is a nearest neighbour quantizer with finite range and if we use Euclidean distance as distortion measure, then – after a simple computation – it is seen that the distortion of Q equals 1/4 IC(...

6 | The problem of character recognition from the point of view of mathematical statistics
- Kovalevskij
- 1967

Citation Context: ...ation of the precise range of the map P ↦ (IC(P), H(P)). The range looks much like that of the map P ↦ (Pmax, H(P)), where Pmax is the maximal point probability, cf. research from 1965 (Kovalevskij [18]) to 1994 (Feder and Merhav [7]). The earlier results, which actually focus on the probability of error 1 − Pmax rather than Pmax, can be conceived as limiting cases of results obtained by methods her...

4 | Instances of exact prediction and a new type of inequalities obtained by anchoring
- Topsøe
- 1999

Citation Context: ...nt way, we first introduce the constants ek = (1 + 1/k)^k, k ≥ 1. (18) Note that as k increases, ek increases and has e as its limit value. We can then state another key result, announced in Topsøe [29]: Theorem 2.6. For any P ∈ M¹₊(ℕ) and any k ≥ 1, the inequality H(P) ≥ αk − βk IC(P) (19) holds, with the constants αk and βk defined by αk = ln(k + 1) + ln ek, βk = (k + 1) ln ek, k ≥ 1. (20) Pro...
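
The constants in (18)–(20) are easy to code up, and the anchoring described in the Gallager context above means equality in (19) holds at the uniform distributions Uk and Uk+1 (one can check algebraically that ln(k+1) + ln ek = ln k + ((k+1)/k) ln ek, so both anchor points lie on the line). A randomized spot-check, as our own sketch of the stated theorem:

```python
import math, random

def H(p):  return -sum(x * math.log(x) for x in p if x > 0)
def IC(p): return sum(x * x for x in p)

def alpha_beta(k):
    """alpha_k and beta_k from (20), with ln e_k = k ln(1 + 1/k) from (18)."""
    ln_ek = k * math.log(1 + 1 / k)
    return math.log(k + 1) + ln_ek, (k + 1) * ln_ek

random.seed(4)
for k in (1, 2, 3):
    a, b = alpha_beta(k)
    # Anchoring: equality at the uniform distributions U_k and U_{k+1}.
    for m in (k, k + 1):
        u = [1 / m] * m
        assert abs(H(u) - (a - b * IC(u))) < 1e-9
    # Inequality (19) on random distributions.
    for _ in range(300):
        n = random.randint(2, 12)
        w = [random.random() + 1e-9 for _ in range(n)]
        p = [x / sum(w) for x in w]
        assert H(p) >= a - b * IC(p) - 1e-9
```

For k = 1 this reduces to α₁ = 2 ln 2, β₁ = 2 ln 2, i.e. H ≥ (2 ln 2)(1 − IC), a sharpening of Gallager's H ≥ 1 − IC since 2 ln 2 > 1.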

3 | Bounds of the minimal error probability on checking a finite or countable number of hypotheses, Information Transmission Problems
- Vajda
- 1968

Citation Context: ...nstant, to a class of entropy-like functions first considered by Havrda and Charvát, cf. [17]. The important case considered here was called, again apart from a constant, “quadratic entropy” by Vajda [32] and reintroduced by Daróczy [6]. More details about previous research in this area (axiomatics and basic properties) can be found in Vajda and Vašek [33] and in references quoted there. The main p...

3 | concave entropies, and comparison of experiments
- Vajda, Vašek, et al.
- 1985

Citation Context: ...m a constant, “quadratic entropy” by Vajda [32] and reintroduced by Daróczy [6]. More details about previous research in this area (axiomatics and basic properties) can be found in Vajda and Vašek [33] and in references quoted there. The main purpose of the paper is to study the relationship between D(P‖Un) and MRn(P) (or, equivalently, χ²(P, Un)). Qualitatively, D(P‖Un) and MRn(P) both measur...

2 | Concept of structural α-entropy
- Havrda, Charvát

Citation Context: ...nequalities we will study. We note that this quantity behaves as a kind of entropy and belongs, apart from a constant, to a class of entropy-like functions first considered by Havrda and Charvát, cf. [17]. The important case considered here was called, again apart from a constant, “quadratic entropy” by Vajda [32] and reintroduced by Daróczy [6]. More details about previous research in this area (ax...

2 | Bounds on Entropy in a Guessing Game
- De Santis, Gaggia, Vaccaro
- 2001

Citation Context: ...study by György and Linder [13] deals with problems of quantization and rate distortion and in this connection they also discovered the lower cascade related to the ICα/H-diagrams. Finally, the paper [23] by De Santis, Gaggia and Vaccaro may be the last published research in this direction. Let us also briefly discuss an extension of the index of coincidence to indices of order α defined by ICα(P) = ∑ p...
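
The context is truncated, but on the natural reading the indices of order α are ICα(P) = ∑ pᵢ^α, so that α = 2 recovers the ordinary IC and Hα(P) = ln ICα(P)/(1 − α) is the Rényi entropy of order α. A small sketch under that assumption:

```python
import math

def ic_alpha(p, alpha):
    """Index of coincidence of order alpha: IC_alpha(P) = sum p_i^alpha.
    (Assumed reading of the truncated context; alpha = 2 gives ordinary IC.)"""
    return sum(x ** alpha for x in p)

def renyi(p, alpha):
    """Renyi entropy H_alpha(P) = ln IC_alpha(P) / (1 - alpha), alpha != 1."""
    return math.log(ic_alpha(p, alpha)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
assert abs(ic_alpha(p, 2) - sum(x * x for x in p)) < 1e-12
assert abs(renyi(p, 2) + math.log(ic_alpha(p, 2))) < 1e-12
```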

1 | Generalized information functions
- Daróczy
- 1970

Citation Context: ...e functions first considered by Havrda and Charvát, cf. [17]. The important case considered here was called, again apart from a constant, “quadratic entropy” by Vajda [32] and reintroduced by Daróczy [6]. More details about previous research in this area (axiomatics and basic properties) can be found in Vajda and Vašek [33] and in references quoted there. The main purpose of the paper is to study ...

1 | The Index of Coincidence and its Applications in Cryptanalysis
- Friedman
- 1987

Citation Context: ...IC(P) = ∑ pᵢ². (1) This quantity, which is the probability of getting “two of a kind” in two independent trials governed by the distribution P, is of significance in cryptanalysis, cf. Friedman [9], Stinson [25] and Menezes et al. [20]. Simple transformations of the index of coincidence occur elsewhere in the literature as we shall comment on later. Note the trivial inequality IC(P) ≤...

1 | On Coding by Probability Transformation. Konstanz: Hartung-Gorre
- Sayir
- 1999

Citation Context: ...[18] in 1965. Further research, partly extending Kovalevskij's results, partly rediscovering them, includes papers by Tebbe and Dwyer [26], Ben-Bassat [1] and Feder and Merhav [7]. The study by Sayir [24] shows how diagrams with a shape as those considered here and in the previous literature come up in an experimental study. A recent independent study by György and Linder [13] deals w...