## Maximum entropy fundamentals (2001)

Venue: Entropy

Citations: 16 (7 self)

### BibTeX

```bibtex
@article{Harremoës01maximumentropy,
  author  = {P. Harremoës and F. Topsøe},
  title   = {Maximum entropy fundamentals},
  journal = {Entropy},
  year    = {2001},
  volume  = {3},
  pages   = {191--226}
}
```


### Citations

8563 | Elements of Information Theory
- Cover, Thomas
- 2006

Citation context: …satisfy Kraft’s equality ∑_{i∈A} exp(−κ_i) = 1 (2.4). This case corresponds to codes without superfluous digits. For further motivation, the reader may wish to consult [23] or standard textbooks such as [3] and [6]. Elements in K(A) are compact codes, for short just codes. For mathematical convenience, we shall work with exponentials and logarithms to the base e. For κ ∈ K̃(A) and i ∈ A, …
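The Kraft equality quoted in this context can be checked numerically. A minimal sketch, assuming a hypothetical dyadic distribution on a four-letter alphabet (the distribution `p` is an illustration, not from the paper); code lengths are taken in nats, base e, as the context specifies:

```python
import math

# Hypothetical example: a dyadic distribution p on a 4-letter alphabet A.
# The P-adapted code lengths (in nats) are kappa_i = -ln p_i, and for
# such a compact code Kraft's equality (2.4) holds exactly.
p = [0.5, 0.25, 0.125, 0.125]
kappa = [-math.log(pi) for pi in p]

# Kraft's equality: sum over i in A of exp(-kappa_i) == 1
kraft_sum = sum(math.exp(-k) for k in kappa)
print(kraft_sum)  # 1.0 (up to floating-point rounding)
```

A code with strictly smaller sum would correspond to superfluous digits; equality characterizes the compact codes K(A) the context describes.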

1482 | Information Theory and Reliable Communication
- Gallager
- 1968

Citation context: …Kraft’s equality ∑_{i∈A} exp(−κ_i) = 1 (2.4). This case corresponds to codes without superfluous digits. For further motivation, the reader may wish to consult [23] or standard textbooks such as [3] and [6]. Elements in K(A) are compact codes, for short just codes. For mathematical convenience, we shall work with exponentials and logarithms to the base e. For κ ∈ K̃(A) and i ∈ A, …

1152 | Information Theory and Statistics
- Kullback
- 1968

Citation context: …of Hmax(P). Note also that the proof gave the more precise bound J(P, P∗) ≤ R(κ|P) − H(P) with assumptions as in the theorem and with J(·, ·) denoting Jeffrey’s measure of discrimination, cf. [3] or [16]. Corollary 6.4. Assume that the model P has a cost-stable code κ∗ and let P∗ be the matching distribution. Then P is in equilibrium and has (κ∗, P∗) as optimal matching pair if and only if P∗ …

980 | Human behavior and the principle of least effort
- Zipf
- 1949

Citation context: …(8.38) In the sequel we shall typically work with distributions which are ordered in the above sense. The terminology regarding hyperbolic distributions is inspired by [19] but goes back further, cf. [24]. In these references the reader will find remarks and results pertaining to this and related types of distributions and their discovery from empirical studies which we will also …

668 | Information theory and statistical mechanics
- Jaynes
- 1957

Citation context: …IMPAN–BC, the Banach Center, Warsaw. [Section 1, The Maximum Entropy Principle – overview and a generic example] The Maximum Entropy Principle as conceived in its modern form by Jaynes, cf. [11], [12] and [13], is easy to formulate: “Given a model of probability distributions, choose the distribution with highest entropy.” With this choice you single out the most significant distribution, the …
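The principle quoted in this context can be illustrated concretely for a one-moment model. A minimal sketch, under assumptions not in the source (a hypothetical support {0, 1, 2}, target mean 0.7, and bisection as the solver): the maximum entropy distribution under a mean constraint has the Gibbs form p_i ∝ exp(−β·i), with β chosen to match the constraint.

```python
import math

# Hypothetical illustration: among distributions on {0, 1, 2} with a
# prescribed mean m, the maximum entropy distribution is of Gibbs form
# p_i proportional to exp(-beta * i).
def gibbs(beta, support=(0, 1, 2)):
    w = [math.exp(-beta * i) for i in support]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p, support=(0, 1, 2)):
    return sum(pi * i for pi, i in zip(p, support))

# Solve for beta matching mean m = 0.7 by bisection; mean(gibbs(beta))
# is strictly decreasing in beta, so the bracket below is valid.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean(gibbs(mid)) > 0.7:
        lo = mid  # mean still too large: increase beta
    else:
        hi = mid
p_star = gibbs((lo + hi) / 2)
print(round(mean(p_star), 6))  # 0.7
```

Among all distributions on {0, 1, 2} with mean 0.7, `p_star` attains the highest entropy; this is the standard moment-condition example the surrounding entries (e.g. Kapur [14]) refer to.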

258 | I-divergence geometry of probability distributions and minimization problems (The Annals of Probability)
- Csiszár
- 1975

Citation context: …x ∈ ∆ and let (Pn) ⊆ Px be an asymptotically optimal sequence for Px. Then the condition Hmax(Px) = x is equivalent with the condition H(Pn) → x. Terminology is close to that adopted by Csiszár, cf. [4], [5], who first developed the concept for closed models. This was later extended, using a different terminology, in Topsøe [20]. In this paper we refrain from a closer study of I-projections and refer …

76 | Sanov property, generalized I-projection and a conditional limit theorem
- Csiszár
- 1984

Citation context: …and let (Pn) ⊆ Px be an asymptotically optimal sequence for Px. Then the condition Hmax(Px) = x is equivalent with the condition H(Pn) → x. Terminology is close to that adopted by Csiszár, cf. [4], [5], who first developed the concept for closed models. This was later extended, using a different terminology, in Topsøe [20]. In this paper we refrain from a closer study of I-projections and refer the …

72 | Maximum-Entropy Models in Science and Engineering
- Kapur
- 1989

Citation context: …For practically all applications, the key example which is taken as point of departure – and often the only example discussed – is that of models prescribed by moment conditions. We refer to Kapur [14] for a large collection of examples as well as a long list of references. In this section we present models defined by just one moment condition. These special models will later be used to illustrate …

37 | Optima and Equilibria: An Introduction to Nonlinear Analysis
- Aubin
- 1993

Citation context: …call κ∗ a Nash equilibrium code for the model P if 〈κ∗, P〉 ≤ 〈κ∗, P∗〉, P ∈ P (4.14), and if H(P∗) < ∞. The terminology is adapted from mathematical economy, cf. e.g. Aubin [2]. The requirement can be written R(κ∗|P) ≤ H(P∗) < ∞. Note that here we insist that a Nash equilibrium code be P-adapted. This condition will later be relaxed. Theorem 4.1. Let P be a model and …

35 | A general minimax result for relative entropy
- Haussler
- 1997

Citation context: …The cost-function, seen from the point of view of Player II, is the map P × K̃(A) → [0; ∞] given by the average code length (P, κ) ↦ 〈κ, P〉. This game was introduced in [20], see also [15], [21], [10], [8] and [22]. Player I may be taken to represent “the system”, “Nature”, “God”, …, whereas Player II represents “the observer”, “the statistician”, …. We can motivate the game introduced …
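The cost function 〈κ, P〉 in this context is just the expected code length under P. A minimal numeric sketch, with a hypothetical three-point distribution (not from the paper): among compact codes, the P-adapted code κ_i = −ln p_i minimizes Player II's cost, and the minimum value is the entropy H(P).

```python
import math

# Player II's cost <kappa, P> = sum_i p_i * kappa_i: the average code
# length in nats. Both codes below satisfy Kraft's equality.
def avg_length(kappa, p):
    return sum(pi * ki for pi, ki in zip(p, kappa))

p = [0.7, 0.2, 0.1]                      # hypothetical distribution
adapted = [-math.log(pi) for pi in p]    # P-adapted code
other = [-math.log(1 / 3)] * 3           # uniform compact code

H = -sum(pi * math.log(pi) for pi in p)  # entropy of P

print(avg_length(adapted, p))  # equals H(P)
print(avg_length(other, p))    # strictly larger: the excess is D(P||U)
```

This is the sense in which Player II "matches" Player I's distribution: any other compact code incurs redundancy equal to a divergence term, which is what drives the minimax results cited here.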

33 | The general theory of Dirichlet’s series
- Hardy, Riesz
- 1972

Citation context: …Clearly, Z is decreasing on [1; ∞[ and Z(β) → 0 for β → ∞ (note that, given K, e^(−βκ_i) ≤ e^(−K) e^(−κ_i) for all i when β is large enough). The series defining Z is a Dirichlet series, cf. Hardy and Riesz [7] or Mandelbrojt [17]. The abscissa of convergence we denote by γ. Thus, by definition, Z(β) < ∞ for β > γ and Z(β) = ∞ for β < γ. As Z(1) = 1, γ ≤ 1. If supp(κ) is infinite, Z(β) = ∞ for β ≤ 0, hence …
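The partition function Z described in this context can be evaluated directly for a finite code (the infinite-support case, where the abscissa of convergence γ matters, obviously cannot be summed exactly). A minimal sketch with a hypothetical dyadic code:

```python
import math

# Partition function Z(beta) = sum_i exp(-beta * kappa_i) for a compact
# code kappa. Kraft's equality gives Z(1) = 1, and Z is decreasing.
# (For finite support the series converges for every beta; the abscissa
# of convergence gamma only comes into play for infinite supp(kappa).)
kappa = [-math.log(p) for p in (0.5, 0.25, 0.125, 0.125)]

def Z(beta):
    return sum(math.exp(-beta * k) for k in kappa)

print(Z(1.0))                    # 1.0: Kraft's equality
print(Z(2.0) < Z(1.0) < Z(0.5))  # True: Z is decreasing in beta
```

The matching Gibbs distributions p_i(β) = e^(−βκ_i)/Z(β) are exactly the one-parameter exponential family used throughout the paper's moment-condition examples.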

32 | Convergence properties of functional estimates for discrete distributions (Random Structures and Algorithms)
- Antos, Kontoyiannis
- 2001

Citation context: …of the entropy here considered but, precisely in the regime where Zipf’s law holds, such a study is very difficult as convergence of estimators of the entropy is very slow, cf. [1]. [Appendix A, The partition function] In this appendix we collect some basic facts about partition functions associated with one linear constraint. The point of departure is a code κ ∈ K(A). With κ we associate …

31 | Information theoretical optimization techniques
- Topsøe
- 1979

Citation context: …Player II chooses a general code. The cost-function, seen from the point of view of Player II, is the map P × K̃(A) → [0; ∞] given by the average code length (P, κ) ↦ 〈κ, P〉. This game was introduced in [20], see also [15], [21], [10], [8] and [22]. Player I may be taken to represent “the system”, “Nature”, “God”, …, whereas Player II represents “the observer”, “the statistician”, …. We can …

27 | On the theory of word frequencies and on related Markovian models of discourse
- Mandelbrot
- 1953

25 | Binomial and Poisson distributions as maximum entropy distributions
- Harremoës
- 2001

Citation context: …set-up. In such cases, new techniques are required in the search for a maximum entropy distribution. As examples of this difficulty we point to models involving binomial or empirical distributions, cf. [8] and [22]. After presentation of preliminary material, we introduce in Section 3 the basic concepts related to the game we shall study. Then follows a section which quickly leads to familiar key results. …

17 | Clearing up mysteries – the original goal
- Jaynes
- 1989

Citation context: …the Banach Center, Warsaw. [Section 1, The Maximum Entropy Principle – overview and a generic example] The Maximum Entropy Principle as conceived in its modern form by Jaynes, cf. [11], [12] and [13], is easy to formulate: “Given a model of probability distributions, choose the distribution with highest entropy.” With this choice you single out the most significant distribution, the least …

11 | Maximum entropy versus minimum risk and applications to some classical discrete distributions
- Topsøe
- 2006

Citation context: …In such cases, new techniques are required in the search for a maximum entropy distribution. As examples of this difficulty we point to models involving binomial or empirical distributions, cf. [8] and [22]. After presentation of preliminary material, we introduce in Section 3 the basic concepts related to the game we shall study. Then follows a section which quickly leads to familiar key results. …

8 | Basic concepts, identities and inequalities – the toolkit of information theory
- Topsøe

Citation context: …of mappings κ : A → [0; ∞] which satisfy Kraft’s equality ∑_{i∈A} exp(−κ_i) = 1 (2.4). This case corresponds to codes without superfluous digits. For further motivation, the reader may wish to consult [23] or standard textbooks such as [3] and [6]. Elements in K(A) are compact codes, for short just codes. For mathematical convenience, we shall work with exponentials and logarithms …

7 | Game-theoretical equilibrium, maximum entropy and minimum information discrimination (Maximum Entropy and Bayesian Methods)
- Topsøe
- 1992

Citation context: …code. The cost-function, seen from the point of view of Player II, is the map P × K̃(A) → [0; ∞] given by the average code length (P, κ) ↦ 〈κ, P〉. This game was introduced in [20], see also [15], [21], [10], [8] and [22]. Player I may be taken to represent “the system”, “Nature”, “God”, …, whereas Player II represents “the observer”, “the statistician”, …. We can motivate the game …

5 | The Information Topology
- Harremoës

Citation context: …of this paper, the reader needs only worry about sequences, but it is comforting to know that the sequential notion Pn → P is indeed a topological notion of convergence. Further details will be in [9]. An important connection between total variation and divergence is expressed by Pinsker’s inequality: D(P‖Q) ≥ (1/2) V(P, Q)² (2.7), which shows that convergence in the information topology is stronger …
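Pinsker's inequality as quoted in this context is easy to verify numerically. A minimal sketch with two hypothetical three-point distributions (chosen here for illustration; divergence in nats, V the total variation distance Σ|p_i − q_i|):

```python
import math

# Pinsker's inequality (2.7): D(P||Q) >= (1/2) * V(P, Q)^2
def divergence(p, q):
    # Information divergence in nats; terms with p_i = 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

P = [0.6, 0.3, 0.1]   # hypothetical example distributions
Q = [0.4, 0.4, 0.2]

D = divergence(P, Q)
V = total_variation(P, Q)
print(D >= 0.5 * V * V)  # True
```

Since D controls V² but not conversely, convergence in divergence (the information topology) implies convergence in total variation, which is the point the context makes.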

3 | Robust Noiseless Source Coding Through a Game Theoretic Approach
- Kazakos
- 1983

Citation context: …general code. The cost-function, seen from the point of view of Player II, is the map P × K̃(A) → [0; ∞] given by the average code length (P, κ) ↦ 〈κ, P〉. This game was introduced in [20], see also [15], [21], [10], [8] and [22]. Player I may be taken to represent “the system”, “Nature”, “God”, …, whereas Player II represents “the observer”, “the statistician”, …. We can motivate the …

1 | Séries de Dirichlet
- Mandelbrojt
- 1969

Citation context: …decreasing on [1; ∞[ and Z(β) → 0 for β → ∞ (note that, given K, e^(−βκ_i) ≤ e^(−K) e^(−κ_i) for all i when β is large enough). The series defining Z is a Dirichlet series, cf. Hardy and Riesz [7] or Mandelbrojt [17]. The abscissa of convergence we denote by γ. Thus, by definition, Z(β) < ∞ for β > γ and Z(β) = ∞ for β < γ. As Z(1) = 1, γ ≤ 1. If supp(κ) is infinite, Z(β) = ∞ for β ≤ 0, hence γ ≥ 0. If supp(κ) …