## A lower bound on compression of unknown alphabets (2005)

Venue: | Theoret. Comput. Sci |

Citations: | 10 - 3 self |

### BibTeX

@ARTICLE{Jevtić05alower,

author = {Nikola Jevtić and Alon Orlitsky and Narayana P. Santhanam},

title = {A lower bound on compression of unknown alphabets},

journal = {Theoret. Comput. Sci},

year = {2005}

}

### Abstract

Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings’ patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength $n$, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least $(1.5 \log_2 e)\, n^{1/3}$ bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman’s saddle-point approximation technique to determine the coefficients’ asymptotic behavior.
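As a rough illustration of the two objects named in the abstract, the following Python sketch computes the pattern of a string (each symbol replaced by the order of its first appearance) and evaluates the expression $(1.5 \log_2 e)\, n^{1/3}$. The function names `pattern` and `pattern_redundancy_lower_bound` are hypothetical and chosen only for this sketch. The second function simply evaluates the stated expression and does not reproduce the paper's derivation.

```python
import math


def pattern(s):
    """Replace each symbol by the 1-based index of its first appearance."""
    first_seen = {}  # symbol -> order in which it first appeared
    result = []
    for symbol in s:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        result.append(first_seen[symbol])
    return result


def pattern_redundancy_lower_bound(n):
    """Evaluate (1.5 * log2 e) * n^(1/3), the expression quoted in the abstract, in bits."""
    return 1.5 * math.log2(math.e) * n ** (1 / 3)


# The pattern keeps only the order of first appearance, not the symbols.
print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]

# The expression grows like n^(1/3), so dividing by n gives a vanishing per-symbol value.
for n in (10, 1000, 100000):
    bound = pattern_redundancy_lower_bound(n)
    print(n, round(bound, 2), round(bound / n, 5))
```

Because the pattern depends only on the order of first appearance, any relabeling of the alphabet yields the same pattern. For example, `pattern("xyzxwx")` and `pattern("abraca")` agree.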

### Citations

1454 |
An Introduction to Probability Theory
- Feller
- 1971
Citation Context ... that the contribution of the second coefficient is large enough to satisfy fast taper. We choose a $\varphi$ based on these criteria, and then prove that our choice indeed works. We also use Feller’s bounds [35] on Stirling’s approximation for all $n \ge 1$, $\sqrt{2\pi n}\left(\frac{n}{e}\right)^n \le n! \le \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\frac{1}{12n}}$, (11) extensively in the paper. Further, we shall denote by C positive constants that are, in par... |

850 | An empirical study of smoothing techniques for language modeling
- Chen, Goodman
- 1999
Citation Context ...e first letter to appear (a), followed by the second letter to appear (b), then by the third to appear (r), the first that appeared (a again), the fourth (c), the first (a), etc. In many applications [18, 19, 20, 21, 22, 23, 24], the description of patterns is more important than the dictionary. For example, in language modeling, the pattern reflects the structure of the language while the dictionary plays a less important p... |

347 |
Universal codeword sets and representations of the integers
- Elias
- 1975
Citation Context ...y avoided direct compression of sequences over infinite or large alphabets. Therefore, several researchers have attempted to get around Kieffer’s negative result. One line of work, along the lines of [11, 12, 13, 14, 15], constructed compression algorithms for collections satisfying Kieffer’s condition. For example, [11] considered the collection of i.i.d. distributions that assign non-increasing probabilities to posi... |

275 |
Fisher information and stochastic complexity
- Rissanen
- 1996
Citation Context ...ere throughout the paper, logarithms are taken to base 2. For the collection $\mathcal{I}_m^n$ of i.i.d. distributions over length-$n$ strings from an alphabet of a fixed size $m$, a number of researchers have shown [2, 3, 4, 5, 6, 7, 8, 9] that as $n$ increases $\hat{R}(\mathcal{I}_m^n) = \frac{m-1}{2}\log\frac{n}{2\pi} + \log\frac{\Gamma(\frac{1}{2})^m}{\Gamma(\frac{m}{2})} + o_m(1)$, (2) where $\Gamma$ is the gamma function, and the $o_m(1)$ term diminishes with increasing $n$ at a rate determined by $m$. This ... |

214 |
Average Case Analysis of Algorithms on Sequences
- Szpankowski
- 2001
Citation Context ...otics of the coefficients of power series that satisfy certain properties, which, as shown later, $G(z)$ also satisfies. In this section we describe Hayman’s analysis. We follow the terminology used in [32]. Theorem 1. [Hayman] For $f(z) = \sum_{n=0}^{\infty} a_n z^n$, let $a(z) \stackrel{\mathrm{def}}{=} \frac{d \log f(z)}{d \log z}$ and $b(z) \stackrel{\mathrm{def}}{=} \frac{d^2 \log f(z)}{d(\log z)^2} = z a'(z)$, (9) and let the saddle point $r_n$ be the solution of $a(r_n) = n$. I... |

193 | A general language model for information retrieval
- Song, Croft
- 1999
Citation Context ...e first letter to appear (a), followed by the second letter to appear (b), then by the third to appear (r), the first that appeared (a again), the fourth (c), the first (a), etc. In many applications [18, 19, 20, 21, 22, 23, 24], the description of patterns is more important than the dictionary. For example, in language modeling, the pattern reflects the structure of the language while the dictionary plays a less important p... |

150 | Universal portfolios
- COVER
- 1991
Citation Context ...ere throughout the paper, logarithms are taken to base 2. For the collection $\mathcal{I}_m^n$ of i.i.d. distributions over length-$n$ strings from an alphabet of a fixed size $m$, a number of researchers have shown [2, 3, 4, 5, 6, 7, 8, 9] that as $n$ increases $\hat{R}(\mathcal{I}_m^n) = \frac{m-1}{2}\log\frac{n}{2\pi} + \log\frac{\Gamma(\frac{1}{2})^m}{\Gamma(\frac{m}{2})} + o_m(1)$, (2) where $\Gamma$ is the gamma function, and the $o_m(1)$ term diminishes with increasing $n$ at a rate determined by $m$. This ... |

125 |
Universal Sequential Coding of Single Messages
- Shtar‘kov
- 1987
Citation Context ...imum number of extra bits used by any universal code in the worst case is the redundancy, $\hat{R}(\mathcal{P})$, of the collection $\mathcal{P}$ of distributions. Let $\mathcal{P}$ be a collection of distributions over a set $\mathcal{X}$. Shtarkov [1] showed that $\hat{R}(\mathcal{P}) = \log\left(\sum_{x\in\mathcal{X}} \max_{p\in\mathcal{P}} p(x)\right)$, (1) where throughout the paper, logarithms are taken to base 2. For the collection $\mathcal{I}_m^n$ of i.i.d. distributions over length-$n$ strings from an alphabe... |

103 | A game of prediction with expert advice
- Vovk
- 1995
Citation Context ...e first letter to appear (a), followed by the second letter to appear (b), then by the third to appear (r), the first that appeared (a again), the fourth (c), the first (a), etc. In many applications [18, 19, 20, 21, 22, 23, 24], the description of patterns is more important than the dictionary. For example, in language modeling, the pattern reflects the structure of the language while the dictionary plays a less important p... |

85 | Universal portfolios with side information
- Cover, Ordentlich
- 1996
Citation Context ...ere throughout the paper, logarithms are taken to base 2. For the collection $\mathcal{I}_m^n$ of i.i.d. distributions over length-$n$ strings from an alphabet of a fixed size $m$, a number of researchers have shown [2, 3, 4, 5, 6, 7, 8, 9] that as $n$ increases $\hat{R}(\mathcal{I}_m^n) = \frac{m-1}{2}\log\frac{n}{2\pi} + \log\frac{\Gamma(\frac{1}{2})^m}{\Gamma(\frac{m}{2})} + o_m(1)$, (2) where $\Gamma$ is the gamma function, and the $o_m(1)$ term diminishes with increasing $n$ at a rate determined by $m$. This ... |

55 |
Asymptotic minimax regret for data compression, gambling, and prediction
- Xie, Barron
- 1996

51 |
Probability Scoring for Spelling Correction
- Church, Gale
- 1991

49 |
A decision-theoretic extension of stochastic complexity and its applications to learning
- Yamanishi
- 1998

42 |
Always Good Turing: Asymptotically optimal probability estimation
- Orlitsky, Santhanam, et al.
- 2003
Citation Context ...own, e.g., [25], that the i.i.d. distribution assigning the highest probability to a pattern is the best i.i.d. distribution to explain this statistic. For a detailed discussion along this angle, see [28, 29]. 2 Patterns and their redundancy. We formally define patterns and discuss their compression. Let $\mathcal{A}$ be any alphabet. For $x = x_1 \ldots x_n \stackrel{\mathrm{def}}{=} x_1^n \in \mathcal{A}^n$, $\mathcal{A}(x) \stackrel{\mathrm{def}}{=} \{x_1, \ldots, x_n\}$ is the set of symbo... |

39 |
A unified approach to weak universal source coding
- Kieffer
- 1978
Citation Context ...plications, such as language modeling, text, and image compression, the alphabet size $m$ is large, often comparable to the blocklength, and the redundancy calculated in (2) is high. In the limit, Kieffer [10] showed that universal compression of i.i.d. sequences over infinite alphabets requires infinite per-symbol redundancy, and specified the condition for a collection of distributions to have negligible... |

32 | Universal compression of memoryless sources over unknown alphabets
- Orlitsky, Santhanam, et al.
- 2004
Citation Context ...(1) implies that the pattern redundancy of $\mathcal{I}^n$, i.e., the redundancy of the collection of distributions induced on patterns by $\mathcal{I}^n$, is $\hat{R}(\mathcal{I}_\Psi^n) = \log\left(\sum_{\psi\in\Psi^n} \max_{p\in\mathcal{I}^n} p(\psi)\right)$. It has been shown [25] that, irrespective of the alphabet size, patterns of i.i.d. distributed strings can be compressed with redundancy of at most $\hat{R}(\mathcal{I}_\Psi^n) \le \left(\pi\sqrt{\frac{2}{3}}\log e\right)\sqrt{n}$ bits. Hence as the blocklength $n$ grows, t... |

30 |
A Method for Disambiguating Word Senses
- Gale, Church, et al.
- 1993

29 |
The performance of universal coding
- Krichevsky, Trofimov
- 1981

17 |
A Generalisation of Stirling’s Formula. Journal für die reine und angewandte Mathematik 196
- Hayman
Citation Context ... $g(n)$ appears to be difficult. Instead, we evaluate a generating function of $g(n)$, $G(z) \stackrel{\mathrm{def}}{=} \sum_{n=0}^{\infty} g(n)\frac{n^n}{n!}z^n$, (7) from which the asymptotics of $g(n)$ can be obtained using Hayman’s analysis [31]. To express the generating function $G(z)$ in a more explicit form, observe that $G(z) = \sum_{n=0}^{\infty}\sum_{(\varphi_1,\ldots,\varphi_n)\in\Phi^n}\prod_{\mu\ge 1}\left(\frac{\mu^\mu z^\mu}{\mu!}\right)^{\varphi_\mu}\frac{1}{\varphi_\mu!} = \sum_{(\varphi_1,\ldots,\varphi_n)\in\Phi^n}$ ... |

16 |
Multialphabet coding with separate alphabet description
- Åberg, Shtarkov, et al.
- 1997
Citation Context ...ms for collections satisfying Kieffer’s condition. For example, [11] considered the collection of i.i.d. distributions that assign non-increasing probabilities to positive integers. A second approach [16, 17] does not restrict the collection of distributions, but separates the description of the sequence into two parts: a description of the symbols appearing in the string, and of the pattern they form. Fo... |

15 |
Speaking of infinity
- Orlitsky, Santhanam
Citation Context ... $\log\frac{n}{2\pi} + \log\frac{\Gamma(\frac{1}{2})^m}{\Gamma(\frac{m}{2})} + o_m(1)$, (4) presented in Appendix 1 of [4]. However, it is not clear whether the latter bound holds when $m$ grows with $n$ (see discussion in Section 5.2 of [26]). If Bound (4) held for $m$ growing as $\sqrt[3]{n}$, then it could be applied to obtain a lower bound on $\hat{R}(\mathcal{I}_\Psi^n)$, as described in [16]. However, this approach of [16] would yield only the (matching) leading... |

14 | On asymptotics of certain recurrences arising in universal coding
- Szpankowski
- 1998

13 |
Multi-alphabet universal coding of memoryless sources
- Shtarkov, Tjalkens, et al.
- 1995

12 |
Universal compression of unknown alphabets
- Jevtić, Orlitsky, et al.
Citation Context ...ms for collections satisfying Kieffer’s condition. For example, [11] considered the collection of i.i.d. distributions that assign non-increasing probabilities to positive integers. A second approach [16, 17] does not restrict the collection of distributions, but separates the description of the sequence into two parts: a description of the symbols appearing in the string, and of the pattern they form. Fo... |

11 |
On universal noiseless source coding for infinite source alphabets
- Gyorfi, Pali, et al.
- 1993
Citation Context ...y avoided direct compression of sequences over infinite or large alphabets. Therefore, several researchers have attempted to get around Kieffer’s negative result. One line of work, along the lines of [11, 12, 13, 14, 15], constructed compression algorithms for collections satisfying Kieffer’s condition. For example, [11] considered the collection of i.i.d. distributions that assign non-increasing probabilities to posi... |

11 | Universal lossless compression with unknown alphabets—the average case
- Shamir
- 2006
Citation Context ...yield only the (matching) leading coefficient of Bound (3), and it can be shown that if additional coefficients were calculated, they would not exceed those in Bound (3). We note that recently Shamir [27] showed that a weaker form of the lower bound in Corollary 11 applies to average-case redundancy. An interesting property of patterns arises also in connection with Good-Turing estimators. Application... |

10 | Universal codes for finite sequences of integers drawn from a monotone distribution
- Foster, Stine, et al.
- 2002
Citation Context ...y avoided direct compression of sequences over infinite or large alphabets. Therefore, several researchers have attempted to get around Kieffer’s negative result. One line of work, along the lines of [11, 12, 13, 14, 15], constructed compression algorithms for collections satisfying Kieffer’s condition. For example, [11] considered the collection of i.i.d. distributions that assign non-increasing probabilities to posi... |

8 | One-way communication and error-correcting codes
- Orlitsky, Viswanathan
- 2003
Citation Context ...= 1, and, since any continuous distribution $p$ has $p(12 \ldots n) = 1$, we derive $\hat{p}(12 \ldots n) = 1$. In general it is difficult to determine the maximum probability of a pattern. For example, some work [30] is needed to show that $\hat{p}(112) = \frac{1}{4}$. Since it is difficult to obtain the maximum probability of patterns, it is difficult to compute the pattern redundancy of $\mathcal{I}^n$ exactly. In [25], an upper bound w... |

7 |
The universality of grammar-based codes for sources with countably infinite alphabets
- He, Yang
- 2005

7 | The average case analysis of algorithms: saddle point asymptotics, Rapport de recherche no
- Flajolet, Sedgewick
- 1994
Citation Context ...lue of Cauchy’s integral $\oint_C f(z)/z^{n+1}$ around a contour $C$ through the saddle point $r_n$ is captured by a short arc around $r_n$. For more details on the saddle point approximation and related results, see [31, 33, 32]. For the generating function $G$ defined in Equation (8), the functions $a(z)$ and $b(z)$ of Equation (9) are $a(z) = \sum_{k=1}^{\infty}\frac{k^{k+1}z^k}{k!}$ and $b(z) = \sum_{k=1}^{\infty}\frac{k^{k+2}z^k}{k!}$. (10) We pick $R_1 = \frac{1}{6e}$. The first two condi... |

6 | Minimax regret under log loss for general classes of experts
- Cesa-Bianchi, Lugosi
- 1999

4 | The precise minimax redundancy
- Drmota, Szpankowski
- 2002

4 |
Asymptotic optimality of two variations of Lempel-Ziv codes for sources with countably infinite alphabet
- Uyematsu, Kanaya
- 1975

1 |
Ratio Test.” From Mathworld—A Wolfram Web Resource. http://mathworld.wolfram.com/RatioTest.html
- Weisstein
Citation Context ... We first check for convergence of each of the summations over $k$. Indeed, Lemma 2. For any $l$, $\sum_{k=1}^{\infty}\frac{k^{k+l}x^k}{k!}$ converges for $x < \frac{1}{e}$. Proof: By the Cauchy ratio test, e.g., [34]. □ Therefore, in order to evaluate the $n$th coefficient in the Taylor series, Hayman’s theorem approximates the value of $G(z)$ in the complex integration over the circle $|z| = x$ by a correction over... |

1 |
A generalization of Stirling’s formula. Journal für die reine und angewandte Mathematik
- Hayman
- 1956
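Citation Context ...n of $g(n)$ appears to be difficult. Instead, we evaluate a generating function of $g(n)$, $G(z) \stackrel{\mathrm{def}}{=} \sum_{n=0}^{\infty} g(n)\frac{n^n}{n!}z^n$, (7) from which the asymptotics of $g(n)$ can be obtained using Hayman’s analysis [31]. To express the generating function $G(z)$ in a more explicit form, observe that $G(z) = \sum_{n=0}^{\infty}\sum_{(\varphi_1,\ldots,\varphi_n)\in\Phi^n}\prod_{\mu\ge 1}\left(\frac{\mu^\mu z^\mu}{\mu!}\right)^{\varphi_\mu}\frac{1}{\varphi_\mu!} = \sum_{(\varphi_1,\ldots,\varphi_n)\in\Phi^n}\prod_{\mu\ge 1}\left(\frac{\mu^\mu z^\mu}{\mu!}\right)^{\varphi_\mu}\frac{1}{\varphi_\mu!} = \prod_{\mu\ge 1}$ ... |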
