Results 1 - 5 of 5
Universal compression of memoryless sources over unknown alphabets
IEEE Transactions on Information Theory, 2004
Abstract

Cited by 32 (10 self)
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern—the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets can be compressed with diminishing redundancy, both in blocks and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
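The correspondence described above — patterns of length n over a sufficiently large alphabet are exactly the set partitions of {1, ..., n}, counted by the Bell numbers, with the Stirling numbers of the second kind counting patterns by number of distinct symbols — can be checked directly with a short script. This is a minimal illustrative sketch; the function names `pattern` and `bell` are not from the paper.

```python
from collections import Counter
from itertools import product

def pattern(s):
    """Replace each symbol by the order of its first appearance (1-indexed)."""
    seen = {}
    return tuple(seen.setdefault(c, len(seen) + 1) for c in s)

def bell(n):
    """n-th Bell number, computed via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[-1]

print(pattern("abracadabra"))  # (1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1)

# Over an alphabet at least as large as the block length, the distinct
# patterns of length n are exactly the set partitions of {1, ..., n}:
n = 4
pats = {pattern(s) for s in product("abcd", repeat=n)}
print(len(pats), bell(n))  # 15 15

# Grouping patterns by their number of distinct symbols recovers the
# Stirling numbers of the second kind S(4, k) = 1, 7, 6, 1:
print(sorted(Counter(max(p) for p in pats).items()))  # [(1, 1), (2, 7), (3, 6), (4, 1)]
```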
Limit results on pattern entropy
IEEE Transactions on Information Theory, 2006
Abstract

Cited by 15 (3 self)
We determine the entropy rate of patterns of certain random processes, bound the speed at which the per-symbol pattern entropy converges to this rate, and show that patterns satisfy an asymptotic equipartition property. To derive some of these results we upper bound the probability that the n′th variable in a random process differs from all preceding ones.
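For the special case of an i.i.d. source, the quantity bounded in the abstract above has a closed form: by independence, the probability that the n-th draw differs from all preceding ones is sum_x p(x)(1 - p(x))^(n-1). A minimal sketch of evaluating this (the pmf and the function name are illustrative assumptions, not taken from the paper):

```python
def prob_new_symbol(p, n):
    """P(X_n differs from X_1, ..., X_{n-1}) for an i.i.d. source with pmf p."""
    return sum(px * (1 - px) ** (n - 1) for px in p)

# Truncated geometric pmf p(k) proportional to 2^{-k}, renormalized to sum to 1.
p = [2.0 ** -k for k in range(1, 60)]
p[-1] += 1.0 - sum(p)

for n in (2, 10, 100, 1000):
    print(n, prob_new_symbol(p, n))  # decays toward 0 as n grows
```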
A lower bound on compression of unknown alphabets
Theoretical Computer Science, 2005
Abstract

Cited by 10 (3 self)
Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the block length n, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least (1.5 log₂ e) n^(1/3) bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman's saddle-point approximation technique to determine the coefficients' asymptotic behavior.
On Universal Coding of Unordered Data
Abstract

Cited by 2 (1 self)
Abstract — There are several applications in information transfer and storage where the order of source letters is irrelevant at the destination. For these source-destination pairs, multiset communication rather than the more difficult task of sequence communication may be performed. In this work, we study universal multiset communication. For classes of countable-alphabet sources that meet Kieffer's condition for sequence communication, we present a scheme that universally achieves a rate of n + o(n) bits per multiset letter for multiset communication. We also define redundancy measures that are normalized by the logarithm of the multiset size rather than per multiset letter, and show that these redundancy measures cannot be driven to zero for the class of finite-alphabet memoryless multisets. This further implies that finite-alphabet memoryless multisets cannot be encoded universally with vanishing fractional redundancy.
Universal Coding on Infinite Alphabets: Exponentially Decreasing Envelopes
, 806
Abstract

Cited by 1 (1 self)
Abstract — This paper deals with the problem of universal lossless coding on a countably infinite alphabet. It focuses on some classes of sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes with exponent α. The minimax redundancy of exponentially decreasing envelope classes is proved to be equivalent to (1/(4α)) log(e) log²(n). Then a coding strategy is proposed, with a Bayes redundancy equivalent to the maximin redundancy. Finally, an adaptive algorithm is provided, whose redundancy is equivalent to the minimax redundancy. Index Terms — Data compression, universal coding, countably infinite alphabets, redundancy, Bayes, adaptive compression.