Results 1 - 8 of 8
Universal compression of memoryless sources over unknown alphabets
IEEE Transactions on Information Theory, 2004
Abstract

Cited by 32 (10 self)
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols and its pattern: the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all alphabets, including infinite and even unknown ones, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
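The pattern notion above can be illustrated with a short sketch (our own illustrative Python, not code from the paper): each symbol is replaced by the order of its first occurrence, and for small block lengths a brute-force count confirms that the number of distinct patterns is the Bell number.

```python
from itertools import product

def pattern(s):
    """Replace each symbol by the order of its first occurrence (1-based)."""
    first_seen = {}
    out = []
    for c in s:
        if c not in first_seen:
            first_seen[c] = len(first_seen) + 1
        out.append(first_seen[c])
    return tuple(out)

# pattern("abracadabra") -> (1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1)

def count_patterns(n, alphabet="abcdef"):
    """Brute-force count of distinct patterns of length-n strings.
    With at least n alphabet symbols this equals the Bell number B(n)."""
    return len({pattern(s) for s in product(alphabet, repeat=n)})

# count_patterns(n) for n = 1..4 gives 1, 2, 5, 15 (the Bell numbers)
```

Note that the pattern discards the symbol values entirely, which is why it remains well defined even over infinite or unknown alphabets.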
Limit results on pattern entropy
IEEE Trans. Inf. Theory, 2006
Abstract

Cited by 15 (3 self)
We determine the entropy rate of patterns of certain random processes, bound the speed at which the per-symbol pattern entropy converges to this rate, and show that patterns satisfy an asymptotic equipartition property. To derive some of these results we upper-bound the probability that the n'th variable in a random process differs from all preceding ones.
A lower bound on compression of unknown alphabets
Theoret. Comput. Sci., 2005
Abstract

Cited by 10 (3 self)
Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the block length n, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least (1.5 log2 e) n^(1/3) bits. To do so, we construct a generating function whose coefficients lower-bound the redundancy, and use Hayman's saddle-point approximation technique to determine the coefficients' asymptotic behavior.
Minimax Pointwise Redundancy for Memoryless Models over Large Alphabets
Abstract

Cited by 2 (0 self)
We study the minimax pointwise redundancy of universal coding for memoryless models over large alphabets and present two main results. We first complete studies initiated in Orlitsky and Santhanam [15], deriving precise asymptotics of the minimax pointwise redundancy for all ranges of the alphabet size relative to the sequence length. Second, we consider the pointwise minimax redundancy for a family of models in which some symbol probabilities are fixed. The latter problem leads to a binomial sum for functions with superpolynomial growth. Our findings can be used to approximate numerically the minimax pointwise redundancy for various ranges of the sequence length and the alphabet size. These results are obtained by analytic techniques such as tree-like generating functions and the saddle-point method.
Minimax Redundancy for Large Alphabets
Abstract

Cited by 1 (0 self)
We study the minimax redundancy of universal coding for large alphabets over memoryless sources and present two main results. We first complete studies initiated in Orlitsky and Santhanam [12], deriving precise asymptotics of the minimax redundancy for all ranges of the alphabet size. Second, we consider the minimax redundancy of a source model in which some symbol probabilities are fixed. The latter model leads to an interesting binomial sum involving functions of superexponential growth. Our findings could be used to approximate numerically the minimax redundancy for various ranges of the sequence length and the alphabet size. These results are obtained by analytic techniques such as tree-like generating functions and the saddle-point method.
Patterns of i.i.d. Sequences and Their Entropy Part II: Bounds for Some Distributions ∗
, 711
Abstract
A pattern of a sequence is a sequence of integer indices, each index describing the order of first occurrence of the respective symbol in the original sequence. In a recent paper, tight general bounds on the block entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources were derived. In this paper, precise approximations are provided for the pattern block entropies of sequences generated by i.i.d. uniform and monotonic distributions, including distributions over the integers and the geometric distribution. Numerical bounds on the pattern block entropies of these distributions are provided even for very short blocks. Tight bounds are obtained even for distributions that have infinite i.i.d. entropy rates. The approximations are obtained using general bounds and their derivation techniques. Conditional index entropy is also studied for distributions over smaller alphabets.

Index Terms: patterns, monotonic distributions, uniform distributions, entropy.
Universal Source Coding for Monotonic and Fast Decaying Monotonic Distributions ∗
, 704
Abstract
We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size k, each probability parameter costs essentially 0.5 log(n/k^3) bits, where n is the coded sequence length, as long as k = o(n^(1/3)). Otherwise, for k = O(n), the total average sequence redundancy is O(n^(1/3+ε)) bits overall. We then show that there exists a subclass of monotonic distributions over infinite alphabets for which a redundancy of O(n^(1/3+ε)) bits overall is still achievable. This class contains fast-decaying distributions, including many distributions over the integers and geometric distributions. For some slower decays, including other distributions over the integers, a redundancy of o(n) bits overall is achievable, and a method to compute specific redundancy rates for such distributions is derived. The results hold, in particular, for finite-entropy monotonic distributions. Finally, we study individual-sequence redundancy behavior assuming a sequence is governed by a monotonic distribution. We show that for sequences whose empirical distributions are monotonic, individual redundancy bounds similar to those in the average case can be obtained. However, even if monotonicity of the empirical distribution is violated, diminishing per-symbol individual-sequence redundancies with respect to the monotonic maximum-likelihood description length may still be achievable.
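As a rough numeric sketch of the leading term quoted above (our own illustrative helper, not the paper's code), the total cost in the k = o(n^(1/3)) regime is about k parameters times 0.5 log2(n/k^3) bits each:

```python
from math import log2

def monotonic_redundancy_estimate(n, k):
    """Leading-order redundancy estimate for a monotonic distribution over
    an alphabet of size k and coded sequence length n: each of the k
    probability parameters costs about 0.5 * log2(n / k**3) bits.
    Meaningful only in the regime where k is well below n**(1/3)."""
    assert k**3 < n, "estimate applies only when k**3 < n"
    return 0.5 * k * log2(n / k**3)

# For n = 10**6 and k = 10 this gives roughly 49.8 bits in total.
```

The estimate grows with k while the per-parameter cost shrinks, which is consistent with the regime change at k = Θ(n^(1/3)) described in the abstract.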
Tight Bounds on Profile Redundancy and Distinguishability
Abstract
The minimax KL divergence of any distribution from all distributions in a collection P has several practical implications. In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. In online estimation and learning, it is the lowest expected log-loss regret when guessing a sequence of random values generated by a distribution in P. In hypothesis testing, it upper-bounds the largest number of distinguishable distributions in P. Motivated by problems ranging from population estimation to text classification and speech recognition, several machine-learning and information-theory researchers have recently considered label-invariant observations and properties induced by i.i.d. distributions. A sufficient statistic for all these properties is the data's profile, the multiset of the number of times each data element appears. Improving on a sequence of previous works, we show that the redundancy of the collection of distributions induced over profiles by length-n i.i.d. sequences is between 0.3 n^(1/3) and n^(1/3) log2 n, in particular establishing its exact growth power.
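The profile statistic can be illustrated in a few lines (our own illustrative Python, not from the paper): it keeps only the multiset of symbol multiplicities, so any relabeling of the symbols leaves it unchanged.

```python
from collections import Counter

def profile(seq):
    """The profile of a sequence: the multiset of symbol multiplicities,
    represented here as a sorted tuple."""
    return tuple(sorted(Counter(seq).values()))

# Label-invariance: relabeling symbols does not change the profile.
# profile("abracadabra") == profile("xyzxwxvxyzx") == (1, 1, 2, 2, 5)
```

This label-invariance is exactly why the profile is a sufficient statistic for the properties discussed above.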