Limit results on pattern entropy
IEEE Trans. Inf. Theory, 2006
Abstract

Cited by 19 (5 self)
We determine the entropy rate of patterns of certain random processes, bound the speed at which the per-symbol pattern entropy converges to this rate, and show that patterns satisfy an asymptotic equipartition property. To derive some of these results we upper bound the probability that the nth variable in a random process differs from all preceding ones.
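As an illustrative aside (not taken from the paper): for an i.i.d. source with symbol probabilities p, the probability that the nth draw differs from all n−1 preceding draws has the closed form Σ_i p_i (1 − p_i)^{n−1}, which a few lines of Python can evaluate exactly:

```python
def prob_new_symbol(p, n):
    """P that the n-th i.i.d. draw differs from all n-1 preceding draws.

    For an i.i.d. source with symbol probabilities p, symbol i is drawn at
    step n and was absent from the first n-1 draws with probability
    p[i] * (1 - p[i])**(n - 1); summing over i gives the probability that
    the n-th variable is new.
    """
    return sum(pi * (1 - pi) ** (n - 1) for pi in p)

# Uniform distribution over 4 symbols.
uniform4 = [0.25] * 4
print(prob_new_symbol(uniform4, 1))   # 1.0 (the first draw is always new)
print(prob_new_symbol(uniform4, 5))   # 4 * 0.25 * 0.75**4 = 0.31640625
```

The quantity decays with n, which is the behavior the paper bounds when relating pattern entropy to the entropy rate.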
Universal Coding on Infinite Alphabets: Exponentially Decreasing Envelopes
2008
Abstract

Cited by 8 (2 self)
This paper deals with the problem of universal lossless coding on a countably infinite alphabet. It focuses on classes of sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes with exponent α. The minimax redundancy of exponentially decreasing envelope classes is proved to be equivalent to (1/(4α log e)) log² n. A coding strategy is then proposed, with a Bayes redundancy equivalent to the maximin redundancy. Finally, an adaptive algorithm is provided, whose redundancy is equivalent to the minimax redundancy.
Tight Bounds on Profile Redundancy and Distinguishability
Abstract

Cited by 7 (3 self)
The minimax KL divergence of any distribution from all distributions in a collection P has several practical implications. In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. In online estimation and learning, it is the lowest expected log-loss regret when guessing a sequence of random values generated by a distribution in P. In hypothesis testing, it upper bounds the largest number of distinguishable distributions in P. Motivated by problems ranging from population estimation to text classification and speech recognition, several machine-learning and information-theory researchers have recently considered label-invariant observations and properties induced by i.i.d. distributions. A sufficient statistic for all these properties is the data's profile, the multiset of the number of times each data element appears. Improving on a sequence of previous works, we show that the redundancy of the collection of distributions induced over profiles by length-n i.i.d. sequences is between 0.3·n^{1/3} and n^{1/3} log² n, in particular establishing its exact growth power.
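To make the profile statistic concrete (an illustrative sketch, not code from the paper): the profile of a sequence is the multiset of multiplicities of its distinct elements, which Python's standard library computes directly.

```python
from collections import Counter

def profile(seq):
    """Return the profile of a sequence: the multiset (represented here as a
    sorted list) of the number of times each distinct element appears."""
    return sorted(Counter(seq).values())

# 'a' appears 5 times, 'b' and 'r' twice each, 'c' and 'd' once each.
print(profile("abracadabra"))  # [1, 1, 2, 2, 5]
```

Note that any two sequences that differ only by a relabeling of their symbols share the same profile, which is exactly the label-invariance the abstract refers to.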
The maximum likelihood probability of unique-singleton, ternary and length-7 patterns
Accepted at IEEE International Symposium on Information Theory, 2009
Abstract

Cited by 6 (2 self)
We derive several pattern maximum likelihood (PML) results, among them showing that if a pattern has only one symbol appearing once, its PML support size is at most twice the number of distinct symbols, and that if the pattern is ternary with at most one symbol appearing once, its PML support size is three. We apply these results to extend the set of patterns whose PML distribution is known to all ternary patterns, and to all but one pattern of length up to seven.
Universal Source Coding for Monotonic and Fast Decaying Monotonic Distributions
2007
Abstract

Cited by 5 (0 self)
We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size k, each probability parameter costs essentially 0.5 log(n/k³) bits, where n is the coded sequence length, as long as k = o(n^{1/3}). Otherwise, for k = O(n), the total average sequence redundancy is O(n^{1/3+ε}) bits overall. We then show that there exists a subclass of monotonic distributions over infinite alphabets for which a redundancy of O(n^{1/3+ε}) bits overall is still achievable. This class contains fast-decaying distributions, including many distributions over the integers and geometric distributions. For some slower decays, including other distributions over the integers, a redundancy of o(n) bits overall is achievable, and a method to compute specific redundancy rates for such distributions is derived. The results hold, in particular, for finite-entropy monotonic distributions. Finally, we study individual sequence redundancy behavior assuming a sequence is governed by a monotonic distribution. We show that for sequences whose empirical distributions are monotonic, individual redundancy bounds similar to those in the average case can be obtained. However, even if the monotonicity of the empirical distribution is violated, diminishing per-symbol individual sequence redundancies with respect to the monotonic maximum likelihood description length may still be achievable.
Tight bounds for universal compression of large alphabets
In ISIT, 2013
Abstract

Cited by 4 (2 self)
Over the past decade, several papers, e.g., [1–7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. Improving on previous results, we prove tight bounds on expected and worst-case pattern redundancy, in particular closing a decade-long gap and showing that the worst-case pattern redundancy of i.i.d. distributions is Θ̃(n^{1/3}).
Universal compression of Markov and related sources over arbitrary alphabets
IEEE Transactions on Information Theory, 2006
Abstract

Cited by 4 (2 self)
Recent work has considered encoding a string by separately conveying its symbols and its pattern, the order in which the symbols appear. It was shown that the patterns of i.i.d. strings can be losslessly compressed with diminishing per-symbol redundancy. In this paper the pattern redundancy of distributions with memory is considered. Close lower and upper bounds are established on the pattern redundancy of strings generated by Hidden Markov Models with a small number of states, showing in particular that their per-symbol pattern redundancy diminishes with increasing string length. The upper bounds are obtained by analyzing the growth rate of the number of multidimensional integer partitions, and the lower bounds, using Hayman's Theorem.
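To make the symbols/pattern decomposition concrete (a minimal illustrative sketch, not code from any of the papers listed): the pattern of a string replaces each symbol by the order of its first occurrence.

```python
def pattern(seq):
    """Return the pattern of a sequence: each symbol is replaced by the
    1-based order in which it first occurred."""
    first_seen = {}   # symbol -> order of first occurrence
    out = []
    for s in seq:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        out.append(first_seen[s])
    return out

# a -> 1, b -> 2, r -> 3, c -> 4, d -> 5
print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

Any relabeling of the symbols leaves the pattern unchanged, which is why conveying the pattern separately from the symbol dictionary sidesteps the infinite redundancy of large alphabets.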
Recent results on pattern maximum likelihood
In IEEE Information Theory Workshop, 2009
Abstract

Cited by 2 (1 self)
We derive some general sufficient conditions for the uniformity of the Pattern Maximum Likelihood (PML) distribution. We also provide upper bounds on the support size of a class of patterns, and mention some recent results about the PML of 1112234.
Pattern Entropy Revisited
Abstract
A pattern of a sequence is a sequence of integer indices, with each index describing the order of first occurrence of the respective symbol in the original sequence. Several recent works studied the entropy and entropy rate of patterns. Specifically, in a recent paper, tight general bounds on the block entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources were derived. In this paper, precise approximations are given to the pattern block entropies for patterns of sequences generated by i.i.d. uniform and monotonic distributions, including distributions over the integers and the geometric distribution. Numerical non-asymptotic bounds on the pattern block entropies of these distributions are provided even for very short blocks, and even for distributions that have infinite i.i.d. entropy rates. Conditional index entropy is also studied for distributions over smaller alphabets.
Patterns of i.i.d. Sequences and Their Entropy Part II: Bounds for Some Distributions
Abstract
A pattern of a sequence is a sequence of integer indices with each index describing the order of first occurrence of the respective symbol in the original sequence. In a recent paper, tight general bounds on the block entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources were derived. In this paper, precise approximations are provided for the pattern block entropies for patterns of sequences generated by i.i.d. uniform and monotonic distributions, including distributions over the integers, and the geometric distribution. Numerical bounds on the pattern block entropies of these distributions are provided even for very short blocks. Tight bounds are obtained even for distributions that have infinite i.i.d. entropy rates. The approximations are obtained using general bounds and their derivation techniques. Conditional index entropy is also studied for distributions over smaller alphabets. Index Terms: patterns, monotonic distributions, uniform distributions, entropy.