Precise Asymptotic Analysis of the Tunstall Code
Cited by 4 (1 self)
We study the Tunstall code using the machinery from the analysis of algorithms literature. In particular, we propose an algebraic characterization of the Tunstall code which, together with tools like the Mellin transform and the Tauberian theorems, leads to new results on the variance and a central limit theorem for dictionary phrase lengths. This analysis also provides a new argument for obtaining asymptotic results about the mean dictionary phrase length and average redundancy rates.
A general framework for codes involving redundancy minimization
IEEE Trans. Inf. Theory, 2006
Cited by 3 (0 self)
A framework with two scalar parameters is introduced for various problems of finding a prefix code minimizing a coding penalty function. The framework encompasses problems previously proposed by Huffman, Campbell, Nath, and Drmota and Szpankowski, shedding light on the relationships among these problems. In particular, Nath’s range of problems can be seen as bridging the minimum average redundancy problem of Huffman with the minimum maximum pointwise redundancy problem of Drmota and Szpankowski. Using this framework, two linear-time Huffman-like algorithms are devised for the minimum maximum pointwise redundancy problem, the only one in the framework not previously solved with a Huffman-like algorithm. Both algorithms provide solutions common to this problem and a subrange of Nath’s problems, the second algorithm being distinguished by its ability to find the minimum variance solution among all solutions common to the minimum maximum pointwise redundancy and Nath problems. Simple redundancy bounds are also presented. Index Terms: Huffman algorithm, minimax redundancy, optimal prefix code, Rényi entropy, unification.
Tunstall Code, Khodak Variations, and Random Walks
2008
Cited by 3 (1 self)
A variable-to-fixed length encoder partitions the source string into variable-length phrases that belong to a given and fixed dictionary. Tunstall, and independently Khodak, designed variable-to-fixed length codes for memoryless sources that are optimal under certain constraints. In this paper, we study the Tunstall and Khodak codes using analytic information theory, i.e., the machinery from the analysis of algorithms literature. After proposing an algebraic characterization of the Tunstall and Khodak codes, we present new results on the variance and a central limit theorem for dictionary phrase lengths. This analysis also provides a new argument for obtaining asymptotic results about the mean dictionary phrase length and average redundancy rates.
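The dictionary construction this abstract refers to can be illustrated with a minimal sketch of the classical Tunstall procedure: start from the single-symbol phrases and repeatedly replace the most probable phrase by its one-symbol extensions until no further expansion fits in the dictionary. The function names `tunstall_dictionary` and `mean_phrase_length`, the single-character symbols, and the example probabilities are assumptions made for illustration; they are not taken from the paper.

```python
import heapq


def tunstall_dictionary(probs, max_size):
    """Build a Tunstall-style phrase dictionary for a memoryless source.

    probs: dict mapping single-character symbols to probabilities (assumed input format).
    max_size: largest allowed number of phrases; must be >= len(probs).
    Returns a dict mapping each phrase to its probability.
    """
    symbols = sorted(probs)
    # Max-heap emulated with negated probabilities.
    heap = [(-probs[s], s) for s in symbols]
    heapq.heapify(heap)
    # Each expansion removes one phrase and adds len(symbols) longer ones.
    while len(heap) + len(symbols) - 1 <= max_size:
        neg_p, phrase = heapq.heappop(heap)
        for s in symbols:
            heapq.heappush(heap, (neg_p * probs[s], phrase + s))
    return {phrase: -neg_p for neg_p, phrase in heap}


def mean_phrase_length(dictionary):
    """Expected phrase length, one of the quantities studied asymptotically."""
    return sum(p * len(phrase) for phrase, p in dictionary.items())


example = tunstall_dictionary({"a": 0.7, "b": 0.3}, max_size=4)
print(sorted(example), mean_phrase_length(example))
```

With four phrases, each phrase could be sent as a fixed 2-bit index, which is what makes the code variable-to-fixed.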
A One-to-One Code and its Anti-redundancy
IEEE Trans. Information Theory
Cited by 2 (2 self)
One-to-one codes are “one shot” codes that assign a distinct codeword to source symbols and are not necessarily prefix codes (more generally, uniquely decodable). For example, such codes arise when there exists an “end of message” channel symbol. Interestingly, as Wyner proved in 1972, for such codes the average code length can be smaller than the source entropy. By how much? We call this difference the anti-redundancy. Various authors over the years have shown that the anti-redundancy can be as big as minus the logarithm of the source entropy. However, to the best of our knowledge precise estimates do not exist. In this note, we consider a block code of length n generated by a binary memoryless source, and prove that the average anti-redundancy is $-\frac{1}{2}\log_2 n + C + H(n) + o(1)$, where C is a constant and either $H(n) = 0$ if $\log_2((1-p)/p)$ is irrational (where p is the probability of generating a “0”) or otherwise H(n) is a fluctuating function as the code length increases. This relatively simple finding requires a combination of quite sophisticated analytic tools such as precise evaluation of Bernoulli sums, the saddle point method, and theory of distribution of sequences modulo 1.
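For small n, the quantity described here can be computed by brute force. The sketch below assumes the usual optimal one-to-one assignment, in which blocks are sorted by decreasing probability and the j-th block receives the j-th shortest binary string (the empty string included), so its length is ⌊log2 j⌋. That assignment, the function name `one_to_one_anti_redundancy`, and the sample parameters are assumptions for illustration; the abstract itself does not spell out the construction.

```python
from math import comb, log2


def one_to_one_anti_redundancy(n, p):
    """Average one-to-one code length minus n * entropy for binary blocks of length n.

    p is the probability of a "0"; requires 0 < p < 1.
    Assumes the j-th most probable block gets a codeword of length floor(log2 j).
    """
    q = 1.0 - p
    # All blocks with k zeros share the probability p^k q^(n-k); there are C(n, k) of them.
    groups = sorted(((p**k * q**(n - k), comb(n, k)) for k in range(n + 1)), reverse=True)
    average_length = 0.0
    rank = 1
    for prob, count in groups:
        for j in range(rank, rank + count):
            # j.bit_length() - 1 equals floor(log2 j) for j >= 1.
            average_length += prob * (j.bit_length() - 1)
        rank += count
    entropy = -(p * log2(p) + q * log2(q))
    return average_length - n * entropy


for n in (4, 8, 12):
    print(n, one_to_one_anti_redundancy(n, 0.3))
```

The result is never positive: the j-th most probable block has probability at most 1/j, so its codeword length never exceeds its self-information.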
Average Redundancy for Known Sources: Ubiquitous Trees in Source Coding
2008
Cited by 2 (0 self)
Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard’s precept, these problems are tackled by complex analysis methods such as generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroad of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding better known as data compression), namely the redundancy rate problem. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the average redundancy for known sources, that is, when statistics of information sources are known. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as trees, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf).
Tight bounds on minimum maximum pointwise redundancy
In Proceedings of the International Symposium on Information Theory, 2008
Cited by 2 (0 self)
This paper presents new lower and upper bounds for the optimal compression of binary prefix codes in terms of the most probable input symbol, where compression efficiency is determined by the nonlinear codeword length objective of minimizing maximum pointwise redundancy. This objective relates to both universal modeling and Shannon coding, and these bounds are tight throughout the interval. The upper bounds also apply to a related objective, that of dth exponential redundancy.
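As a rough illustration of the objective named above, the sketch below takes the maximum pointwise redundancy of codeword lengths l_i under probabilities p_i to be max_i (l_i + log2 p_i). This is the usual definition and is assumed here, since the abstract does not state it. The lengths are those of a Shannon code, ⌈−log2 p_i⌉, which always satisfy the Kraft inequality and keep this quantity below 1. The function names and the sample distribution are hypothetical, and this is not the bound-achieving construction from the paper.

```python
from math import ceil, log2


def shannon_lengths(probs):
    """Shannon code lengths: ceil(-log2 p) for each symbol probability p > 0."""
    return [ceil(-log2(p)) for p in probs]


def max_pointwise_redundancy(probs, lengths):
    """Largest excess of a codeword length over the symbol's self-information."""
    return max(length + log2(p) for p, length in zip(probs, lengths))


probs = [0.4, 0.3, 0.3]
lengths = shannon_lengths(probs)
kraft_sum = sum(2.0 ** -length for length in lengths)  # <= 1 means a prefix code exists
print(lengths, kraft_sum, max_pointwise_redundancy(probs, lengths))
```

An optimal code for this objective can do better than the Shannon lengths; the point of the sketch is only to make the objective concrete.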
The minimum average code for finite memoryless monotone sources
In Proc. IEEE Information Theory Workshop, 2002
Cited by 1 (0 self)
The problem of selecting a code for finite monotone sources with x symbols is considered. The selection criterion is based on minimizing the average redundancy (called Minave criterion) instead of its maximum (i.e., Minimax criterion). The average probability distribution € x, whose associated Huffman code has the minimum average redundancy, is derived. The entropy of the average distribution (i.e.,
On Universal Coding of Unordered Data
Cited by 1 (1 self)
There are several applications in information transfer and storage where the order of source letters is irrelevant at the destination. For these source-destination pairs, multiset communication rather than the more difficult task of sequence communication may be performed. In this work, we study universal multiset communication. For classes of countable-alphabet sources that meet Kieffer’s condition for sequence communication, we present a scheme that universally achieves a rate of n + o(n) bits per multiset letter for multiset communication. We also define redundancy measures that are normalized by the logarithm of the multiset size rather than per multiset letter and show that these redundancy measures cannot be driven to zero for the class of finite-alphabet memoryless multisets. This further implies that finite-alphabet memoryless multisets cannot be encoded universally with vanishing fractional redundancy.
A New Algorithm for Building Alphabetic Minimax Trees
2008
Cited by 1 (1 self)
We show how to build an alphabetic minimax tree for a sequence W = w1,...,wn of real weights in O(nd log log n) time, where d is the number of distinct integers ⌈wi⌉. We apply this algorithm to building an alphabetic prefix code given a sample.
Minimum Expected Length of Fixed-to-Variable Lossless Compression of Memoryless Sources
Conventional wisdom states that the minimum expected length for fixed-to-variable length encoding of an n-block memoryless source with entropy H grows as $nH + O(1)$. However, this performance is obtained under the constraint that the code assigned to the whole n-block is a prefix code. Dropping this unnecessary constraint we show that the minimum expected length grows as $nH - \frac{1}{2}\log n + O(1)$ unless the source is equiprobable.

