Results 1–10 of 22
On universal types
 Proc. ISIT 2004
, 2004
Abstract

Cited by 25 (6 self)
We define the universal type class of a sequence x^n, in analogy to the notion used in the classical method of types. Two sequences of the same length are said to be of the same universal (LZ) type if and only if they yield the same set of phrases in the incremental parsing of Ziv and Lempel (1978). We show that the empirical probability distributions of any finite order of two sequences of the same universal type converge, in the variational sense, as the sequence length increases. Consequently, the normalized logarithms of the probabilities assigned by any kth order probability assignment to two sequences of the same universal type, as well as the kth order empirical entropies of the sequences, converge for all k. We study the size of a universal type class, and show that its asymptotic behavior parallels that of the conventional counterpart, with the LZ78 code length playing the role of the empirical entropy. We also estimate the number of universal types for sequences of length n, and show that it is of the form exp((1+o(1)) γ n / log n) for a well-characterized constant γ. We describe algorithms for enumerating the sequences in a universal type class, and for drawing a sequence from the class with uniform probability. As an application, we consider the problem of universal simulation of individual sequences. A sequence drawn with uniform probability from the universal type class of x^n is an optimal simulation of x^n in a well-defined mathematical sense.
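The parsing in question is easy to state in code. Below is a minimal sketch of LZ78 incremental parsing and of the resulting universal-type equivalence test; the function names are mine, not the authors'.

```python
def lz78_phrases(s):
    """Incremental (LZ78) parsing: repeatedly cut off the shortest
    prefix of the remaining input that has not yet appeared as a phrase."""
    phrases, seen, cur = [], set(), ""
    for c in s:
        cur += c
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    if cur:                      # the final phrase may repeat an earlier one
        phrases.append(cur)
    return phrases

def same_universal_type(x, y):
    """Two equal-length sequences are of the same universal (LZ) type
    iff their incremental parsings yield the same *set* of phrases."""
    return len(x) == len(y) and set(lz78_phrases(x)) == set(lz78_phrases(y))
```

For example, "aabbab" and "ababab" both parse into the phrase set {a, ab, b}, so they belong to the same universal type class, while "aaaaaa" does not.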
Precise asymptotic analysis of the Tunstall code
 Proc. 2006 International Symposium on Information Theory (Seattle)
Abstract

Cited by 8 (4 self)
A variable-to-fixed length encoder partitions the source string over an m-ary alphabet A into a concatenation of variable-length phrases. Each phrase except the last one is constrained to belong to a given dictionary D of source strings; the last phrase is a non-null prefix of a dictionary entry. One common constraint on a dictionary is that it leads to a unique parsing of any string over A. We will assume that all dictionaries are uniquely parsable. It is convenient to represent a uniquely parsable dictionary by a complete parsing tree T, i.e., a tree in which every internal node has all m children nodes in the tree. The dictionary entries d ∈ D correspond to the leaves of the parsing tree. The encoder represents each parsed string by the fixed-length binary code word corresponding to its dictionary entry. If the dictionary D has M entries, then the code word for each phrase has length ⌈log₂ M⌉ bits.
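For concreteness, here is a small sketch of the classical Tunstall construction the abstract builds on: grow the parsing tree greedily by always splitting the most probable leaf. The heap-based formulation and the function name are mine, not taken from the paper.

```python
import heapq

def tunstall_dictionary(probs, M):
    """Tunstall construction for a memoryless source.
    probs: dict mapping each symbol to its probability.
    M: maximum number of dictionary entries (code words).
    Repeatedly replace the most probable leaf of the parsing tree by its
    m children until adding another m-1 leaves would exceed M."""
    m = len(probs)
    heap = [(-p, s) for s, p in probs.items()]   # start: the m single symbols
    heapq.heapify(heap)
    n_leaves = m
    while n_leaves + m - 1 <= M:
        p, word = heapq.heappop(heap)            # most probable leaf (p < 0)
        for s, ps in probs.items():              # expand it into m children
            heapq.heappush(heap, (p * ps, word + s))
        n_leaves += m - 1
    return sorted(word for _, word in heap)
```

With probabilities {a: 0.7, b: 0.3} and M = 4 this yields the dictionary {aaa, aab, ab, b}, and each parsed phrase is then represented in ⌈log₂ 4⌉ = 2 bits.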
Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
Abstract

Cited by 5 (0 self)
We present worst-case bounds for the learning rate of a known prediction method that is based on hierarchical applications of binary context tree weighting (CTW) predictors. A heuristic application of this approach that relies on Huffman’s alphabet decomposition is known to achieve state-of-the-art performance in prediction and lossless compression benchmarks. We show that our new bound for this heuristic is tighter than the best known performance guarantees for prediction and lossless compression algorithms in various settings. This result substantiates the efficiency of this hierarchical method and provides a compelling explanation for its practical success. In addition, we present the results of a few experiments that examine other possibilities for improving the multi-alphabet prediction performance of CTW-based algorithms.
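As a rough illustration of the decomposition idea (not the paper's algorithm: a plain KT estimator stands in here for the binary CTW predictors, and the binary code is supplied directly rather than derived by Huffman's procedure):

```python
import math

def kt_prob(bits):
    """Block probability assigned to a binary string by the
    Krichevsky-Trofimov estimator: the product of the sequential
    predictions (count + 1/2) / (total + 1)."""
    p, zeros, ones = 1.0, 0, 0
    for b in bits:
        p *= ((ones if b else zeros) + 0.5) / (zeros + ones + 1)
        if b:
            ones += 1
        else:
            zeros += 1
    return p

def decomposed_prob(seq, codes):
    """Multi-alphabet sequence probability via alphabet decomposition:
    each symbol is replaced by its binary codeword (codes, e.g. from a
    Huffman tree), and every internal node of the code tree runs its own
    independent binary estimator -- KT here, CTW in the paper."""
    streams = {}          # code-tree node (a codeword prefix) -> bits seen there
    for sym in seq:
        cw = codes[sym]
        for i, b in enumerate(cw):
            streams.setdefault(cw[:i], []).append(int(b))
    return math.prod(kt_prob(s) for s in streams.values())
```

For a binary alphabet the decomposition is vacuous and reduces to a single KT estimator, which is a convenient sanity check.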
Tunstall Code, Khodak Variations, and Random Walks
, 2008
Abstract

Cited by 4 (2 self)
A variable-to-fixed length encoder partitions the source string into variable-length phrases that belong to a given and fixed dictionary. Tunstall, and independently Khodak, designed variable-to-fixed length codes for memoryless sources that are optimal under certain constraints. In this paper, we study the Tunstall and Khodak codes using analytic information theory, i.e., the machinery from the analysis-of-algorithms literature. After proposing an algebraic characterization of the Tunstall and Khodak codes, we present new results on the variance and a central limit theorem for dictionary phrase lengths. This analysis also provides a new argument for obtaining asymptotic results about the mean dictionary phrase length and average redundancy rates.
Average redundancy for known sources: ubiquitous trees in source coding
 Proceedings, Fifth Colloquium on Mathematics and Computer Science (Blaubeuren, 2008), Discrete Math. Theor. Comput. Sci. Proc. AI
, 2008
Abstract

Cited by 3 (0 self)
Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard’s precept, these problems are tackled by complex analysis methods such as generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroads of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding, better known as data compression), namely the redundancy rate problem. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the average redundancy for known sources, that is, when the statistics of the information sources are known. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate the average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as trees, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf).
Monte Carlo Estimation of Minimax Regret with an Application to MDL Model Selection
, 2008
Abstract

Cited by 3 (1 self)
Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term which is equivalent to minimax/maximin regret. When the data are discrete-valued, the complexity term is a logarithm of a sum of maximized likelihoods over all possible datasets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes, and gives strongly consistent estimators of the minimax regret. The estimates converge almost surely to the correct value as the number of iterations increases. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy that is sufficient for model selection is usually achieved already on the first iteration, even for long sequences.
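A toy version of the idea, for the Bernoulli model where the complexity term can also be computed exactly. This naive uniform-sampling estimator is for illustration only; the paper's estimators use better-designed sampling distributions.

```python
import math, random

def bernoulli_nml_complexity(n):
    """Exact Bernoulli parametric complexity:
    C_n = sum_k C(n, k) (k/n)^k ((n-k)/n)^(n-k), with 0^0 = 1."""
    return sum(math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
               for k in range(n + 1))

def mc_estimate(n, iters, seed=0):
    """Monte Carlo estimate of C_n: draw x^n uniformly from {0,1}^n and
    average 2^n * (maximized likelihood of x^n).  Strongly consistent,
    since C_n = 2^n * E[max-likelihood] under the uniform distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iters):
        k = sum(rng.randint(0, 1) for _ in range(n))
        total += 2 ** n * (k / n) ** k * ((n - k) / n) ** (n - k)
    return total / iters
```

For n = 2 the exact value is 1 + 0.5 + 1 = 2.5, and the sampling estimate converges to the exact sum as the number of iterations grows.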
On the construction of (explicit) Khodak’s code and its analysis
 IEEE Trans. Inf. Theory
, 2008
Abstract

Cited by 3 (1 self)
Variable-to-variable codes are very attractive yet not well understood data compression schemes. In 1972 Khodak claimed to provide upper and lower bounds for the achievable redundancy rate; however, he did not offer an explicit construction of such codes. In this paper, we first present a constructive and transparent proof of Khodak’s result showing that for memoryless sources there exists a code with the average redundancy bounded by D^{-5/3}, where D is the average delay (e.g., the average length of a dictionary entry). We also describe an algorithm that constructs a variable-to-variable length code with a small redundancy rate for large D. Then, we discuss several generalizations. We prove that the worst-case redundancy does not exceed D^{-4/3}. Furthermore, we provide similar upper bounds for Markov sources (of order 1). Finally, we consider bounds that are valid for almost all memoryless and Markov sources, for which the set of exceptional source parameters has zero measure. In particular, for all memoryless sources outside this exceptional class, we prove there exists a variable-to-variable code with the average redundancy rate bounded by D^{-4/3-m/3+ε} and the worst-case redundancy rate bounded …
Benefiting from disorder: Source coding for unordered data. arXiv
, 2007
Abstract

Cited by 2 (0 self)
The order of letters is not always relevant in a communication task. This paper discusses the implications of order irrelevance on source coding, presenting results in several major branches of source coding theory: lossless coding, universal lossless coding, rate-distortion, high-rate quantization, and universal lossy coding. The main conclusions demonstrate that there is a significant rate savings when order is irrelevant. In particular, lossless coding of n letters from a finite alphabet requires Θ(log n) bits and universal lossless coding requires n + o(n) bits for many countable alphabet sources. However, there are no universal schemes that can drive a strong redundancy measure to zero. Results for lossy coding include distribution-free expressions for the rate savings from order irrelevance in various high-rate quantization schemes. Rate-distortion bounds are given, and it is shown that the analogue of the Shannon lower bound is loose at all finite rates.
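The Θ(log n) claim for lossless coding follows from counting: when order is irrelevant, only the histogram of the n letters matters, and over a k-letter alphabet there are C(n+k-1, k-1) histograms. A sketch of the enumerative bit counts (not the paper's coding schemes):

```python
import math

def ordered_bits(n, k):
    """Bits for a fixed-length code over all k^n ordered sequences."""
    return math.ceil(n * math.log2(k))

def unordered_bits(n, k):
    """When order is irrelevant only the histogram (type) matters;
    there are C(n+k-1, k-1) histograms, so an enumerative code needs
    ceil(log2 C(n+k-1, k-1)) bits = Theta(log n) for fixed k."""
    return math.ceil(math.log2(math.comb(n + k - 1, k - 1)))
```

For n = 100 binary letters, an ordered code needs 100 bits but an unordered one only ⌈log₂ 101⌉ = 7, which is the rate savings the abstract refers to in its simplest form.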
On Recurrence Formulas for Computing the Stochastic Complexity
Abstract

Cited by 2 (2 self)
Stochastic complexity is a criterion that can be used for model selection and other statistical inference tasks. Many model families, like Bayesian networks, use multinomial variables as their basic components. There now exist efficient computation methods, based on generating functions, for computing the stochastic complexity in the multinomial case. However, the theoretical background behind these methods has not been extensively formalized before. In this paper we define a bivariate generating function framework, which makes the problem setting more comprehensible. Utilizing this framework, we derive a new recurrence relation over the values of a multinomial variable, and show how to apply the recurrence for computing the stochastic complexity. Furthermore, we show that there cannot be a generic homogeneous linear recurrence over data size. We also suggest that the presented form of the marginal generating function, which is valid in the multinomial case, may also generalize to more complex cases.
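To make the object concrete: the parametric-complexity term of the multinomial stochastic complexity can be computed directly from its definition, or via a linear recurrence over the alphabet size of the kind discussed in this line of work. The sketch below (my own code, not the paper's) cross-checks the two:

```python
import math

def compositions(n, k):
    """All k-tuples of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_complexity_direct(K, n):
    """C(K, n) = sum over histograms (n1..nK) of (# sequences with that
    histogram) * (maximized likelihood of such a sequence)."""
    total = 0.0
    for h in compositions(n, K):
        coef = math.factorial(n)
        for c in h:
            coef //= math.factorial(c)
        ml = 1.0
        for c in h:
            if c:
                ml *= (c / n) ** c
        total += coef * ml
    return total

def multinomial_complexity_rec(K, n):
    """Same quantity via a linear recurrence over the alphabet size:
    C(1) = 1, C(2) = sum_k C(n,k)(k/n)^k((n-k)/n)^(n-k),
    C(K) = C(K-1) + n/(K-2) * C(K-2) for K >= 3."""
    c1 = 1.0
    c2 = sum(math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
             for k in range(n + 1))
    for K_ in range(3, K + 1):
        c1, c2 = c2, c2 + n / (K_ - 2) * c1
    return c2 if K >= 2 else c1
```

The direct sum has C(n+K-1, K-1) terms, so the recurrence is the practical route for large alphabets; the direct version serves only as a correctness check on small cases.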
Counting Markov Types, Balanced Matrices, and Eulerian Graphs
, 2012
Abstract

Cited by 2 (1 self)
The method of types is one of the most popular techniques in information theory and combinatorics. Two sequences of equal length have the same type if they have identical empirical distributions. In this paper, we focus on Markov types, that is, sequences generated by a Markov source (of order one). We note that sequences having the same Markov type share the same so-called balanced frequency matrix that counts the number of distinct pairs of symbols. We enumerate the number of Markov types for sequences of length n over an alphabet of size m. This turns out to be asymptotically equivalent to estimating the number of balanced frequency matrices, the number of integer solutions of a system of linear Diophantine equations, and the number of connected Eulerian multigraphs. For fixed m we prove that the number of Markov types is asymptotically equal to d(m) · n^{m²−m} / (m²−m)!, where we give an integral representation for d(m). For m → ∞ we conclude that asymptotically the number of types is equivalent to …
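The definition can be checked by brute force for tiny n and m. The sketch below assumes the cyclic convention, under which the pair-frequency matrix is automatically balanced (each row sum equals the matching column sum); it is only an illustration, not the paper's enumeration method.

```python
from itertools import product

def cyclic_transition_matrix(seq, m):
    """Frequency matrix of pairs (x_i, x_{i+1}) with cyclic wrap-around,
    so row sum = column sum for every symbol (a balanced matrix)."""
    F = [[0] * m for _ in range(m)]
    for a, b in zip(seq, seq[1:] + seq[:1]):
        F[a][b] += 1
    return tuple(map(tuple, F))

def count_markov_types(n, m):
    """Number of distinct (cyclic) Markov types among all m^n sequences
    of length n -- exhaustive, so for small n and m only."""
    return len({cyclic_transition_matrix(s, m)
                for s in product(range(m), repeat=n)})
```

For instance, the four binary sequences of length 2 fall into three type classes, since (0,1) and (1,0) share the frequency matrix [[0,1],[1,0]].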