Results 1 - 10 of 13
Universal compression of memoryless sources over unknown alphabets
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004
Cited by 32 (10 self)
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols and its pattern: the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all alphabets, including infinite and even unknown ones, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
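As a small illustration of the counting claims in this abstract, the sketch below computes the pattern of a string (each symbol replaced by the index of its first appearance) and checks by brute force, for a short length, that the number of distinct patterns matches the Bell number and that the number of patterns with exactly k distinct symbols matches the Stirling number of the second kind. The function names (`pattern`, `all_patterns`, `bell`, `stirling2`) are illustrative choices and are not taken from the paper.

```python
from itertools import product


def pattern(s):
    """Replace each symbol by the 1-based order of its first appearance."""
    index = {}
    out = []
    for sym in s:
        if sym not in index:
            index[sym] = len(index) + 1
        out.append(index[sym])
    return tuple(out)


def all_patterns(n):
    """Brute force: patterns of every length-n string over n symbols."""
    return {pattern(w) for w in product(range(n), repeat=n)}


def bell(n):
    """n-th Bell number, computed with the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]
        for v in row:
            new.append(new[-1] + v)
        row = new
    return row[-1]


def stirling2(n, k):
    """Stirling number of the second kind via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


if __name__ == "__main__":
    print(pattern("abracadabra"))
    n = 4
    pats = all_patterns(n)
    print(len(pats), bell(n))
    for k in range(1, n + 1):
        print(k, sum(1 for p in pats if max(p) == k), stirling2(n, k))
```

The brute-force enumeration grows as n^n, so it is only meant as a sanity check for very small n.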
A lower bound on compression of unknown alphabets
Theoret. Comput. Sci., 2005
Cited by 10 (3 self)
Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength $n$, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least $(1.5 \log_2 e)\, n^{1/3}$ bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman's saddle-point approximation technique to determine the coefficients' asymptotic behavior.
Analytic Variations on Redundancy Rates of Renewal Processes
IEEE Trans. Information Theory, 2002
Cited by 8 (5 self)
Csiszár and Shields have recently proved that the minimax redundancy for a class of (stationary) renewal processes is $\Theta(\sqrt{n})$, where $n$ is the block length. This interesting result provides a first nontrivial bound on redundancy for a nonparametric family of processes. The present paper gives a precise estimate of the redundancy rate for such (nonstationary) renewal sources, namely, $\frac{2}{\log 2}\sqrt{cn} + O(\log n)$ for an explicit constant $c$. This asymptotic expansion is derived by complex-analytic methods that include generating function representations, Mellin transforms, singularity analysis and saddle point estimates. This work places itself within the framework of analytic information theory.
Universal compression of Markov and related sources over arbitrary alphabets
IEEE TRANSACTIONS ON INFORMATION THEORY, 2006
Cited by 3 (2 self)
Recent work has considered encoding a string by separately conveying its symbols and its pattern: the order in which the symbols appear. It was shown that the patterns of i.i.d. strings can be losslessly compressed with diminishing per-symbol redundancy. In this paper the pattern redundancy of distributions with memory is considered. Close lower and upper bounds are established on the pattern redundancy of strings generated by Hidden Markov Models with a small number of states, showing in particular that their per-symbol pattern redundancy diminishes with increasing string length. The upper bounds are obtained by analyzing the growth rate of the number of multidimensional integer partitions, and the lower bounds by using Hayman's Theorem.
Uniform asymptotics of some Abel sums arising in coding theory
Cited by 2 (0 self)
We derive uniform asymptotic expressions of some Abel sums appearing in some problems in coding theory and indicate the usefulness of these sums in other fields, like empirical processes, machine maintenance, analysis of algorithms, probabilistic number theory, queuing models, etc. Key words: Abel sums, coding theory, Mellin transforms, W-function, uniform asymptotics.
A Universal Compression Perspective of Smoothing
Cited by 1 (0 self)
We analyze smoothing algorithms from a universal-compression perspective. Instead of evaluating their performance on an empirical sample, we analyze their performance on the most inconvenient sample possible. Consequently the performance of the algorithm can be guaranteed even on unseen data. We show that universal compression bounds can explain the empirical performance of several smoothing methods. We also describe a new interpolated additive smoothing algorithm, and show that it has lower training complexity and better compression performance than existing smoothing techniques. Key words: Language modeling, universal compression, smoothing.
Minimax Pointwise Redundancy for Memoryless Models over Large Alphabets
Cited by 1 (0 self)
We study the minimax pointwise redundancy of universal coding for memoryless models over large alphabets and present two main results. First, we complete studies initiated in Orlitsky and Santhanam [15], deriving precise asymptotics of the minimax pointwise redundancy for all ranges of the alphabet size relative to the sequence length. Second, we consider the pointwise minimax redundancy for a family of models in which some symbol probabilities are fixed. The latter problem leads to a binomial sum for functions with superpolynomial growth. Our findings can be used to approximate numerically the minimax pointwise redundancy for various ranges of the sequence length and the alphabet size. These results are obtained by analytic techniques such as tree-like generating functions and the saddle point method.
Stability Considerations For Networks
In this report, we present a sample of our stability results. We concentrate on four papers: (i) Georgiadis and Szpankowski [3], who analyzed a class of token passing rings; (ii) Georgiadis, Szpankowski and Tassiulas [4], who established the maximal stability region for a ring network with spatial reuse and adaptive scheduling policy; (iii) Szpankowski [18], who gave necessary and sufficient conditions for stability of the ALOHA system; and (iv) our recent paper [5], in which we analyzed ring networks with spatial reuse and with quota (this paper also presents a generalization of our stability methodology to a large class of queueing systems satisfying a weak monotonicity property).
Markov Types Again Revisited
2009
The method of types is one of the most popular techniques in information theory and combinatorics. Two sequences of equal length have the same type if they have identical empirical distributions. In this paper, we focus on Markov types, that is, sequences generated by a Markov source (of order one). We note that sequences having the same Markov type share the same so-called balanced frequency matrix that counts the number of distinct pairs of symbols. We enumerate the number of Markov types for sequences of length $n$ over an alphabet of size $m$. This turns out to coincide with the number of balanced frequency matrices, as well as with the number of solutions of special linear diophantine equations and the number of balanced directed multigraphs. For fixed $m$ we prove that the number of Markov types is asymptotically equal to $d(m)\,\frac{n^{m^2-m}}{(m^2-m)!}$, where $d(m)$ is a constant for which we give an integral representation. For $m \to \infty$ we conclude that asymptotically the number of types is equivalent to $\frac{\sqrt{2}\, m^{3m/2} e^{m^2}}{m^{2m^2}\, 2^m\, \pi^{m/2}}\, n^{m^2-m}$, provided that $m = o(n^{1/4})$ (however, our techniques work for $m = o(\sqrt{n})$). We also extend these results to $r$-th order Markov sources. These findings are derived by analytical techniques ranging from multidimensional generating functions to the saddle point method. Index Terms: Markov types, integer matrices, linear diophantine equations, multidimensional generating functions, saddle point method.
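To make the notion of a balanced frequency matrix concrete, here is a minimal sketch. It assumes the sequence is read cyclically (the last symbol is followed by the first), which is an assumption of this example and is not stated in the abstract; under that reading every symbol is entered as often as it is left, so row sums equal column sums. The helper names `cyclic_pair_counts` and `is_balanced` are hypothetical.

```python
from collections import Counter


def cyclic_pair_counts(seq):
    """Count consecutive symbol pairs, wrapping from the last symbol
    back to the first (cyclic reading: an assumption of this sketch)."""
    n = len(seq)
    return Counter((seq[i], seq[(i + 1) % n]) for i in range(n))


def is_balanced(counts):
    """True if, for every symbol, the row sum equals the column sum."""
    out_deg = Counter()
    in_deg = Counter()
    for (a, b), c in counts.items():
        out_deg[a] += c
        in_deg[b] += c
    return out_deg == in_deg


if __name__ == "__main__":
    x, y = "0110", "0011"
    kx, ky = cyclic_pair_counts(x), cyclic_pair_counts(y)
    # Two different sequences with the same pair counts share a Markov type.
    print(kx == ky, is_balanced(kx), sum(kx.values()) == len(x))
```

Here `0110` and `0011` are different sequences with identical pair counts, so under this sketch's conventions they fall into the same Markov type.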
Inference of Gene Regulatory Networks Based on a Universal Minimum Description Length
The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The minimum description length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However, the existing method uses an ad hoc measure of description length that necessitates a tuning parameter for artificially balancing the model and error costs and, as a result, directly conflicts with the MDL principle's implied universality. To overcome this difficulty, we propose a novel MDL-based method in which the description length is a theoretical measure derived from a universal normalized maximum likelihood model. The search space is reduced by applying an implementable analogue of Kolmogorov's structure function. The performance of the proposed method is demonstrated on random synthetic networks, for which it is shown to improve upon previously published network inference algorithms with respect to both speed and accuracy. Finally, it is applied to time-series Drosophila gene expression measurements. Copyright © 2008 John Dougherty et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.