A Suboptimal Lossy Data Compression Based On Approximate Pattern Matching
 IEEE Trans. Information Theory
, 1996
Cited by 34 (9 self)
A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. This scheme is based on approximate string matching, and it naturally extends the lossless Lempel-Ziv data compression scheme. Among others we consider the typical length of an approximately repeated pattern within the first n positions of a stationary mixing sequence where D% of mismatches is allowed. We prove that there exists a constant $r_0(D)$ such that the length of such an approximately repeated pattern converges in probability to $\frac{1}{r_0(D)} \log n$ (pr.) but it almost surely oscillates between $\frac{1}{r_{-\infty}(D)} \log n$ and $\frac{2}{r_1(D)} \log n$, where $r_{-\infty}(D) > r_0(D) > r_1(D)/2$ are some constants. These constants are natural generalizations of Rényi entropies to the lossy environment. More importantly, we show that the compression ratio of a lossy data compression scheme based on such an approximate pattern matching is asymptotically equal to $r_0(D)$. We also establish the asymptotic be...
Precise Minimax Redundancy and Regret
 IEEE Trans. Information Theory
, 2004
Cited by 33 (13 self)
Recent years have seen a resurgence of interest in redundancy of lossless coding. The redundancy (regret) of universal fixed-to-variable length coding for a class of sources determines by how much the actual code length exceeds the optimal (ideal over the class) code length. In a minimax scenario one finds the best code for the worst source either in the worst case (called also maximal minimax) or on average. We first study the worst case minimax redundancy over a class of stationary ergodic sources and replace Shtarkov's bound by an exact formula. Among others, we prove that a generalized Shannon code minimizes the worst case redundancy, derive asymptotically its redundancy, and establish some general properties. This allows us to obtain precise redundancy rates for memoryless, Markov and renewal sources. For example, we derive the exact constant of the redundancy rate for memoryless and Markov sources by showing that an integer nature of coding contributes $\log(\log m/(m-1))/\log m + o(1)$ where m is the size of the alphabet. Then we deal with the average minimax redundancy and regret. Our approach
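As a rough illustration of the quantity this abstract starts from, the sketch below evaluates what is commonly called the Shtarkov sum for binary memoryless sources: the sum, over all sequences of length n, of the maximum-likelihood probability of each sequence. Its logarithm is the usual bound on worst-case regret when code lengths need not be integers. The function names are hypothetical, and the code does not reproduce the paper's exact formula or its generalized Shannon code.

```python
import math


def shtarkov_sum(n: int) -> float:
    """Sum of maximized likelihoods over all binary sequences of length n.

    Sequences with k ones share the maximized probability
    (k/n)^k * (1 - k/n)^(n - k), and there are C(n, k) of them.
    """
    total = 0.0
    for k in range(n + 1):
        # Log of the maximized likelihood; 0 * log(0) is taken as 0.
        log_ml = 0.0
        if 0 < k < n:
            log_ml = k * math.log(k / n) + (n - k) * math.log((n - k) / n)
        # Log of the binomial coefficient via lgamma to avoid overflow.
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
        total += math.exp(log_binom + log_ml)
    return total


def shtarkov_bound_bits(n: int) -> float:
    """Logarithm (base 2) of the Shtarkov sum, in bits."""
    return math.log2(shtarkov_sum(n))
```

For large n the sum grows like the square root of n, so the bound grows like half the logarithm of n, which is the leading behavior the redundancy results above refine.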
Entropy Computations Via Analytic Depoissonization
 IEEE Trans. Information Theory
, 1998
Cited by 29 (12 self)
We investigate the basic question of information theory, namely, evaluation of Shannon entropy, and a more general Rényi entropy, for some discrete distributions (e.g., binomial, negative binomial, etc.). We aim at establishing analytic methods (i.e., those in which complex analysis plays a pivotal role) for such computations which often yield estimates of unparalleled precision. The main analytic tool used here is that of analytic poissonization and depoissonization. We illustrate our approach on the entropy evaluation of the binomial distribution, that is, we prove that for the Binomial(n, p) distribution the entropy $h_n$ becomes $h_n \sim \frac{1}{2} \ln n + \frac{1}{2} + \ln \sqrt{2\pi p(1-p)} + \sum_{k \ge 1} a_k n^{-k}$ where $a_k$ are explicitly computable constants. Moreover, we shall argue that analytic methods (e.g., complex asymptotics such as Rice's method and singularity analysis, Mellin transforms, poissonization and depoissonization) can offer new tools for information theory, especially for studying ...
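The leading terms of the binomial entropy expansion can be checked numerically. The sketch below computes the exact entropy of Binomial(n, p) in nats by direct summation and compares it with the first three terms, (1/2) ln n + 1/2 + ln sqrt(2 pi p (1 - p)). The function names are hypothetical, the code assumes 0 < p < 1, and the correction constants a_k are not computed.

```python
import math


def binomial_entropy(n: int, p: float) -> float:
    """Exact Shannon entropy (in nats) of Binomial(n, p), for 0 < p < 1."""
    entropy = 0.0
    for k in range(n + 1):
        # Log of the probability mass at k, computed stably with lgamma.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        entropy -= math.exp(log_pmf) * log_pmf
    return entropy


def leading_terms(n: int, p: float) -> float:
    """First three terms of the asymptotic expansion, without the a_k series."""
    return (0.5 * math.log(n) + 0.5
            + math.log(math.sqrt(2 * math.pi * p * (1 - p))))
```

Comparing `binomial_entropy(n, p)` with `leading_terms(n, p)` for growing n shows the gap shrinking, consistent with a remainder series in negative powers of n.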
Average Profile Of The Lempel-Ziv Parsing Scheme For A Markovian Source
 Algorithmica
, 1998
Cited by 17 (11 self)
For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models: In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., a prefix of previous (i - 1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, for which we coin the name Gilbert-Kadota model, a fixed number of phrases is generated according to the Lempel-Ziv algorithm, thus producing a sequence of a variable (random) length. In the last model, known also as the Lempel-Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees that are of interest to other al...
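The parsing rule described above (each new phrase is the shortest prefix of the remaining text not yet seen as a phrase) can be sketched as follows. This corresponds to the fixed-length-string setting; the function name is hypothetical, and a set of phrases stands in for the digital search tree used in the analysis.

```python
def lz78_parse(text: str) -> list[str]:
    """Split text into phrases, each the shortest prefix not seen before."""
    seen = set()      # phrases produced so far
    phrases = []
    current = ""      # prefix of the phrase being built
    for symbol in text:
        current += symbol
        if current not in seen:
            # First time this string appears as a phrase: close it.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The input ended inside a phrase that was already seen.
        phrases.append(current)
    return phrases
```

For instance, `lz78_parse("1011010100010")` yields the phrases 1, 0, 11, 01, 010, 00, 10. Stopping after a fixed number of phrases instead of at the end of a fixed-length string would give the Gilbert-Kadota variant.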
On Asymptotics Of Certain Recurrences Arising In Universal Coding
 Problems of Information Transmission
, 1997
Cited by 14 (4 self)
Ramanujan's Q-function and the so-called "tree function" $T(z)$ defined implicitly by the equation $T(z) = z e^{T(z)}$ found applications in hashing, the birthday paradox problem, random mappings, caching, memory conflicts, and so forth. Recently, several novel applications of these functions to information theory problems such as linear coding and universal portfolios were brought to light. In this paper, we study them in the context of another information theory problem, namely: universal coding which was recently investigated by Shtarkov et al. [Prob. Inf. Trans., 31, 1995]. We provide asymptotic expansions of certain recurrences studied there which describe the optimal redundancy of universal codes. Our methodology falls under the so-called analytical information theory that was recently applied successfully to a variety of information theory problems. Key Words: Source coding, multialphabet universal coding, redundancy, minimum description length, analytical information theory, si...
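The implicit definition of the tree function lends itself to a small numerical sketch. Below, T(z) is computed for real z with |z| < 1/e by fixed-point iteration on T = z e^T and cross-checked against the classical power series with coefficients n^(n-1)/n!. The function names, tolerance, and iteration cap are illustrative assumptions, and the paper's recurrences are not reproduced.

```python
import math


def tree_function(z: float, tol: float = 1e-14, max_iter: int = 10_000) -> float:
    """Solve T = z * exp(T) by fixed-point iteration, for real |z| < 1/e."""
    if abs(z) >= 1 / math.e:
        raise ValueError("this sketch only handles |z| < 1/e")
    t = 0.0
    for _ in range(max_iter):
        nxt = z * math.exp(t)
        if abs(nxt - t) < tol:
            return nxt
        t = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def tree_series(z: float, terms: int = 60) -> float:
    """Partial sum of the series sum_{n >= 1} n^(n-1) z^n / n!."""
    total = 0.0
    for n in range(1, terms + 1):
        # Coefficient n^(n-1) / n! computed in log space to avoid overflow.
        coeff = math.exp((n - 1) * math.log(n) - math.lgamma(n + 1))
        total += coeff * z ** n
    return total
```

Both routines agree closely for small positive z; the iteration slows down as z approaches 1/e, where the derivative of the map approaches 1.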
Average Profile Of The Generalized Digital Search Tree And The Generalized Lempel-Ziv Algorithm
, 1997
Cited by 13 (7 self)
The goal of this research is threefold: (i) to analyze generalized digital search trees, (ii) to derive the average profile (i.e., phrase length) of a generalization of the well known parsing algorithm due to Lempel and Ziv, and (iii) to provide analytical tools to analyze asymptotically certain partial differential functional equations often arising in the analysis of digital trees. In the generalized Lempel-Ziv parsing scheme, one partitions a sequence of symbols from a finite alphabet into phrases such that the new phrase is the shortest substring seen in the past by at most b - 1 phrases (b = 1 corresponds to the original Lempel-Ziv scheme). Such a scheme can be analyzed through a generalized digital search tree in which every node is capable of storing up to b strings. In this paper, we investigate the depth of a randomly selected node in such a tree and the length of a randomly selected phrase in the generalized Lempel-Ziv scheme. These findings and some recent results al...
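One way to read the generalized rule is that a prefix is closed as a phrase as soon as it has been used as a phrase fewer than b times, which mirrors a digital search tree whose nodes each hold up to b strings. The sketch below follows that reading, which is an assumption about the abstract's wording; the function name is hypothetical, and a counter of phrase occurrences stands in for the tree.

```python
from collections import Counter


def generalized_lz_parse(text: str, b: int = 1) -> list[str]:
    """Parse text so that each distinct phrase may be emitted up to b times."""
    if b < 1:
        raise ValueError("b must be at least 1")
    counts = Counter()   # how many times each string was emitted as a phrase
    phrases = []
    current = ""
    for symbol in text:
        current += symbol
        if counts[current] < b:
            # Seen by at most b - 1 earlier phrases: close the phrase here.
            counts[current] += 1
            phrases.append(current)
            current = ""
    if current:
        # Input ended inside a prefix whose bucket of b phrases is full.
        phrases.append(current)
    return phrases
```

With `b=1` this reduces to the original parsing rule, while larger b produces more, shorter phrases, for example `generalized_lz_parse("aaaaaa", b=2)` gives a, a, aa, aa.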
Precise Average Redundancy of an Idealized Arithmetic Coding
 Proc. Data Compression Conference, 222-231, Snowbird
, 2002
Cited by 10 (6 self)
Redundancy is defined as the excess of the code length over the optimal (ideal) code length. We study the average redundancy of an idealized arithmetic coding (for memoryless sources with unknown distributions) in which the Krichevsky and Trofimov estimator is followed by the Shannon-Fano code. We shall ignore here important practical implementation issues such as finite precision and finite buffer sizes. In fact, our idealized arithmetic code can be viewed as an adaptive infinite-precision implementation of an arithmetic encoder that resembles Elias coding. However, we provide very precise results for the average redundancy that take into account integer-length constraints. These findings are obtained by analytic methods of analysis of algorithms such as the theory of distribution of sequences modulo 1 and Fourier series. These estimates can be used to study the average redundancy of codes for tree sources, and ultimately the context-tree weighting algorithms.
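The estimator named above can be sketched for a binary alphabet. The Krichevsky-Trofimov rule assigns the next symbol the probability (count + 1/2) / (symbols seen + 1), and the product of these gives a sequence probability that is then turned into an integer code length. The function names are hypothetical, the input is assumed to be a list of 0/1 integers, and comparing against the maximum-likelihood probability (rather than averaging over a source, as the paper does) is a simplification for illustration.

```python
import math


def kt_log2_probability(bits: list[int]) -> float:
    """Log2 of the Krichevsky-Trofimov sequential probability of a 0/1 list."""
    zeros = ones = 0
    log2_p = 0.0
    for bit in bits:
        count = ones if bit else zeros
        # KT rule: add 1/2 to the symbol count, 1 to the total.
        log2_p += math.log2((count + 0.5) / (zeros + ones + 1))
        if bit:
            ones += 1
        else:
            zeros += 1
    return log2_p


def ml_log2_probability(bits: list[int]) -> float:
    """Log2 of the sequence probability under the best-fitting memoryless source."""
    n, k = len(bits), sum(bits)
    if k == 0 or k == n:
        return 0.0
    return k * math.log2(k / n) + (n - k) * math.log2((n - k) / n)


def integer_code_length(log2_p: float) -> int:
    """Integer code length obtained by rounding -log2(probability) up."""
    return math.ceil(-log2_p)
```

The difference `-kt_log2_probability(bits) + ml_log2_probability(bits)` is the pointwise excess before rounding; the rounding step in `integer_code_length` is the integer-length constraint whose contribution the paper quantifies precisely.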
Analytic Variations on Redundancy Rates of Renewal Processes
 IEEE Trans. Information Theory
, 2002
Cited by 8 (5 self)
Csiszár and Shields have recently proved that the minimax redundancy for a class of (stationary) renewal processes is $\Theta(\sqrt{n})$ where n is the block length. This interesting result provides a first nontrivial bound on redundancy for a nonparametric family of processes. The present paper gives a precise estimate of the redundancy rate for such (nonstationary) renewal sources, namely, $\frac{2}{\log 2} \sqrt{\left(\frac{\pi^2}{6} - 1\right) n} + O(\log n)$. This asymptotic expansion is derived by complex-analytic methods that include generating function representations, Mellin transforms, singularity analysis and saddle point estimates. This work places itself within the framework of analytic information theory.
On the Average Redundancy Rate of the Lempel-Ziv Code with the K-Error Protocol
 IEEE Trans. Inform. Theory
, 2001
Cited by 6 (5 self)
In this paper we examine the average redundancy rate of a Lempel-Ziv'78 code with the k-error protocol. Storer and Reif have studied this modification of the Lempel-Ziv scheme and showed that it provides an efficient protection against error propagation while preserving the asymptotic optimality of the code. We refine this result by providing an asymptotic expression for the average redundancy rate of this code for memoryless sources. Using analytic methods, we establish our result by exploiting a relationship between a parsing scheme of the Lempel-Ziv encoder with the k-error protocol and a generalization of the digital search tree structure. We accompany our analysis with a number of experiments that test the validity of our theoretical result and demonstrate the effects of various additional modifications of the Lempel-Ziv algorithm.