Compression of Individual Sequences via Variable-Rate Coding
, 1978
Cited by 612 (6 self)
This paper contains two parts: a descriptive part (Section II), where all the results are stated and discussed, and a formal part (Section III), where all proofs except that of Theorem 2 are given. The proof of Theorem 2, which is constructive and thus informative, is presented in the mainstream of Section II. (IEEE Transactions on Information Theory, vol. IT-24, no. 5, September 1978, p. 532.)
Effective strong dimension in algorithmic information and computational complexity
- SIAM Journal on Computing
, 2004
Cited by 67 (27 self)
The two most important notions of fractal dimension are Hausdorff dimension, developed by Hausdorff (1919), and packing dimension, developed independently by Tricot (1982) and Sullivan (1984). Both dimensions have the mathematical advantage of being defined from measures, and both have yielded extensive applications in fractal geometry and dynamical systems. Lutz (2000) has recently proven a simple characterization of Hausdorff dimension in terms of gales, which are betting strategies that generalize martingales. Imposing various computability and complexity constraints on these gales produces a spectrum of effective versions of Hausdorff dimension, including constructive, computable, polynomial-space, polynomial-time, and finite-state dimensions. Work by several investigators has already used these effective dimensions to shed significant new light on a variety of topics in theoretical computer science. In this paper we show that packing dimension can also be characterized in terms of gales. Moreover, even though the usual definition of packing dimension is considerably more complex than that of Hausdorff dimension, our gale characterization of packing dimension is an exact dual
Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees
- Theoretical Computer Science
, 1995
Cited by 56 (28 self)
The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length $n$ into variable phrases such that a new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the number $M_n$ of phrases that one can construct from a sequence of length $n$. In this paper, for the memoryless source with unequal probabilities of symbols generation we derive the limiting distribution of $M_n$, which turns out to be normal. This proves a long-standing open problem. In fact, to obtain this result we solved another open problem, namely, that of establishing the limiting distribution of the internal path length in a digital search tree. The latter is a consequence of an asymptotic solution of a multiplicative differential-functional equation often arising in the analysis of algorithms on words. Interestingly enough, our findings are proved by a combination of probabilistic techniques such as renewal equation and uniform integrability, and analytical techniques such as Mellin transform, differential-functional equations, de-Poissonization, and so forth. In concluding remarks we indicate a possibility of extending our results to Markovian models.
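The parsing rule in this abstract (each new phrase is the shortest substring not yet seen as a phrase) can be illustrated with a minimal Python sketch. The function name `lz_parse` is hypothetical, and counting a trailing incomplete phrase toward the total is an assumption made here for illustration, not something the abstract specifies.

```python
def lz_parse(sequence):
    """Split `sequence` into phrases, each being the shortest prefix of the
    remaining input that has not already appeared as a phrase."""
    seen = set()       # phrases produced so far
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            # Shortest new substring found: close the phrase here.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Assumption: a leftover suffix (already seen as a phrase)
        # is kept as a final, incomplete phrase.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    phrases = lz_parse("1011010100010")
    print(phrases)       # the phrases in order of creation
    print(len(phrases))  # the phrase count for this input
```

The length of the returned list plays the role of the phrase count for a given input; the abstract's result concerns the distribution of that count when the input is drawn from a memoryless source.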
Interval and Recency Rank Source Coding: Two On-Line Adaptive Variable-Length Schemes
, 1987
Cited by 31 (0 self)
In the schemes presented the encoder maps each message into a codeword in a prefix-free codeword set. In interval encoding the codeword is indexed by the interval since the last previous occurrence of that message, and the codeword set must be countably infinite. In recency rank encoding the codeword is indexed by the number of distinct messages in that interval, and there must be no fewer codewords than messages. The decoder decodes each codeword on receipt. Users need not know message probabilities, but must agree on indexings, of the codeword set in an order of increasing length and of the message set in some arbitrary order. The average codeword length over a communications bout is never much larger than the value for an off-line scheme which maps the jth most frequent message in the bout into the jth shortest codeword in the given set, and is never too much larger than the value for off-line Huffman encoding of messages into the best codeword set for the bout message frequencies.
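As a rough illustration of recency rank encoding, the Python sketch below keeps the messages in a list ordered by recency and emits each message's position in that list. The function names are hypothetical. Two conventions are assumptions made here and are not taken from the abstract: ranks are 1-based, and before any message has occurred the list holds the message set in the agreed arbitrary order. Mapping each rank to a codeword from the agreed prefix-free set is omitted.

```python
def recency_rank_encode(messages, alphabet):
    """Replace each message by its 1-based position in a recency-ordered list."""
    order = list(alphabet)   # assumed initial order agreed by both users
    ranks = []
    for message in messages:
        position = order.index(message)
        ranks.append(position + 1)
        # Move the message to the front: it is now the most recent one.
        order.insert(0, order.pop(position))
    return ranks


def recency_rank_decode(ranks, alphabet):
    """Invert recency_rank_encode, decoding each rank as soon as it arrives."""
    order = list(alphabet)
    messages = []
    for rank in ranks:
        message = order.pop(rank - 1)
        messages.append(message)
        order.insert(0, message)
    return messages


if __name__ == "__main__":
    ranks = recency_rank_encode("aabcb", "abc")
    print(ranks)
    print("".join(recency_rank_decode(ranks, "abc")))
```

Recently repeated messages receive small ranks, which is why pairing the ranks with codewords listed in order of increasing length tends to give short output without either user knowing the message probabilities.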
A diffusion limit for a class of randomly-growing binary trees
- Probability Theory and Related Fields
, 1988
Cited by 19 (1 self)
Summary. Binary trees are grown by adding one node at a time, an available node at height $i$ being added with probability proportional to $c^{-i}$, $c > 1$. We establish both a "strong law of large numbers" and a "central limit theorem" for the vector $X(t) = (X_i(t))$, where $X_i(t)$ is the proportion of nodes of height $i$ that are available at time $t$. We show, in fact, that there is a deterministic process $x_i(t)$ such that $|X_i(t) - x_i(t)|$ converges to 0 a.s., and such that if $c \neq 2^{1/2}$, $Z_i^n(t) = 2^{n/2} \{X_{n+i}(t c^n) - x_{n+i}(t c^n)\}$, and $Z^n(t) = (Z_i^n(t))$, then $Z^n(t)$ converges weakly to a Gaussian diffusion $Z(t)$. The results are applied to establish asymptotic normality in the unbiased coin-tossing case for an entropy estimation procedure due to J. Ziv, and to obtain results on the growth of the maximum height of the tree.
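The growth rule in this summary can be simulated with a short Python sketch. It rests on several assumptions that are not stated in the abstract: a node is taken to be "available" when it is not yet in the tree but its parent is (with the root available at the start), each available node at height i is chosen with weight c^(-i), and time is measured simply as the number of nodes added. The names `grow_tree` and `available_proportions` are hypothetical.

```python
import random


def grow_tree(steps, c, seed=0):
    """Add `steps` nodes to an initially empty binary tree.

    Returns a list whose entry i is the number of available nodes at height i.
    """
    rng = random.Random(seed)
    available = [1]  # only the root (height 0) is available at the start
    for _ in range(steps):
        # Each available node at height i carries weight c**(-i), so a height
        # is picked with weight (number available there) * c**(-i).
        weights = [count * c ** (-i) for i, count in enumerate(available)]
        height = rng.choices(range(len(available)), weights=weights)[0]
        # The chosen node joins the tree; its two children become available.
        available[height] -= 1
        if height + 1 == len(available):
            available.append(0)
        available[height + 1] += 2
    return available


def available_proportions(available):
    """Fraction of the 2**i possible nodes at each height i that are available."""
    return [count / 2 ** i for i, count in enumerate(available)]


if __name__ == "__main__":
    counts = grow_tree(1000, c=2.0)
    print(available_proportions(counts))
```

Because only the per-height counts matter for the selection probabilities, the sketch tracks counts rather than individual nodes; the returned proportions correspond to the vector of availability proportions described in the summary, under the assumptions listed above.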
An Implementable Lossy Version of the Lempel-Ziv Algorithm - Part I: Optimality for Memoryless Sources
, 1998
Cited by 19 (7 self)
A new lossy variant of the Fixed-Database Lempel-Ziv coding algorithm for encoding at a fixed distortion level is proposed, and its asymptotic optimality and universality for memoryless sources (with respect to bounded single-letter distortion measures) is demonstrated: As the database size $m$ increases to infinity, the expected compression ratio approaches the rate-distortion function. The complexity and redundancy characteristics of the algorithm are comparable to those of its lossless counterpart. A heuristic argument suggests that the redundancy is of order $(\log \log m)/\log m$, and this is also confirmed experimentally; simulation results are presented that agree well with this rate. Also, the complexity of the algorithm is seen to be comparable to that of the corresponding lossless scheme. We show that there is a trade-off between compression performance and encoding complexity, and we discuss how the relevant parameters can be chosen to balance this trade-off in practice. We also d...
Using Difficulty of Prediction to Decrease Computation: Fast Sort, Priority Queue and Convex Hull on Entropy Bounded Inputs
Cited by 15 (4 self)
There has recently been an upsurge of interest in the Markov model, and in more general stationary ergodic stochastic distributions, in the theoretical computer science community (e.g., see [Vitter,Krishnan91], [Karlin,Philips,Raghavan92], [Raghavan9 for the use of Markov models in on-line algorithms such as caching and prefetching). Their results used the fact that compressible sources are predictable (and vice versa), and showed that on-line algorithms can improve their performance by prediction. Actual page access sequences are in fact somewhat compressible, so their predictive methods can be of benefit. This paper investigates the interesting idea of decreasing computation by using learning in the opposite way, namely to determine the difficulty of prediction. That is, we will approximately learn the input distribution, and then improve the performance of the computation when the input is not too predictable, rather than the reverse. To our knowledge,
Grammar Based Codes: A New Class of Universal Lossless Source Codes
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 2000
Cited by 14 (2 self)
We investigate a type of lossless source code called a grammar based code, which, in response to any input data string x over a fixed finite alphabet, selects a context-free grammar Gx representing x in the sense that x is the unique string belonging to the language generated by Gx. Lossless compression of x takes place indirectly via compression of the production rules of the grammar Gx. It is shown that, subject to some mild restrictions, a grammar based code is a universal code with respect to the family of finite state information sources over the finite alphabet. Redundancy bounds for grammar based codes are established. Reduction rules for designing grammar based codes are presented.
The Interactions Between Ergodic Theory and Information Theory
- IEEE Transactions on Information Theory
, 1998
Cited by 13 (0 self)
Information theorists frequently use the ergodic theorem; likewise entropy concepts are often used in information theory. Recently the two subjects have become partially intertwined as deeper results from each discipline find use in the other. A brief history of this interaction is presented in this paper, together with a more detailed look at three areas of connection, namely, recurrence theory, blowing-up bounds, and direct sample-path methods.
Source codes as random number generators
- IEEE Trans. Inform. Theory
, 1998
Cited by 10 (0 self)
A random number generator generates fair coin flips by processing deterministically an arbitrary source of nonideal randomness. An optimal random number generator generates asymptotically fair coin flips from a stationary ergodic source at a rate of bits per source symbol equal to the entropy rate of the source. Since optimal noiseless data compression codes produce incompressible outputs, it is natural to investigate their capabilities as optimal random number generators. In this paper we show under general conditions that optimal variable-length source codes asymptotically achieve optimal variable-length random bit generation in a rather strong sense. In particular, we show in what sense the Lempel–Ziv algorithm can be considered an optimal universal random bit generator from arbitrary stationary ergodic random sources with unknown distributions. Index Terms: Data compression, entropy, Lempel–Ziv algorithm, random number generation, universal source coding.

