Results 1-10 of 41
Universal prediction
IEEE TRANSACTIONS ON INFORMATION THEORY, 1998
Cited by 137 (11 self)
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.
A Vector Quantization Approach to Universal Noiseless Coding and Quantization
IEEE Trans. Inform. Theory, 1996
Cited by 45 (10 self)
A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first-stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2) n^{-1} log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen’s theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n^{-1}) when the universe of sources is countable, and as O(n^{-1+ε}) when the universe of sources is infinite-dimensional, under appropriate conditions. Index Terms: Two-stage, adaptive, compression, minimum description length, clustering.
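The two-stage idea in the abstract above can be illustrated with a minimal noiseless sketch. It is not the paper's design procedure (no generalized Lloyd iteration is performed). It assumes a hypothetical fixed collection of Bernoulli models for binary blocks, charges a fixed-length index for the first stage, and uses the ideal code length -log2 p(block) for the second stage. The function name `two_stage_length` and the model list are illustrative assumptions.

```python
from math import ceil, log2

def two_stage_length(block, models):
    """Pick the best code for a binary block and return (index, total_bits).

    `models` is a hypothetical collection of Bernoulli parameters P(bit = 1),
    each strictly between 0 and 1. Stage one spends a fixed-length index on
    the chosen model; stage two spends the ideal code length of the block
    under that model.
    """
    # First stage: fixed-length description of which code was chosen.
    index_bits = ceil(log2(len(models))) if len(models) > 1 else 0

    ones = sum(block)
    zeros = len(block) - ones

    def ideal_bits(p1):
        # Ideal (non-integer) code length of the block under Bernoulli(p1).
        return -(ones * log2(p1) + zeros * log2(1.0 - p1))

    # "Quantize" the block to the code that describes it most cheaply.
    best = min(range(len(models)), key=lambda j: ideal_bits(models[j]))
    return best, index_bits + ideal_bits(models[best])

if __name__ == "__main__":
    idx, bits = two_stage_length([1, 1, 1, 0], [0.25, 0.5, 0.75])
    print(idx, round(bits, 3))
```

The first-stage cost grows only with the logarithm of the number of codes, which is why its per-letter share shrinks as the block length n grows.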
Universal compression of memoryless sources over unknown alphabets
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004
Cited by 35 (10 self)
It has long been known that the compression redundancy of independent and identically distributed (i.i.d.) strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and its pattern, the order in which the symbols appear. Concentrating on the latter, we show that the patterns of i.i.d. strings over all, including infinite and even unknown, alphabets, can be compressed with diminishing redundancy, both in block and sequentially, and that the compression can be performed in linear time. To establish these results, we show that the number of patterns is the Bell number, that the number of patterns with a given number of symbols is the Stirling number of the second kind, and that the redundancy of patterns can be bounded using results of Hardy and Ramanujan on the number of integer partitions. The results also imply an asymptotically optimal solution for the Good-Turing probability-estimation problem.
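As a small illustration of the pattern notion used in this abstract, the sketch below maps a string to its pattern and checks by brute force, for short lengths only, that the number of distinct patterns of length n equals the Bell number B_n. The function names are illustrative assumptions, and the enumeration is exponential in n, so it is a sanity check rather than a compression algorithm.

```python
from itertools import product

def pattern(s):
    """Replace each symbol by the 1-based order of its first appearance."""
    first_seen = {}
    out = []
    for sym in s:
        if sym not in first_seen:
            first_seen[sym] = len(first_seen) + 1
        out.append(first_seen[sym])
    return tuple(out)

def count_patterns(n):
    """Count distinct patterns of length n by exhaustive enumeration."""
    # n symbols suffice: a string of length n has at most n distinct symbols.
    return len({pattern(w) for w in product(range(n), repeat=n)})

def bell(n):
    """Bell number B_n for n >= 1, computed with the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]

if __name__ == "__main__":
    print(pattern("abracadabra"))
    for n in range(1, 6):
        print(n, count_patterns(n), bell(n))
```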
Limit results on pattern entropy
IEEE Trans. Inf. Theory, 2006
Cited by 17 (3 self)
We determine the entropy rate of patterns of certain random processes, bound the speed at which the per-symbol pattern entropy converges to this rate, and show that patterns satisfy an asymptotic equipartition property. To derive some of these results we upper bound the probability that the nth variable in a random process differs from all preceding ones.
Universal lossless compression with unknown alphabets: The average case
2006
Cited by 14 (4 self)
Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. It is shown that if the alphabet size k is essentially small, then the average minimax and maximin redundancies as well as the redundancy of every code for almost every source, when compressing a pattern, consist of at least 0.5 log(n/k^3) bits per each unknown probability parameter, and if all alphabet letters are likely to occur, there exist codes whose redundancy is at most 0.5 log(n/k^2) bits per each unknown probability parameter, where n is the length of the data sequences. Otherwise, if the alphabet is large, these redundancies are essentially at least O(n^{-2/3}) bits per symbol, and there exist codes that achieve redundancy of essentially O(n^{-1/2}) bits per symbol. Two suboptimal low-complexity sequential algorithms for compression of patterns are presented and their description lengths
On the minimum description length principle for sources with piecewise constant parameters
IEEE Trans. Inf. Theory, vol. 39, 1993
Cited by 14 (1 self)
Universal lossless coding in the presence of finitely many abrupt changes in the statistics of the source, at unknown points, is investigated. The minimum description length (MDL) principle is derived for this setting. In particular, it is shown that for any uniquely decipherable code, for almost every combination of statistical parameter vectors governing each segment, and for almost every vector of transition instants, the minimum achievable redundancy is composed of 0.5 log n / n bits for each unknown segmental parameter and log n / n bits for each transition, where n is the length of the input string. This redundancy is shown to be attainable by a strongly sequential universal encoder, i.e., an encoder that does not utilize the knowledge of a prescribed value of n. Key Words: Minimum description length, universal coding, sequential coding, segmentation, edge detection.
A lower bound on compression of unknown alphabets
Theoret. Comput. Sci., 2005
Cited by 10 (3 self)
Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength n, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least (1.5 log_2 e) n^{1/3} bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman’s saddle-point approximation technique to determine the coefficients' asymptotic behavior.
The empirical distribution of rate-constrained source codes
IEEE Trans. Inform. Theory
Cited by 7 (2 self)
Let X = (X_1, ...) be a stationary ergodic finite-alphabet source, X^n denote its first n symbols, and Y^n be the codeword assigned to X^n by a lossy source code. The empirical kth-order joint distribution Q̂^k[X^n, Y^n](x^k, y^k) is defined as the frequency of appearances of pairs of k-strings (x^k, y^k) along the pair (X^n, Y^n). Our main interest is in the sample behavior of this (random) distribution. Letting I(Q^k) denote the mutual information I(X^k; Y^k) when (X^k, Y^k) ∼ Q^k, we show that for any (sequence of) lossy source code(s) of rate ≤ R, lim sup_{n→∞} (1/k) I(Q̂^k[X^n, Y^n])
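To make the empirical kth-order joint distribution concrete, here is a minimal sketch that counts pairs of aligned k-strings along a source sequence and its reconstruction, then evaluates the mutual information of the resulting distribution in bits. It assumes plain sliding windows without wraparound, which may differ from the paper's exact convention (for example, a cyclic definition). The function names are illustrative.

```python
from collections import Counter
from math import log2

def empirical_joint(x, y, k):
    """Empirical distribution of aligned k-string pairs along (x, y).

    Assumption: n - k + 1 sliding windows, no cyclic wraparound.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= k <= len(x):
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    windows = len(x) - k + 1
    counts = Counter(
        (tuple(x[i:i + k]), tuple(y[i:i + k])) for i in range(windows)
    )
    return {pair: c / windows for pair, c in counts.items()}

def mutual_information(q):
    """Mutual information, in bits, of a joint distribution {(a, b): prob}."""
    px, py = Counter(), Counter()
    for (a, b), p in q.items():
        px[a] += p
        py[b] += p
    return sum(
        p * log2(p / (px[a] * py[b])) for (a, b), p in q.items() if p > 0
    )

if __name__ == "__main__":
    q = empirical_joint("0101", "0101", 1)
    print(q, mutual_information(q))
```

Dividing `mutual_information(q)` by k gives the normalized quantity (1/k) I(Q̂^k) that the abstract compares against the code rate.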
Lossy Compression Of Individual Signals Based On String Matching And One Pass Codebook Design
In Proceedings ICASSP
Cited by 5 (1 self)
This paper describes an effort to extend the Lempel-Ziv algorithm to a practical universal lossy compression algorithm. It is based on the idea of approximate string matching with a rate-distortion (R-D) criterion, and is addressed within the framework of vector quantization (VQ) [4]. A practical one-pass algorithm for VQ codebook construction and adaptation for individual signals is developed which assumes no prior knowledge of the source statistics and involves no iteration. We call this technique rate-distortion Lempel-Ziv (RDLZ). As in the case of the Lempel-Ziv algorithm, the encoded bit stream consists of codebook (dictionary) updates as well as indices (pointers) to the codebook. The idea of "trading" bits for distortion in modifying the codebook will be introduced. Experimental results show that, for Gaussian sources as well as real images, RDLZ performs comparably, sometimes favorably, to static codebook VQ trained on the corresponding sources or images.
On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
2008
Cited by 4 (3 self)
The article presents a new interpretation for Zipf’s law in natural language which relies on two areas of information theory. We reformulate the problem of grammar-based compression and investigate properties of strongly nonergodic stationary processes. The motivation for the joint discussion is to prove a proposition with a simple informal statement: If an n-letter long text describes n^β independent facts in a random but consistent way then the text contains at least n^β / log n different words. In the formal statement, two specific postulates are adopted. Firstly, the words are understood as the nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the texts are assumed to be emitted by a nonergodic source, with the described facts being binary IID variables that are asymptotically predictable in a shift-invariant way. The proof of the formal proposition applies several new tools. These