## Joint fixed-rate universal lossy coding and identification of continuous-alphabet memoryless sources

### Download Links

- [www.ifp.uiuc.edu]
- [www.ifp.illinois.edu]
- [arxiv.org]
- DBLP

### Other Repositories/Bibliography

Venue: IEEE Trans. Inform. Theory

Citations: 2 (1 self)

### BibTeX

@ARTICLE{Raginsky_jointfixed-rate,
  author = {Maxim Raginsky},
  title = {Joint fixed-rate universal lossy coding and identification of continuous-alphabet memoryless sources},
  journal = {IEEE Trans. Inform. Theory},
  year = {2005}
}

### Abstract

The problem of joint universal source coding and identification is considered in the setting of fixed-rate lossy coding of continuous-alphabet memoryless sources. For a wide class of bounded distortion measures, it is shown that any compactly parametrized family of R^d-valued i.i.d. sources with absolutely continuous distributions satisfying appropriate smoothness and Vapnik–Chervonenkis learnability conditions admits a joint scheme for universal lossy block coding and parameter estimation, such that when the block length n tends to infinity, the overhead per-letter rate and the distortion redundancies converge to zero as O(n^{-1} log n) and O(√(n^{-1} log n)), respectively. Moreover, the active source can be determined at the decoder up to a ball of radius O(√(n^{-1} log n)) in variational distance, asymptotically almost surely. The system has finite memory length equal to the block length and can be thought of as a blockwise application of a time-invariant nonlinear filter with initial conditions determined from the previous block. Comparisons are presented with several existing schemes for universal vector quantization, which do not include parameter estimation explicitly, and an extension to unbounded distortion measures is outlined. Finally, finite mixture classes and exponential families are given as explicit examples of parametric sources admitting joint universal compression and modeling schemes of the kind studied here.

Keywords: Learning, minimum-distance density estimation, two-stage codes, universal vector quantization, Vapnik–Chervonenkis dimension.

### Citations

8632 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...natural symmetry between these two insights, owing to the well-known one-to-one correspondence between (almost) optimal lossless codes and probability distributions on the space of all input sequences [6]. For this The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Seattle, July 9 – July 14, 2006. This work was supported by the Beckman Institute... |

1496 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963
Citation Context ...1) we have that E_XY[log a(X,Y)] = I(X,Y) = R and E_XY[ρ(X,Y)] = D_θ(R). Since 0 ≤ ρ(X,Y) ≤ ρ_max, the second probability on the right-hand side of (B.5) can be bounded using Hoeffding’s inequality [49], which states that for i.i.d. random variables S_1, ..., S_n satisfying a ≤ S_i ≤ b a.s., P((1/n) Σ_{i=1}^n S_i ≥ E[S_1] + δ) ≤ e^{−2nδ²/(b−a)²}. This yields the estimate P_XY((1/n) Σ_{i=1}^n ρ(X_i, Y_i) ≥ D_θ(R)... |
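The bound quoted in this context is easy to sanity-check numerically. The sketch below is ours, not from the paper: it compares the empirical tail frequency of the sample mean of bounded i.i.d. uniform variables with the Hoeffding bound e^{−2nδ²/(b−a)²}; the distribution, sample size, and δ are arbitrary illustrative choices.

```python
import math
import random

def hoeffding_bound(n, delta, a=0.0, b=1.0):
    # Hoeffding: P((1/n) sum S_i >= E[S_1] + delta) <= exp(-2 n delta^2 / (b - a)^2)
    return math.exp(-2 * n * delta ** 2 / (b - a) ** 2)

def empirical_tail(n, delta, trials=2000, seed=0):
    # Empirical frequency of the deviation event for i.i.d. Uniform[0, 1] samples (mean 1/2).
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if mean >= 0.5 + delta:
            hits += 1
    return hits / trials

n, delta = 100, 0.1
print("bound:", hoeffding_bound(n, delta), "empirical:", empirical_tail(n, delta))
```

For these parameters the bound is e^{−2} ≈ 0.135, while the empirical tail frequency is far smaller, as expected from a one-sided exponential bound.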

1483 | Information Theory and Reliable Communication - Gallager - 1968 |

947 |
On the uniform convergence of the relative frequencies of events to their probabilities. Theory Prob.
- Vapnik, Chervonenkis
- 1971
Citation Context ...n P_{X^n}(B) ≜ (1/n) Σ_{i=1}^n 1_{X_i ∈ B} for all Borel sets B ⊂ R^d. The probabilities and expectations are with respect to P. Now, if A is a VC class and V(A) ≥ 2, then the results of Vapnik and Chervonenkis [46] and Sauer [47] imply that S_A(n) ≤ n^{V(A)}. Plugging this bound into (A.1) and (A.2), we obtain the following: Lemma A.2. If A is a VC class with V(A) ≥ 2, then P(sup_{A ∈ A} |P_{X^n}(A) − P(A)| > ε) ... for a... |
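The uniform-deviation quantity in this context can be illustrated on the simplest nontrivial VC class, the half-lines {(−∞, t]} (VC dimension 2). The sketch below is ours, not from the paper: it computes sup_A |P_{X^n}(A) − P(A)| for a uniform sample and evaluates the quoted bound 8 n^{V(A)} e^{−nε²/32}; all concrete choices are illustrative.

```python
import math
import random

def sup_deviation_halflines(sample, cdf):
    # sup over t of |P_n((-inf, t]) - P((-inf, t])|; for the empirical CDF the
    # supremum is attained at (or just before) the order statistics.
    xs = sorted(sample)
    n = len(xs)
    dev = 0.0
    for i, x in enumerate(xs, start=1):
        dev = max(dev, abs(i / n - cdf(x)), abs((i - 1) / n - cdf(x)))
    return dev

def vc_bound(n, eps, vc_dim=2):
    # The Vapnik-Chervonenkis/Sauer bound quoted above: 8 * n**V(A) * exp(-n eps^2 / 32).
    return 8 * n ** vc_dim * math.exp(-n * eps ** 2 / 32)

rng = random.Random(1)
n = 500
sample = [rng.random() for _ in range(n)]          # Uniform[0, 1], so cdf(x) = x
dev = sup_deviation_halflines(sample, lambda x: min(max(x, 0.0), 1.0))
print("sup deviation:", dev, "bound at eps=0.5, n=10000:", vc_bound(10000, 0.5))
```

Note that the polynomial factor 8n^{V(A)} makes the bound vacuous for small n; it only bites once n ε² dominates V(A) log n, which is consistent with the O(√(n^{-1} log n)) rates in the abstract.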

891 | Information Theory: Coding Theorems for Discrete Memoryless Systems - Csiszár, Körner - 1981 |

320 |
Lectures on the Coupling Method
- Lindvall
- 1992
Citation Context ...∫_{X×X} ρ(x,y) dµ(x,y) ... ∫_{X×X} 1_{x≠y} dµ(x,y). The right-hand side of this expression is the well-known coupling characterization of twice the variational distance d_V(P,Q) (see, e.g., Section I.5 of Lindvall [50]), so we obtain D_P(C^n)^{1/p} ≤ D_Q(C^n)^{1/p} + 2^{1/p} d_max d_V(P,Q). Interchanging the roles of P and Q, we obtain (C.9). To prove (C.10), let C^n_* achieve the nth-order optimum for P: D_P(C^n_*) = ... |
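The coupling characterization of variational distance mentioned in this context can be made concrete for discrete distributions: a maximal coupling of (P, Q) has mismatch probability 1 − Σ_x min(p(x), q(x)), which equals d_V(P, Q) = ½ Σ_x |p(x) − q(x)|. A minimal sketch (ours, not from the paper):

```python
def variational_distance(p, q):
    # d_V(P, Q) = (1/2) * sum_x |p(x) - q(x)| for discrete P, Q on a common alphabet.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def maximal_coupling_mismatch(p, q):
    # A maximal coupling mu of (P, Q) has P(X != Y) = 1 - sum_x min(p(x), q(x)),
    # which coincides with d_V(P, Q): the coupling characterization used above.
    keys = set(p) | set(q)
    return 1.0 - sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.4, "b": 0.4, "c": 0.2}
print(variational_distance(p, q), maximal_coupling_mismatch(p, q))  # both approximately 0.1
```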

307 | The minimum description length principle in coding and modeling
- Barron, Rissanen, et al.
- 1998
Citation Context ...This is the basis of the so-called Minimum Description Length (MDL) principle for model selection and, more generally, statistical inference (see, e.g., the survey article of Barron, Rissanen and Yu [4] or the recent book by Grünwald [5]). There is, in fact, a natural symmetry between these two insights, owing to the well-known one-to-one correspondence between (almost) optimal lossless codes and pr... |

286 |
Universal Coding, Information , Prediction, and Estimation
- Rissanen
- 1984
Citation Context ...tudied here. Keywords: Learning, minimum-distance density estimation, two-stage codes, universal vector quantization, Vapnik–Chervonenkis dimension. I. INTRODUCTION In a series of influential papers [1]–[3], Rissanen has elucidated and analyzed deep connections between universal lossless coding and statistical modeling. His approach hinges on the following two key insights: 1) A given parametric cla... |

277 |
Methods of Information Geometry
- Amari, Nagaoka
- 2000
Citation Context ...mpactly parametrized family. C. Extension to curved parametric families. We can also consider parameter spaces that are more general than bounded subsets of R^k. For instance, in information geometry [28] one often encounters curved parametric families, i.e., families {P_θ : θ ∈ Θ} of probability distributions where the parameter space Θ is a smooth compact manifold. Roughly speaking, an abstract set Θ... |

274 | Unsupervised learning of finite mixture models
- Figueiredo, Jain
Citation Context ...or joint universal lossy coding and modeling. These are finite mixture classes and exponential families, which are widely used in statistical modeling, both in theory and in practice (see, e.g., [34]–[37]). A. Mixture classes. Let p_1, ..., p_k be fixed probability densities on a measurable X ⊆ R^d, and let Θ ≜ {θ = (θ_1, ..., θ_k) ∈ R^k : 0 ≤ θ_i ≤ 1, 1 ≤ i ≤ k; Σ_{i=1}^k θ_i = 1} be the probability k-si... |
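The mixture class in this context is straightforward to instantiate: fix component densities p_1, ..., p_k and let θ range over the probability simplex. The sketch below is our illustrative choice (two Gaussian components), not an example from the paper.

```python
import math

def gaussian(mu, sigma):
    # Returns the density of N(mu, sigma^2) as a callable.
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x, weights, components):
    # p_theta(x) = sum_i theta_i * p_i(x), with theta a point of the probability k-simplex.
    assert all(w >= 0.0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p(x) for w, p in zip(weights, components))

# theta = (0.7, 0.3) lies in the 2-simplex; the components p_1, p_2 are fixed.
components = (gaussian(0.0, 1.0), gaussian(3.0, 0.5))
p_theta = lambda x: mixture_density(x, (0.7, 0.3), components)
print(round(p_theta(0.0), 4), round(p_theta(3.0), 4))
```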

240 |
On the density of families of sets
- Sauer
- 1972
Citation Context ...(1/n) Σ_{i=1}^n 1_{X_i ∈ B} for all Borel sets B ⊂ R^d. The probabilities and expectations are with respect to P. Now, if A is a VC class and V(A) ≥ 2, then the results of Vapnik and Chervonenkis [46] and Sauer [47] imply that S_A(n) ≤ n^{V(A)}. Plugging this bound into (A.1) and (A.2), we obtain the following: Lemma A.2. If A is a VC class with V(A) ≥ 2, then P(sup_{A ∈ A} |P_{X^n}(A) − P(A)| > ε) ... for any ε > 0, and w... |

200 | Entropy and information theory - Gray - 1990 |

169 | A Course in Density Estimation
- Devroye
- 1987
Citation Context ...th absolutely continuous probability distributions for which the maximum-likelihood estimate behaves rather poorly in terms of the L1 distance between the true and the estimated probability densities [16]. Instead, we propose the use of the so-called minimum-distance estimate, introduced by Devroye and Lugosi [17], [18] in the context of kernel density estimation. The introduction of the minimum-dista... |

142 |
The Minimum Description Length Principle
- Grünwald
- 2007
Citation Context ...Minimum Description Length (MDL) principle for model selection and, more generally, statistical inference (see, e.g., the survey article of Barron, Rissanen and Yu [4] or the recent book by Grünwald [5]). There is, in fact, a natural symmetry between these two insights, owing to the well-known one-to-one correspondence between (almost) optimal lossless codes and probability distributions on the spac... |

137 |
Central limit theorems for empirical measures
- Dudley
- 1978
Citation Context ...≤ 8n^{V(A)} e^{−nε²/32} (A.3) and E[sup_{A ∈ A} |P_{X^n}(A) − P(A)|] ≤ c √(log n / n) (A.4). Remark A.1. One can use more delicate arguments involving metric entropies and covering numbers, along the lines of Dudley [48], to improve the bound in (A.4) to c′√(1/n), where c′ = c′(V(A)) is another constant. However, c′ turns out to be much larger than c, so that, for all “practical” values of n, the ”improved” O(√... |

92 |
Recursive Bayesian estimation using Gaussian sums
- Sorenson, Alspach
- 1971
Citation Context ...mes for joint universal lossy coding and modeling. These are finite mixture classes and exponential families, which are widely used in statistical modeling, both in theory and in practice (see, e.g., [34]–[37]). A. Mixture classes. Let p_1, ..., p_k be fixed probability densities on a measurable X ⊆ R^d, and let Θ ≜ {θ = (θ_1, ..., θ_k) ∈ R^k : 0 ≤ θ_i ≤ 1, 1 ≤ i ≤ k; Σ_{i=1}^k θ_i = 1} be the probability... |

48 | Rates of convergence in the source coding theorem, empirical quantizer design, and universal lossy source coding
- Linder, Lugosi, et al.
- 1994
Citation Context ...y optimal performance, in the sense of minimizing the average distortion under the rate constraint, on any source in the class. Two-stage codes have also proved quite useful in universal lossy coding [10], [11], [13]. For instance, the two-stage universal quantizer introduced by Chou, Effros and Gray [13] is similar in spirit to the adaptive lossless coder of Rice and Plaunt [14], [15], known as the “... |

46 |
Causal source codes
- Neuhoff, Gilbert
Citation Context ...the coding process can be thought of as blockwise application of a nonlinear time-invariant filter with initial conditions determined by the preceding block. In the terminology of Neuhoff and Gilbert [20], this is an instance of a block-stationary causal source code. The remainder of the paper is organized as follows. In Section II, we state the basic notions of universal lossy coding specialized to b... |

44 | A Vector Quantization Approach to Universal Noiseless Coding and Quantization
- Chou, Effros, et al.
- 1996
Citation Context ...r. In this paper, we are concerned with modeling the actual source directly, and not through a codebook distribution in the reproduction space. The objective of universal lossy coding (see, e.g., [8]–[13]) is to construct lossy block source codes (vector quantizers) that perform well in incompletely or inaccurately specified statistical environments. Roughly speaking, a sequence of vector quantizers i... |

42 |
Approximation of density functions by sequences of exponential families. The Annals of Statistics 19
- Barron, Sheu
- 1991
Citation Context ...hes Condition 3). B. Exponential families. Let X be a measurable subset of R^d, and let Θ be a compact subset of R^k. A family {p_θ : θ ∈ Θ} of probability densities on X is an exponential family [28], [35] if each p_θ has the form p_θ(x) = p(x) exp(Σ_{i=1}^k θ_i h_i(x) − g(θ)) ≡ p(x) e^{θ·h(x)−g(θ)}, (5.28) where p is a fixed reference density, h_1, ..., h_k are fixed real-valued functions on X, and g(θ)... |
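The form (5.28) can be instantiated directly. The sketch below is ours, not from the paper: it encodes the Bernoulli(μ) family as a one-parameter exponential family with reference density p ≡ 1 on X = {0, 1}, statistic h(x) = x, natural parameter θ = log(μ/(1−μ)), and log-partition g(θ) = log(1 + e^θ).

```python
import math

def exp_family_density(x, theta, stats, log_partition, base_density):
    # p_theta(x) = p(x) * exp(theta . h(x) - g(theta)), as in (5.28).
    dot = sum(t * h(x) for t, h in zip(theta, stats))
    return base_density(x) * math.exp(dot - log_partition(theta))

# Bernoulli(mu): p(x) = 1, h(x) = x, theta = log(mu/(1-mu)), g(theta) = log(1 + e^theta).
mu = 0.7
theta = (math.log(mu / (1.0 - mu)),)
g = lambda th: math.log(1.0 + math.exp(th[0]))
density = lambda x: exp_family_density(x, theta, (lambda x: x,), g, lambda x: 1.0)
print(density(1), density(0))  # recovers mu and 1 - mu
```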

37 | An on-line universal lossy data compression algorithm via continuous codebook refinement - Part I - Zhang, Wei - 1996 |

29 |
The redundancy of source coding with a fidelity criterion I: Known statistics
- Zhang, Yang, et al.
- 1997
Citation Context ...e-alphabet memoryless sources with rate redundancy converging to zero as (k/2) log n/n, where k is the dimension of the simplex of probability distributions on the reproduction alphabet. Yang and Zhang [40] proved an analogous result for fixed-rate universal lossy codes and showed furthermore that the (k/2) log n/n convergence rate is optimal in a certain sense. (The redundancies in our scheme are theref... |

29 | Data-hiding codes
- Moulin, Koetter
- 2005
Citation Context ...ationary ergodic sources satisfying a certain mixing condition. Moreover, the theory presented here needs to be tested in practical settings, one promising area for applications being media forensics [45], where the parameter θ could represent traces or “evidence” of some prior processing performed, say, on an image or on a video sequence, and where the goal is to design an efficient system for compre... |

26 |
Fixed rate universal block source coding with a fidelity criterion
- Neuhoff, Gray, et al.
- 1975
Citation Context ...the mode of convergence with respect to θ, one gets different types of universal codes. Specifically, let {C^{n,m}}_{n=1}^∞ be a sequence of lossy codes satisfying R(C^{n,m}) → R as n → ∞. Then, following [9], we can distinguish between the following three types of universality: Definition 2.1 (weighted universal). {C^{n,m}}_{n=1}^∞ is weighted universal for {P_θ : θ ∈ Θ} with respect to a probability distrib... |

26 |
ɛ-entropy and ɛ-capacity of sets in functional spaces
- Kolmogorov, Tikhomirov
- 1961
Citation Context ...ts an estimator θ* = θ*(X^n), where X^n is an i.i.d. sample from one of the P_θ’s, such that E_θ[d_V(P_θ, P_{θ*(X^n)})] ≤ 3ε + √((32H_ε + 8)/n), where H_ε is the metric entropy, or Kolmogorov ε-entropy [27], of {P_θ : θ ∈ Θ}, i.e., the logarithm of the cardinality of the minimal ε-net for {P_θ} under d_V(·,·). Thus, if we choose ε = ε_n such that √(H_{ε_n}/n) → 0 as n → ∞, then θ*(X^n) is a consistent estim... |


22 |
A universally acceptable smoothing factor for kernel density estimates
- Devroye, Lugosi, et al.
- 1996
Citation Context ...rly in terms of the L1 distance between the true and the estimated probability densities [16]. Instead, we propose the use of the so-called minimum-distance estimate, introduced by Devroye and Lugosi [17], [18] in the context of kernel density estimation. The introduction of the minimum-distance estimate allows us to draw upon the powerful machinery of Vapnik–Chervonenkis theory (see, e.g., [19] and A... |

22 |
Rates of convergence of minimum distance estimators and Kolmogorov's entropy
- Yatracos
- 1985
Citation Context ...> 0 such that, for each θ ∈ Θ, d_V(P_θ, P_η) ≤ m‖θ − η‖ for all η ∈ B_r(θ), where ‖·‖ is the Euclidean norm on R^k and B_r(θ) is an open ball of radius r centered at θ. 3) The Yatracos class [17], [18], [24] associated with Θ, defined as A_Θ ≜ {A_{θ,η} = {x ∈ X : p_θ(x) > p_η(x)} : θ, η ∈ Θ; θ ≠ η}, is a Vapnik–Chervonenkis class, V(A_Θ) = V < ∞. Let ρ : X × X̂ → R_+ be a single-letter distortion function of t... |
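The minimum-distance estimate built on the Yatracos class can be sketched for a finite candidate family: form the sets A_{θ,η} = {x : p_θ(x) > p_η(x)} and pick the candidate minimizing sup_A |P_θ(A) − P_n(A)|. The toy discrete example below is ours, not from the paper.

```python
def yatracos_sets(densities, support):
    # A_{theta,eta} = {x : p_theta(x) > p_eta(x)} for every ordered pair of candidates.
    return [frozenset(x for x in support if p[x] > q[x])
            for i, p in enumerate(densities)
            for j, q in enumerate(densities) if i != j]

def minimum_distance_estimate(sample, densities, support):
    # Pick the candidate index minimizing sup_{A in Yatracos class} |P_theta(A) - P_n(A)|.
    n = len(sample)
    sets = yatracos_sets(densities, support)
    def score(p):
        return max(abs(sum(p[x] for x in A) - sum(1 for s in sample if s in A) / n)
                   for A in sets)
    return min(range(len(densities)), key=lambda i: score(densities[i]))

support = ("a", "b")
p0 = {"a": 0.9, "b": 0.1}
p1 = {"a": 0.2, "b": 0.8}
print(minimum_distance_estimate(["a"] * 85 + ["b"] * 15, [p0, p1], support))  # picks p0 -> 0
```

Here the Yatracos class is just {{"a"}, {"b"}}; the sample with 85% “a” is far closer to p0 on both sets, so index 0 is returned.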

21 | Gaussian Mixture Density Modeling, Decomposition, and Applications - Zhuang, Huang, et al. - 1996 |

19 |
On an extremum problem of information theory
- Csiszár
- 1974
Citation Context ...i.i.d. vector sources and characterizes the rate at which the nth-order operational DRF converges to the Shannon DRF (the proof, which uses Csiszár’s generalized parametric representation of the DRF [23], as well as a combination of standard random coding arguments and large-deviation estimates, is an almost verbatim adaptation of the proof of Linder et al. to vector sources, and is presented for com... |

18 |
Coding for sources with unknown statistics: Part II. Distortion relative to a fidelity criterion
- Ziv
- 1972
Citation Context ...coder. In this paper, we are concerned with modeling the actual source directly, and not through a codebook distribution in the reproduction space. The objective of universal lossy coding (see, e.g., [8]–[13]) is to construct lossy block source codes (vector quantizers) that perform well in incompletely or inaccurately specified statistical environments. Roughly speaking, a sequence of vector quantiz... |

14 |
Fisher information and stochastic complexity
- Rissanen
- 1996
Citation Context ...ed here. Keywords: Learning, minimum-distance density estimation, two-stage codes, universal vector quantization, Vapnik–Chervonenkis dimension. I. INTRODUCTION In a series of influential papers [1]–[3], Rissanen has elucidated and analyzed deep connections between universal lossless coding and statistical modeling. His approach hinges on the following two key insights: 1) A given parametric class o... |

13 |
A Generalization of Ornstein’s d Distance with Applications to Information Theory
- Gray, Neuhoff, et al.
- 1975
Citation Context ...ces {P_θ : θ ∈ Θ} be totally bounded with respect to the variational distance. (Totally bounded classes, with respect to either the variational distance or its generalizations, such as the ρ̄-distance [26], have, in fact, been extensively used in the theory of universal lossy codes [9].) This was precisely the assumption made in the paper of Yatracos [24] on density estimation, which in turn inspired t... |

13 | Arbitrary source models and Bayesian codebooks in rate-distortion theory
- Kontoyiannis, Zhang
- 2002
Citation Context ...cal problem the term “model” can refer either to a probabilistic description of the source or to a probabilistic description of a rate-distortion codebook. In fact, as shown by Kontoyiannis and Zhang [39], for variable-rate lossy codes operating under a fixed distortion constraint, there is a one-to-one correspondence between codes and discrete distributions over sequences in the reproduction space (s... |

10 |
Adaptive control design and analysis
- Tao
Citation Context ...ple. However, there are situations in which one would like to compress the source and identify its statistics at the same time. For instance, in indirect adaptive control (see, e.g., Chapter 7 of Tao [7]) the parameters of the plant (the controlled system) are estimated on the basis of observation, and the controller is modified accordingly. Consider the discrete-time stochastic setting, in which the ... |

9 |
Process definitions of distortion-rate functions and source coding theorems
- GRAY, NEUHOFF, et al.
- 1975
Citation Context ...both the reproduction process {X̂_i} and the pair process {(X_i, X̂_i)} are n-stationary, i.e., the vector processes {X^n(t)}_{t=−∞}^∞ and {(X^n(t), X̂^n(t))}_{t=−∞}^∞ are stationary [20]. This implies [21] that D_θ(C^{n,m}) = (1/n) E_θ[ρ(X^n, X̂^n)] = (1/n) Σ_{i=1}^n E_θ[ρ(X_i, X̂_i)], where X̂^n = C^{n,m}(X^n, X^n_m(0)). More specifically, we shall consider fixed-rate lossy block codes (also referred to as ve... |

8 |
The rate distortion function for a class of sources
- Sakrison
- 1969
Citation Context .... Since |Γ̄_es| = 2^{nR}, we can choose δ small enough so that |Γ_es| ≤ 2^{n(R+ε)}. (4.23) A sequence of lossy codes is (strongly) robust for a given class of information sources at rate R (see, e.g., [31]–[33]) if its asymptotic performance on each source in the class is no worse than the supremum of the distortion-rate functions of all the sources in the class at R. Neuhoff and García-Muñoz [33] have... |

6 |
Minimum description length vs. maximum likelihood in lossy data compression
- Madiman, Harrison, et al.
- 2004
Citation Context ...open problem to determine lower bounds on the redundancies in the setting of joint source coding and identification.) These papers, together with the work of Madiman, Harrison and Kontoyiannis [41], [42], can be thought of as generalizing Rissanen’s MDL principle to the lossy setting, provided that the term “model” is understood to refer to probability distributions over codebooks in the reproduction spa... |

4 |
Strong universal source coding subject to a rate-distortion constraint
- García-Muñoz, Neuhoff
- 1982
Citation Context ...n,n}}_{n=1}^∞ of two-stage (n,n)-block codes, such that R(C^{n,n}) ≤ R + ε + O(log n / n) (4.19) and δ_θ(C^{n,n}) ≤ ε + O(√(log n / n)) (4.20) for every θ ∈ Θ. Taking a cue from García-Muñoz and Neuhoff [29], we shall call a sequence of codes {C^{n,n}} satisfying lim_{n→∞} R(C^{n,n}) ≤ R + ε and lim_{n→∞} δ_θ(C^{n,n}) ≤ ε, ∀θ ∈ Θ, for a given ε > 0, ε-weakly minimax universal for {P_θ : θ ∈ Θ}. By continuity, the ex... |

4 |
Unified methods for optimal quantization of messages, in Problemy
- Dobrushin
- 1970
Citation Context ...stortion function, namely O(√(n^{-1} log n)); in particular, the constant implicit in the O(·) notation depends neither on ε nor on the behavior of ρ. The proof below draws upon some ideas of Dobrushin [30], the difference being that he considered robust, rather than universal, codes. Let M > 0 be a constant to be specified later, and define a single-letter distortion function ρ_M : X × X̂ → R_+ by ρ_M... |

3 | The Rice machine: television data compression, Jet Propulsion - Rice, Plaunt - 1970 |

3 | Worst sources and robust codes for difference distortion measures - 1975 |

2 |
Fixed-rate universal lossy source coding and rates of convergence for memoryless sources
- 1995
Citation Context ...mal performance, in the sense of minimizing the average distortion under the rate constraint, on any source in the class. Two-stage codes have also proved quite useful in universal lossy coding [10], [11], [13]. For instance, the two-stage universal quantizer introduced by Chou, Effros and Gray [13] is similar in spirit to the adaptive lossless coder of Rice and Plaunt [14], [15], known as the “Rice m... |

2 |
Robust source coding of weakly compact classes
- Neuhoff, García-Muñoz
- 1987
Citation Context ...ce |Γ̄_es| = 2^{nR}, we can choose δ small enough so that |Γ_es| ≤ 2^{n(R+ε)}. (4.23) A sequence of lossy codes is (strongly) robust for a given class of information sources at rate R (see, e.g., [31]–[33]) if its asymptotic performance on each source in the class is no worse than the supremum of the distortion-rate functions of all the sources in the class at R. Neuhoff and García-Muñoz [33] have show... |

2 | Second-order properties of lossy likelihoods and the MLE/MDL dichotomy in lossy compression
- Madiman, Kontoyiannis
Citation Context ...esting open problem to determine lower bounds on the redundancies in the setting of joint source coding and identification.) These papers, together with the work of Madiman, Harrison and Kontoyiannis [41], [42], can be thought of as generalizing Rissanen’s MDL principle to the lossy setting, provided that the term “model” is understood to refer to probability distributions over codebooks in the reproducti... |

1 | Joint universal lossy coding and identification of stationary mixing sources with general alphabets
- Raginsky
Citation Context ...c source models such as autoregressive or Markov sources, and to variable-rate codes, so that unbounded parameter spaces could be accommodated. We have made some initial progress in this direction in [43], [44], where we constructed joint schemes for variable-rate universal lossy coding and identification of stationary ergodic sources satisfying a certain mixing condition. Moreover, the theory present... |

1 |
Joint universal lossy coding and identification of stationary mixing sources
- Raginsky
- 2007
Citation Context ...ce models such as autoregressive or Markov sources, and to variable-rate codes, so that unbounded parameter spaces could be accommodated. We have made some initial progress in this direction in [43], [44], where we constructed joint schemes for variable-rate universal lossy coding and identification of stationary ergodic sources satisfying a certain mixing condition. Moreover, the theory presented her... |