## Universal Discrete Denoising: Known Channel (2003)

### Download Links

- www.hpl.hp.com
- www.princeton.edu
- www.ee.princeton.edu
- DBLP

### Other Repositories/Bibliography

Venue: IEEE Trans. Inform. Theory

Citations: 79 (32 self)

### BibTeX

@ARTICLE{Weissman03universaldiscrete,
  author = {Tsachy Weissman and Erik Ordentlich and Gadiel Seroussi and Sergio Verdú and Marcelo Weinberger},
  title = {Universal Discrete Denoising: Known Channel},
  journal = {IEEE Trans. Inform. Theory},
  year = {2005},
  volume = {51},
  pages = {5--28}
}

### Abstract

A discrete denoising algorithm estimates the input sequence to a discrete memoryless channel (DMC) based on the observation of the entire output sequence. For the case in which the DMC is known and the quality of the reconstruction is evaluated with a given single-letter fidelity criterion, we propose a discrete denoising algorithm that does not assume knowledge of statistical properties of the input sequence. Yet, the algorithm is universal in the sense of asymptotically performing as well as the optimum denoiser that knows the input sequence distribution, which is only assumed to be stationary and ergodic. Moreover, the algorithm is universal also in a semi-stochastic setting, in which the input is an individual sequence, and the randomness is due solely to the channel noise.
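The two-pass algorithm the abstract describes (the DUDE) can be sketched for the special case of a binary symmetric channel under Hamming loss. The function and parameter names below are illustrative, not from the paper; the decision rule implemented is the paper's general form, argmin over x̂ of mᵀ Π⁻¹ [λ_x̂ ⊙ π_{z_t}], with m the count vector for the two-sided context of length k:

```python
import numpy as np

def dude_bsc(z, delta, k):
    """Sketch of the two-pass discrete universal denoiser for a binary
    symmetric channel (crossover probability delta) under Hamming loss,
    using two-sided contexts of length k on each side.  The first and
    last k symbols are passed through unchanged."""
    n = len(z)
    Pi = np.array([[1 - delta, delta],
                   [delta, 1 - delta]])      # channel matrix
    Pi_inv = np.linalg.inv(Pi)
    Lam = 1.0 - np.eye(2)                    # Hamming loss matrix
    # Pass 1: count symbol occurrences within each two-sided context.
    counts = {}
    for t in range(k, n - k):
        ctx = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        counts.setdefault(ctx, np.zeros(2))[z[t]] += 1
    # Pass 2: per position, apply argmin_x  m^T Pi^{-1} [lambda_x * pi_{z_t}].
    zhat = list(z)
    for t in range(k, n - k):
        ctx = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        m = counts[ctx]
        pi_zt = Pi[:, z[t]]                  # channel column for output z_t
        scores = [m @ Pi_inv @ (Lam[:, x] * pi_zt) for x in (0, 1)]
        zhat[t] = int(np.argmin(scores))
    return zhat

# An isolated 1 in a long run of 0s is flipped back at moderate delta:
print(dude_bsc([0] * 10 + [1] + [0] * 10, 0.1, 1))
```

Note that nothing here assumes anything about the input distribution: the counts gathered in the first pass play the role of the (unknown) posterior, which is why the scheme can be universal over stationary ergodic sources.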

### Citations

8567 | Elements of Information Theory - Cover, Thomas - 1991 |

6050 |
A mathematical theory of communication
- Shannon
- 1948
Citation Context ...of δ is obtained. In the majority of cases, no further iterations were needed. We now present denoising results for two images. The first image is the first page from a scanned copy of a famous paper [44], available in the publications data base of the IEEE Information Theory Society. The results are shown in the upper portion of Table 2, which lists the normalized bit-error rate of the denoised image... |

2112 |
A New Approach to Linear Filtering and Prediction Problems
- Kalman
- 1960
Citation Context ...e, where the input and output alphabets are the real line (or other Euclidean spaces), has received significant attention for over half a century. From the linear filters of Wiener [57, 3] and Kalman [27], to Donoho and Johnstone’s nonlinear denoisers [14, 15], the amount of work and literature in between is far too extensive even to be given a representative sample of references. In fact, the practic... |

1878 | An Introduction to Probability Theory and Its Applications - Feller - 1971 |

1493 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

804 | De-noising by soft-thresholding
- Donoho
- 1995
Citation Context ... line (or other Euclidean spaces), has received significant attention for over half a century. From the linear filters of Wiener [57, 3] and Kalman [27], to Donoho and Johnstone’s nonlinear denoisers [14, 15], the amount of work and literature in between is far too extensive even to be given a representative sample of references. In fact, the practice of denoising, as influenced by the theory, at least fo... |

772 |
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains
- Baum, Petrie, et al.
- 1970
Citation Context ...o a known Markov process and the channel (from state to observation) is known, the above optimum Bayesian scheme can be implemented with reasonable complexity via forward-backward dynamic programming [8, 1]. It should be mentioned, however, that even for the simplest among cases where the underlying signal has memory, namely the case of a binary Markov chain observed through a Binary Symmetric Channel (... |

730 | Compression of individual sequences via variable-rate coding
- Ziv, Lempel
- 1978
Citation Context ...se, for every underlying individual sequence. Here, competing with finite-order sliding-window denoisers is akin to the setting introduced in the universal lossless coding literature by Ziv and Lempel [60]. (b) The stochastic setting. We show that our denoising algorithm asymptotically attains the performance of the optimal distribution-dependent scheme, for any stationary ergodic source that may be ge... |

658 | Large Deviations Techniques and Applications - Dembo, Zeitouni - 1998 |

498 |
Stochastic Complexity
- Rissanen
- 1989
Citation Context ...tant in all applications, but essential in applications with large alphabets (e.g., continuous tone images), as is evident from the error terms in Theorem 2 in Section 4. Similar issues of model cost [37] have been addressed in related areas of lossless image compression (see, for instance, [6]), and significant knowledge and experience have been generated, which can be brought to bear on the discrete... |

377 |
Extrapolation, Interpolation, and Smoothing of Stationary Time Series
- Wiener
- 1949
Citation Context ...The continuous case, where the input and output alphabets are the real line (or other Euclidean spaces), has received significant attention for over half a century. From the linear filters of Wiener [57, 3] and Kalman [27], to Donoho and Johnstone’s nonlinear denoisers [14, 15], the amount of work and literature in between is far too extensive even to be given a representative sample of references. In f... |

364 |
The Theory of Matrices
- Lancaster, Tismenetsky
- 1985
Citation Context ...we obtain

$$\hat{X}^{\mathrm{opt}}(z^{(T)})[t] = \arg\min_{\hat{x} \in \mathcal{A}} P_{Z_t \mid z^{(T \setminus t)}}^{T} \, \Pi^{T} (\Pi \Pi^{T})^{-1} \left[ \lambda_{\hat{x}} \odot \pi_{z_t} \right]. \quad (32)$$

The above derivation can be readily extended by replacing the Moore-Penrose generalized inverse (cf., e.g., [29]) $\Pi^{T}(\Pi\Pi^{T})^{-1}$ appearing in (30) and (32) with any other generalized inverse of the form $\Gamma^{T}(\Pi\Gamma^{T})^{-1}$, where $\Gamma$ is any $M \times M'$ matrix for which the generalized inverse exists. While any generalize... |
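The generalized inverse mentioned in this excerpt is easy to check numerically. A minimal sketch (the channel matrix values are illustrative, not from the paper) verifying that for a full-row-rank Π the Moore-Penrose form Πᵀ(ΠΠᵀ)⁻¹ is a right inverse of Π:

```python
import numpy as np

# Illustrative 2 x 3 channel matrix with full row rank
# (rows: input symbols, columns: output symbols; each row sums to 1).
Pi = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]])

# Moore-Penrose right inverse Pi^T (Pi Pi^T)^{-1}.
Pi_right_inv = Pi.T @ np.linalg.inv(Pi @ Pi.T)

# Pi @ Pi_right_inv reduces to (Pi Pi^T)(Pi Pi^T)^{-1} = I.
print(np.allclose(Pi @ Pi_right_inv, np.eye(2)))  # True
```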

239 | Wavelet shrinkage: asymptopia
- Donoho, Johnstone, et al.
- 1995
Citation Context ... line (or other Euclidean spaces), has received significant attention for over half a century. From the linear filters of Wiener [57, 3] and Kalman [27], to Donoho and Johnstone’s nonlinear denoisers [14, 15], the amount of work and literature in between is far too extensive even to be given a representative sample of references. In fact, the practice of denoising, as influenced by the theory, at least fo... |

222 |
Morphological Image Analysis: Principles and Applications
- Soille
- 1999
Citation Context ...the normalized bit-error rate of the denoised image, relative to the original one. The table also shows results of denoising the same image with a 3 × 3 median filter [23], and a morphological filter [47] available under MATLAB. The results for the morphological filter are for the best ordering of the morphological open and close operations based on a 2 × 2 structural element, which was found to give ... |

172 | Hidden Markov processes
- Ephraim, Merhav
Citation Context ...hat of the noise-corrupted signal, are finite. The problem arises in a variety of situations ranging from typing and/or spelling correction [30, 10] to Hidden Markov Model (HMM) state estimation (cf. [18] for the many applications); from DNA sequence analysis and processing [45, 49, 48] to enhancement of facsimile and other binary images; from blind equalization problems to joint source-channel decodi... |

158 | The context-tree weighting method: basic properties
- Willems, Shtarkov, et al.
- 1995
Citation Context ...lies to more general context models, in which the context length depends not only on z^n, but may vary from location to location, similar to the tree models customary in data compression (see, e.g., [51, 58]). Moreover, the context length need not be equal on the left and on the right. As mentioned in Section 3, the internal data structure of the DUDE can be readily designed to support these models. Choo... |

155 | Universal prediction of individual sequences
- Feder, Merhav, et al.
- 1992
Citation Context ...ce of k_n is akin to the situation arising in universal prediction of individual sequences, where any growth rate for the order of a Markov predictor slower than some threshold guarantees universality [19]. The choice of a logarithmic growth rate (the fastest in the allowable range) would be similar to the choice implicit in the LZ predictor. The trade-offs involved in this choice will become clearer i... |

136 | Universal prediction
- Merhav, Feder
- 1998
Citation Context ...e center of c is α. Note that, for every d ∈ A^n, $\sum_{\alpha \in \mathcal{A}} q(a, d, b\beta c)[\alpha] = m(a, b, c)[\beta]$. For P ∈ M, let

$$U(P) = \min_{\hat{x} \in \mathcal{A}} \sum_{a \in \mathcal{A}} \Lambda(a, \hat{x}) P(a) = \min_{\hat{x} \in \mathcal{A}} \lambda_{\hat{x}}^{T} P \quad (3)$$

denote the Bayes envelope (cf., e.g., [24, 41, 31]) associated with the distribution P and the loss measure Λ. Following [24], it will be convenient to extend the definition of U(·) to cases in which the argument is any M-vector v, not necessarily in... |
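The Bayes envelope appearing in this excerpt is a one-liner to compute. A minimal sketch, with an illustrative loss matrix and distribution (the numeric values are examples, not from the paper):

```python
import numpy as np

def bayes_envelope(P, Lam):
    """U(P) = min over xhat of sum_a Lam[a, xhat] * P[a]
            = min over xhat of lambda_xhat^T P,
    where column xhat of the loss matrix Lam is the vector lambda_xhat."""
    return float(np.min(Lam.T @ P))

# Hamming loss over a binary alphabet: guessing the more likely symbol
# incurs expected loss equal to the smaller probability.
Lam = 1.0 - np.eye(2)
print(bayes_envelope(np.array([0.8, 0.2]), Lam))  # 0.2
```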

95 |
Approximation to Bayes risk in repeated play, in Contributions to the Theory of Games, Vol. III, Annals of Mathematics Studies
- Hannan
Citation Context ...e center of c is α. Note that, for every d ∈ A^n, $\sum_{\alpha \in \mathcal{A}} q(a, d, b\beta c)[\alpha] = m(a, b, c)[\beta]$. For P ∈ M, let

$$U(P) = \min_{\hat{x} \in \mathcal{A}} \sum_{a \in \mathcal{A}} \Lambda(a, \hat{x}) P(a) = \min_{\hat{x} \in \mathcal{A}} \lambda_{\hat{x}}^{T} P \quad (3)$$

denote the Bayes envelope (cf., e.g., [24, 41, 31]) associated with the distribution P and the loss measure Λ. Following [24], it will be convenient to extend the definition of U(·) to cases in which the argument is any M-vector v, not necessarily in... |

71 | Lossy source coding - Berger, Gibson - 1998 |

51 |
Wavelet Shrinkage
- Donoho, Johnstone, et al.
- 1995
Citation Context ...r other Euclidean spaces), has received significant attention for over half a century. From the linear filters of Wiener [75], [6] and Kalman [34], to Donoho and Johnstone’s nonlinear denoisers [20], [22], the amount of work and literature in between is far too extensive even to be given a representative sample of references. In fact, the practice of denoising, as influenced by the theory, at least fo... |

50 | denoising
- Rissanen
- 2000
Citation Context ... encompassing Natarajan’s “Occam filters” [32, 33, 34], Yu et al.’s “compresstimation” [7, 26], Donoho’s “Kolmogorov sampler” [16], and Tabus-Rissanen-Astola’s “normalized maximum likelihood” models [48, 49, 38]. The intuition motivating the compression-based approach is that the noise constitutes that part of the noisy signal which is hardest to compress. Thus, by lossily compressing the noisy signal and ap... |

45 |
Context based spelling correction
- Mays, Damerau, et al.
- 1991
Citation Context ...r the case where the alphabet of the noiseless, as well as that of the noise-corrupted signal, are finite. The problem arises in a variety of situations ranging from typing and/or spelling correction [30, 10] to Hidden Markov Model (HMM) state estimation (cf. [18] for the many applications); from DNA sequence analysis and processing [45, 49, 48] to enhancement of facsimile and other binary images; from bl... |

42 |
Filtering random noise from deterministic signals via data compression
- Natarajan
Citation Context ...posterior distribution on which the optimal denoiser is based is not available. One recent line of attack to this problem is the compression-based approach, encompassing Natarajan’s “Occam filters” [32, 33, 34], Yu et al.’s “compresstimation” [7, 26], Donoho’s “Kolmogorov sampler” [16], and Tabus-Rissanen-Astola’s “normalized maximum likelihood” models [48, 49, 38]. The intuition motivating the compression-b... |

40 |
On receiver structures for channels having memory
- Chang, Hancock
- 1966
Citation Context ...ng [45, 49, 48] to enhancement of facsimile and other binary images; from blind equalization problems to joint source-channel decoding when a discrete source is sent unencoded through a noisy channel [8, 21]. Here, it is assumed that the goal of a denoising algorithm is to minimize the expected distortion of its output with respect to the unobserved noiseless signal (measured by a single-letter loss func... |

40 |
Asymptotically subminimax solutions of compound statistical decision problems
- Robbins
- 1951
Citation Context ...noising and related problems. Most closely connected to our stochastic and semi-stochastic settings are the empirical Bayes and compound decision methods, respectively, from the statistics literature [51], [31], [53]–[55], [57], [58] (cf. [77] for a more comprehensive list of references). Most of the work on the compound decision problem has focused on competing with a “symbol-by-symbol” denoiser, and... |

27 |
On limited-delay lossy coding and filtering of individual sequences
- Weissman, Merhav
- 2002
Citation Context ... first introduced into information theory by Ziv in his work [59] on rate distortion coding of individual sequences. More recently, problems of prediction [53, 56], as well as of limited-delay coding [54] of noise-corrupted individual sequences were also considered. As mentioned in Section 1, the semi-stochastic setting is also related to the classical compound decision problem [25, 39, 40, 42, 43, 50... |

24 |
To Code or Not To Code
- Gastpar
- 2002
Citation Context ...ng [45, 49, 48] to enhancement of facsimile and other binary images; from blind equalization problems to joint source-channel decoding when a discrete source is sent unencoded through a noisy channel [8, 21]. Here, it is assumed that the goal of a denoising algorithm is to minimize the expected distortion of its output with respect to the unobserved noiseless signal (measured by a single-letter loss func... |

24 |
Distortion-rate theory for individual sequences
- Ziv
Citation Context ...in this case we have the explicit relation $\Pr(Z^n = z^n) = \prod_{i=1}^{n} \Pi(x_i, z_i)$. A setting involving a noise-corrupted individual sequence was first introduced into information theory by Ziv in his work [59] on rate distortion coding of individual sequences. More recently, problems of prediction [53, 56], as well as of limited-delay coding [54] of noise-corrupted individual sequences were also considered... |

20 |
Probability: Theory and Examples
- Durrett
- 2005
Citation Context ... the two sequences, which for each given input k-tuple will converge to deterministic (channel dependent) values. The technical proof is best handled by direct use of Kolmogorov’s 0-1 law (cf., e.g., [17]). Proof of Claim 1: For fixed x ∈ A^∞ and k, D_k(x, z) is, by definition, invariant to changes in a finite number of coordinates of z. Thus, by Kolmogorov’s 0-1 law, there exists a deterministic const... |

18 | Asymptotic filtering for finite state Markov chains, Stochastic Process
- Khasminskii, Zeitouni
- 1996
Citation Context ...iser is not explicitly known for all values of the transition probability and the channel error rate; only the asymptotic behavior of the bit error rate, as the transition probabilities become small [28, 46], and conditions for the optimality of “singlet decoding” (cf. [13]), are known. The literature on the universal discrete denoising setting is even sparser. In this setting, there is uncertainty regard... |

17 | Lossless compression of continuous-tone images
- Carpentieri, Weinberger, et al.
Citation Context ...us tone images), as is evident from the error terms in Theorem 2 in Section 4. Similar issues of model cost [37] have been addressed in related areas of lossless image compression (see, for instance, [6]), and significant knowledge and experience have been generated, which can be brought to bear on the discrete denoising problem. Finally, we mention that if a general tree model is used for the count ... |

16 | Universal prediction of individual binary sequences in the presence of noise
- Weissman, Merhav
- 2001
Citation Context ...ing a noise-corrupted individual sequence was first introduced into information theory by Ziv in his work [59] on rate distortion coding of individual sequences. More recently, problems of prediction [53, 56], as well as of limited-delay coding [54] of noise-corrupted individual sequences were also considered. As mentioned in Section 1, the semi-stochastic setting is also related to the classical compound... |

15 |
Wavelet analysis
- Bruce, Donoho, et al.
- 1996
Citation Context ...ionally indexed data corrupted by additive Gaussian white noise, is believed by some to have reached a point where substantial improvement in performance is unlikely for most applications of interest [5]. Considerably less developed are the theory and practice of denoising for the case where the alphabet of the noiseless, as well as that of the noise-corrupted signal, are finite. The problem arises i... |

15 |
Twofold universal prediction schemes for achieving the finite state predictability of a noisy individual binary sequence
- Weissman, Merhav, et al.
Citation Context ...ing a noise-corrupted individual sequence was first introduced into information theory by Ziv in his work [59] on rate distortion coding of individual sequences. More recently, problems of prediction [53, 56], as well as of limited-delay coding [54] of noise-corrupted individual sequences were also considered. As mentioned in Section 1, the semi-stochastic setting is also related to the classical compound... |

14 | On the optimality of symbol by symbol filtering and denoising
- Ordentlich, Weissman
- 2006
Citation Context ...the channel error rate; only the asymptotic behavior of the bit-error rate, as the transition probabilities become small [35], [61], and conditions for the optimality of “singlet decoding” (cf. [19], [47]), are known. In this work, we address a universal version of the discrete denoising problem in which there is uncertainty about the distribution of the underlying noiseless signal, so that the poster... |

13 | Inequalities for the l1 deviation of the empirical distribution
- Weissman, Ordentlich, et al.
- 2003
Citation Context ... Notice that $Z_{\ell+k+1}, Z_{\ell+2(k+1)}, \ldots, Z_{\ell+n_\ell(k+1)}$ are the only random variables in the lemma that have not been fixed. We will obtain the bound (75) of Lemma 3 by applying the following result of [36], where $D_B(p_1 \| p_2) = p_1 \log(p_1/p_2) + (1 - p_1)\log((1 - p_1)/(1 - p_2))$ will denote the binary divergence, which we take to be ∞ if p1 > 1. Proposition 1 Let P be a probability distribution on the set {1,... |
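The binary divergence defined in this excerpt translates directly into code. A small sketch following the formula; the edge-case conventions (0·log 0 = 0, and +∞ when the second argument puts zero mass where the first does not) are the usual ones, assumed here rather than taken from the paper:

```python
import math

def binary_divergence(p1, p2):
    """D_B(p1 || p2) = p1*log(p1/p2) + (1-p1)*log((1-p1)/(1-p2)),
    with the conventions 0*log(0/q) = 0 and value +inf when p2 puts
    zero mass on an outcome to which p1 assigns positive mass."""
    def term(a, b):
        if a == 0.0:
            return 0.0
        if b == 0.0:
            return math.inf
        return a * math.log(a / b)
    return term(p1, p2) + term(1.0 - p1, 1.0 - p2)

print(binary_divergence(0.5, 0.5))  # 0.0
```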

12 |
A Simplified Derivation of Linear Least Square Smoothing and Prediction Theory, Proc. IRE, vol. 38, no. 4, pp. 417–425
- Bode, Shannon
- 1949
Citation Context ...The continuous case, where the input and output alphabets are the real line (or other Euclidean spaces), has received significant attention for over half a century. From the linear filters of Wiener [57, 3] and Kalman [27], to Donoho and Johnstone’s nonlinear denoisers [14, 15], the amount of work and literature in between is far too extensive even to be given a representative sample of references. In f... |

12 | The minimax distortion redundancy in noisy source coding
- Dembo, Weissman
- 2003
Citation Context ...he fact that compression-based schemes for universal denoising fall short of the optimal distribution-dependent performance was consolidated from a somewhat different perspective by Dembo and Weissman [11, 52], who consider universal rate distortion coding of noisy sources and characterize tradeoffs between the attainable denoising performance and the rate constraint. In principle, a denoising scheme that ... |

12 |
Asymptotic solutions of the compound decision problem for two completely specified distributions,” Ann
- Hannan, Robbins
- 1955
Citation Context ...th) yields a denoiser with the claimed properties. We remark that in the statistics literature, the semi-stochastic setting dates nearly half a century back to the so-called compound decision problem [25, 39, 40, 42, 43, 50], which can be viewed as the particular case of our denoising setting in which the denoiser is constrained to be context-independent, corresponding to k = 0 in the above discussion. The remainder of t... |

12 |
Almost-noiseless joint source-channel coding-decoding of sources with memory
- Caire, Shamai, et al.
- 2004
Citation Context ...f facsimile and other binary images; from blind equalization problems to joint source–channel decoding when a discrete source is sent uncompressed (or suboptimally compressed) through a noisy channel [9], [45] (and references therein). A commonly analyzed denoising setting is one in which the underlying noiseless signal and noisy channel are assumed to be stochastic with known distributions. It is as... |

11 |
An examination of undetected typing errors
- Damerau, Mays
- 1989
Citation Context ...r the case where the alphabet of the noiseless, as well as that of the noise-corrupted signal, are finite. The problem arises in a variety of situations ranging from typing and/or spelling correction [30, 10] to Hidden Markov Model (HMM) state estimation (cf. [18] for the many applications); from DNA sequence analysis and processing [45, 49, 48] to enhancement of facsimile and other binary images; from bl... |

10 | Discrete universal filtering through incremental parsing
- Ordentlich, Weissman, et al.
- 2004
Citation Context ...the expected dynamics of the data statistics can be helpful. Related theoretical and practical directions that have been pursued since the submission of this work include causal denoising (filtering) [48], the case of channel uncertainty [27], [81], the case of a general (not necessarily discrete) channel output alphabet [17], the case of channel memory [82], loss estimation for efficient pruning of b... |

10 | Compound decision theory and empirical Bayes methods. Ann. Statist. 31 379–390. Dedicated to the memory of Herbert
- ZHANG
- 2003
Citation Context ...ely connected to our stochastic and semi-stochastic settings are the empirical Bayes and compound decision methods, respectively, from the statistics literature [51], [31], [53]–[55], [57], [58] (cf. [77] for a more comprehensive list of references). Most of the work on the compound decision problem has focused on competing with a “symbol-by-symbol” denoiser, and can be viewed as a particularization o... |

9 |
A note on the observation of a Markov source through a noisy channel
- Devore
- 1974
Citation Context ...ity and the channel error rate; only the asymptotic behavior of the bit error rate, as the transition probabilities become small, [28, 46] and conditions for the optimality of “singlet decoding” (cf. [13]), are known. The literature on the universal discrete denoising setting is even sparser. In this setting, there is uncertainty regarding the distribution of the underlying noiseless signal and/or reg... |

9 |
Filtering random noise via data compression
- Natarajan
- 1993
Citation Context ...posterior distribution on which the optimal denoiser is based is not available. One recent line of attack to this problem is the compression-based approach, encompassing Natarajan’s “Occam filters” [32, 33, 34], Yu et al.’s “compresstimation” [7, 26], Donoho’s “Kolmogorov sampler” [16], and Tabus-Rissanen-Astola’s “normalized maximum likelihood” models [48, 49, 38]. The intuition motivating the compression-b... |

9 |
Classification and feature gene selection using the normalized maximum likelihood model for discrete regression
- Tabus, Rissanen, et al.
Citation Context ...ety of situations ranging from typing and/or spelling correction [30, 10] to Hidden Markov Model (HMM) state estimation (cf. [18] for the many applications); from DNA sequence analysis and processing [45, 49, 48] to enhancement of facsimile and other binary images; from blind equalization problems to joint source-channel decoding when a discrete source is sent unencoded through a noisy channel [8, 21]. Here, ... |

9 |
Normalized maximum likelihood models for Boolean regression with application to prediction and classification
- Tabus, Rissanen, et al.
- 2002
Citation Context ...ety of situations ranging from typing and/or spelling correction [30, 10] to Hidden Markov Model (HMM) state estimation (cf. [18] for the many applications); from DNA sequence analysis and processing [45, 49, 48] to enhancement of facsimile and other binary images; from blind equalization problems to joint source-channel decoding when a discrete source is sent unencoded through a noisy channel [8, 21]. Here, ... |

8 | Universal prediction of random binary sequences in a noisy environment
- Weissman, Merhav
- 2004
Citation Context ...ng schemes, not necessarily sliding block schemes of the type considered in Section 4. This is in accord with analogous situations in universal compression [60], prediction [31], and noisy prediction [55], where in the individual-sequence setting the class of schemes in the comparison class is limited in some computational sense. In the fully stochastic setting, on the other hand, such a limitation ta... |

8 |
Efficient pruning of bidirectional context trees with applications to universal denoising and compression
- Ordentlich, Weinberger, et al.
- 2004
Citation Context ... may vary from location to location, similar to the tree models customary in data compression (see, e.g., [68], [76]). Moreover, the context length need not be equal on the left and on the right (see [80] for a formal definition). As mentioned in Section IV, the internal data structure of the DUDE can be readily designed to support these models. Choosing an appropriately sized context model is importa... |