## Wavelet shrinkage: asymptopia (1995)

Venue: Journal of the Royal Statistical Society, Ser. B

Citations: 256 (35 self)

### BibTeX

@ARTICLE{Donoho95waveletshrinkage:,

author = {David L. Donoho and Iain M. Johnstone and Gerard Kerkyacharian and Dominique Picard},

title = {Wavelet shrinkage: asymptopia},

journal = {Journal of the Royal Statistical Society, Ser. B},

year = {1995},

pages = {371--394}

}

### Abstract

Considerable effort has been directed recently to develop asymptotically minimax methods in problems of recovering infinite-dimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly- or exactly-minimax estimators being obtained for a variety of interesting problems. Unfortunately, the results have often not been translated into practice, for a variety of reasons: sometimes similarity to known methods, sometimes computational intractability, and sometimes lack of spatial adaptivity. We discuss a method for curve estimation based on $n$ noisy data; one translates the empirical wavelet coefficients towards the origin by an amount $\sqrt{2\log(n)}\,\sigma/\sqrt{n}$. The method is different from methods in common use today, is computationally practical, and is spatially adaptive; thus it avoids a number of previous objections to minimax estimators. At the same time, the method is nearly minimax for a wide variety of loss functions (e.g. pointwise error, global error measured in $L^p$ norms, pointwise and global error in estimation of derivatives) and for a wide range of smoothness classes, including standard Hölder classes, Sobolev classes, and Bounded Variation. This is a much broader near-optimality than anything previously proposed in the minimax literature. Finally, the theory underlying the method is interesting, as it exploits a correspondence between statistical questions and questions of optimal recovery and information-based complexity.
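The shrinkage rule in the abstract moves each empirical wavelet coefficient towards zero by a fixed amount, $\sqrt{2\log(n)}$ times the per-coefficient noise level $\sigma/\sqrt{n}$, where $\sigma$ is the noise standard deviation of the data. It can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses a plain orthonormal Haar transform as a stand-in for the paper's empirical wavelet transform, assumes $n$ is a power of two and $\sigma$ is known, and thresholds every detail level while leaving the single coarsest coefficient untouched. The function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `wavelet_shrink`) are hypothetical and do not come from the paper.

```python
import numpy as np


def haar_forward(x):
    """Orthonormal Haar transform of a length-2^J signal (illustrative stand-in).

    Returns the coarsest scaling coefficient (array of size 1) and a list of
    detail-coefficient arrays ordered from finest to coarsest.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward: rebuild the signal from coarsest to finest level."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx


def soft_threshold(w, t):
    """Translate each coefficient towards the origin by t, clipping at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def wavelet_shrink(y, sigma):
    """Soft-threshold Haar detail coefficients at sqrt(2 log n) * sigma / sqrt(n).

    Assumption: coefficients are rescaled by 1/sqrt(n) so that their noise
    level is sigma / sqrt(n), matching the normalisation in the abstract.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two, n >= 2")
    scale = np.sqrt(n)
    threshold = np.sqrt(2.0 * np.log(n)) * sigma / scale
    approx, details = haar_forward(y)
    shrunk = [scale * soft_threshold(d / scale, threshold) for d in details]
    return haar_inverse(approx, shrunk)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, sigma = 1024, 0.5
    t = np.arange(n) / n
    f = np.where(t < 0.3, 0.0, 2.0)          # piecewise-constant test curve
    y = f + sigma * rng.standard_normal(n)   # noisy observations
    f_hat = wavelet_shrink(y, sigma)
    print("MSE of raw data:     ", np.mean((y - f) ** 2))
    print("MSE after shrinkage: ", np.mean((f_hat - f) ** 2))
```

Because the Haar transform used here is orthonormal, dividing by $\sqrt{n}$, thresholding at $\sqrt{2\log(n)}\,\sigma/\sqrt{n}$, and multiplying back is equivalent to thresholding the unscaled coefficients at $\sigma\sqrt{2\log(n)}$; the rescaling is kept only to mirror the abstract's normalisation.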

### Citations

4457 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |

2671 | A theory for multiresolution signal decomposition: The wavelet representation - Mallat - 1989 |

763 | Adapting to unknown smoothness via wavelet shrinkage - Donoho, Johnstone - 1995 |

608 | Ideal spatial adaptation via wavelet shrinkage. Biometrika 81 - Donoho, Johnstone - 1994 |

Citation context: ...the difficulties, hesitations, qualifications, and limitations in the existing statistical literature. 3.1 The Method. For simplicity, we focus on the nonparametric regression model (3) and a proposal of [22]; similar results are possible in the density estimation model [39]. We suppose that we have $n = 2^{J+1}$ data of the form (3) and that $\sigma$ is known. 1. Take the $n$ given numbers and apply an empirical wavelet ...

498 | Multivariate adaptive regression splines - Friedman - 1991 |

272 | Minimax estimation via wavelet shrinkage - Donoho, Johnstone - 1998 |

Citation context: ...sholding work. 6.2.1 Minimaxity and Spatial Adaptivity. The implicit position of the "Spatial Adaptivity" community that minimax theory leads to spatially non-adaptive methods is no longer tenable. [20] shows that minimax estimators can generally be expected to have a spatially adaptive structure, and we see in this paper that a specific nearly-minimax estimator exhibits spatially adaptive behavior ...

269 | Optimal global rates of convergence for nonparametric regression - Stone - 1982 |

Citation context: ...ed by decision theorists systematically over the last 15 years, namely the embedding of an appropriate hypercube in the class and using elementary decision-theoretic arguments on hypercubes. Compare [56, 6, 35, 59]. Theorem 9. Let $\|\cdot\|$ come from the Besov scale, with parameter $(\sigma', p', q')$. Let $\Theta$ be a Besov body $\Theta^{\sigma}_{p,q}(C)$. Then with a $c = c(\sigma, p, q, \sigma', p', q')$, $\inf_{\hat\theta} \sup_{\Theta} P\{\|\hat\theta - \theta\| \ge c\,\Omega(\epsilon)\} \to 1$. (48) Moreover, when p...

244 | Information-Based Complexity - Traub, Wasilkowski, et al. - 1988 |

Citation context: ...h identifications further in section 5.4 below. 5.2 Solution of an Optimal Recovery Model. Before tackling data from (22), we consider a simpler abstract model, in which noise is deterministic (compare [47, 48, 61]). The approach of analyzing statistical problems by deterministic noise has been applied previously in [14, 15]. Suppose we have an index set $I$ (not necessarily finite), an object $(\theta_I)$ of interest, and o...

209 | Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition - Donoho - 1995 |

194 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |

165 | Littlewood-Paley Theory and the Study of Function Spaces - Frazier, Jawerth, et al. - 1991 |

Citation context: ...orthogonal systems they do not serve as eigenfunctions of a classically important operator, such as differentiation or convolution. Nevertheless, wavelets are "almost-eigenfunctions" of many operators [30, 46]; while if they were the exact eigenfunctions of some specific operator (e.g. a convolution operator) they could not continue to be "almost-eigenfunctions" of many other operators. Here, precise optima...

164 | Asymptotic equivalence of nonparametric regression and white - Brown, Low - 1996 |

131 | Interpolating wavelet transforms - Donoho - 1992 |

Citation context: ...lming probability, $\|\hat\theta - \theta\| \ge c'\,\Omega\big(\epsilon\sqrt{\log(C/\epsilon)}\big)$. This completes the proof in the transitional case; the proof of Theorem 9 is complete. 7.4 Proof of Theorem 10. 7.4.1 Empirical Wavelet Transform. Point 1. In [17, 18] it is shown how one may define a theoretical wavelet-like transform $\theta^{[n]} = W_n f$ taking a continuous function $f$ on $[0,1]$ into a countable sequence $\theta^{[n]}$, with two properties: (a) Matching. The theoretical...

126 | New Thoughts on Besov Spaces - Peetre - 1976 |

108 | Multiresolution analysis, wavelets and fast algorithms on an interval. Comptes Rendus des Séances de l’Académie des - Cohen, Daubechies, et al. - 1993 |

Citation context: ...ing (3) yields data $\tilde y_I = \theta_I + \epsilon\,\tilde z_I$, $I \in \mathcal{I}_n$, (75) with $\epsilon = \sigma/\sqrt{n}$. This form of data is of the same general form as supposed in the sequence model (22). Detailed study of the Pyramid Filtering Algorithm of [10] reveals that all but $O(\log(n))$ of these coefficients are a standard Gaussian white noise with variance $\sigma^2/n$; the other coefficients "feel the boundaries", and have a slight covariance among themselves ...

93 | Flexible parsimonious smoothing and additive modeling (with discussion) - Friedman, Silverman - 1989 |

70 | Approximation dans les espaces métriques et théorie de l’estimation - Birgé - 1983 |

Citation context: ...he sense that k + k k k +kk ; 1-convex is just convex. Results for this model will imply Theorem 4 by suitable identifications. Thus we will ultimately interpret [1] $(\theta_I)$ as wavelet coefficients of $f$; [2] $(\hat\theta_I)$ as empirical wavelet coefficients of an estimate $\hat f_n$; and [3] $\|\hat\theta - \theta\|$ as a norm equivalent to $\|\hat f - f\|$. We will explain such identifications further in section 5.4 below. 5.2 Solution of an Optimal...

66 | De-noising via soft-thresholding - Donoho - 1995 |

Citation context: ...$\mathcal{C}(R, D)$ is the scale of all spaces $B^{\sigma}_{p,q}$ and all spaces $F^{\sigma}_{p,q}$ which embed continuously in $C[0,1]$, so that $\sigma > 1/p$, and for which the wavelet basis is an unconditional basis, so that $\sigma < \min(R, D)$. Theorem 1 [17]. There are universal constants $(\pi_n)$ with $\pi_n \to 1$ as $n = 2^{j_1} \to \infty$, and constants $C_1(F, \psi)$ depending on the function space $F[0,1] \in \mathcal{C}(R, D)$ and on the wavelet basis, but not on $n$ or $f$, so that ... In words, $\hat f_n$ ...

55 | Variable kernel density estimation - Terrell, Scott - 1992 |

53 | Optimal filtering of square integrable signals in Gaussian white noise - Pinsker - 1980 |

51 | On nonparametric estimation of the value of a linear functional in Gaussian white noise. Theory Probab - Ibragimov, Hasminskii - 1984 |

Citation context: ... point $f(t_0)$ with squared-error loss, with ellipsoidal ($L^2$-smoothness) class $\mathcal{F}(C)$, the penalized spline estimate is minimax among linear estimates. Actually, it is nearly minimax among all estimates [36, 25, 24, 14]. • Sacks and Ylvisaker (1981) showed that for estimating a function at a point, with squared-error loss and a quasi-Hölder class $\mathcal{F}(C)$, the linear minimax estimate is a kernel estimate with specially c...

43 | Fast wavelet techniques for near optimal image processing - DeVore, Lucier - 1992 |

Citation context: ...courages us to believe that our theoretical results describe phenomena observable in the real world. Of these efforts, the closest to the present one in point of view is the work of DeVore and Lucier [12], who have announced results for estimation in Besov spaces paralleling our own. Obtained from an approximation theoretic point of view, the parallel is perhaps to be expected, because of the well-kno...

41 | Minimax risk over hyperrectangles, and implications - Donoho, Liu, et al. - 1990 |

Citation context: ... point $f(t_0)$ with squared-error loss, with ellipsoidal ($L^2$-smoothness) class $\mathcal{F}(C)$, the penalized spline estimate is minimax among linear estimates. Actually, it is nearly minimax among all estimates [36, 25, 24, 14]. • Sacks and Ylvisaker (1981) showed that for estimating a function at a point, with squared-error loss and a quasi-Hölder class $\mathcal{F}(C)$, the linear minimax estimate is a kernel estimate with specially c...

41 | Learning algorithm for nonparametric filtering. Automat - Pinsker - 1984 |

40 | A survey of optimal recovery - Micchelli, Rivlin - 1977 |

Citation context: ...h identifications further in section 5.4 below. 5.2 Solution of an Optimal Recovery Model. Before tackling data from (22), we consider a simpler abstract model, in which noise is deterministic (compare [47, 48, 61]). The approach of analyzing statistical problems by deterministic noise has been applied previously in [14, 15]. Suppose we have an index set $I$ (not necessarily finite), an object $(\theta_I)$ of interest, and o...

37 | Asymptotic minimax risk for sup-norm loss: solution via optimal recovery. Probability Theory and Related Fields 99 - Donoho - 1994 |

33 | Variable bandwidth kernel estimators of regression curves - Muller, Stadtmuller - 1987 |

27 | Estimation of square-integrable probability density of a random variable - Efromovich, Pinsker - 1982 |

27 | Spline smoothing and optimal rates of convergence in nonparametric regression - Speckman - 1985 |

25 | One-sided inference about functionals of a density - Donoho - 1988 |

Citation context: ...hat certain smoothness conditions hold; yet we never know such smoothness to be the case. (There are even results showing that it is impossible to tell whether or not a function belongs to some $W^m_p$ [13].) There is therefore a disconnect between the suppositions of the Minimax Paradigm and the actual situation when one is confronted with real data. This makes the applicability of the results a priori...

22 | A completely automatic French curve - Wahba, Wold - 1975 |

21 | Estimation d’une densite de probabilite par methode d’ondelettes - Johnstone, Kerkyacharian, et al. - 1992 |

Citation context: ...existing statistical literature. 3.1 The Method. For simplicity, we focus on the nonparametric regression model (3) and a proposal of [22]; similar results are possible in the density estimation model [39]. We suppose that we have $n = 2^{J+1}$ data of the form (3) and that $\sigma$ is known. 1. Take the $n$ given numbers and apply an empirical wavelet transform $W^n_n$, obtaining $n$ empirical wavelet coefficients $(w_{j,k})$. ...

21 | On nonparametric estimation of smooth regression functions - Nemirovskii - 1985 |

19 | Bounds for the risks of nonparametric regression estimates. Theory of Probability and its Applications 27 - Ibragimov, Hasminskii - 1982 |

Citation context: ...ed by decision theorists systematically over the last 15 years, namely the embedding of an appropriate hypercube in the class and using elementary decision-theoretic arguments on hypercubes. Compare [56, 6, 35, 59]. Theorem 9. Let $\|\cdot\|$ come from the Besov scale, with parameter $(\sigma', p', q')$. Let $\Theta$ be a Besov body $\Theta^{\sigma}_{p,q}(C)$. Then with a $c = c(\sigma, p, q, \sigma', p', q')$, $\inf_{\hat\theta} \sup_{\Theta} P\{\|\hat\theta - \theta\| \ge c\,\Omega(\epsilon)\} \to 1$. (48) Moreover, when p...

19 | Rates of Convergence of Nonparametric Estimates of Maximum Likelihood Type, Problems of Information Transmission - Nemirovskii, Polyak, et al. - 1985 |

18 | The Grenander estimator: a nonasymptotic approach - Birgé - 1989 |

18 | Adaptive asymptotically minimax estimates of smooth signals - Golubev - 1987 |

15 | Density estimation - Kerkyacharian, Picard - 1992 |

15 | On problems of adaptive estimation in white Gaussian noise - Lepskii - 1992 |

14 | A new approach to least squares estimation, with applications - Geer - 1987 |

14 | Some problems on nonparametric estimation in Gaussian white noise. Theory Probab - Ibragimov, Nemirovskii, et al. - 1986 |

13 | New Minimax Theorems, Thresholding, and Adaptation - Donoho, Johnstone - 1992 |

Citation context: ...w that the correct Hölder class is one of two specific classes. Hence for $0 < \delta_0 < \delta_1 < \infty$ and $0 < C_0, C_1 < \infty$, $\inf_{\hat f_n} \max_{i=0,1} C_i^{2(r_i - 1)}\, n^{r_i}\, \sigma^{-2 r_i} \sup_{\Lambda^{\delta_i}(C_i)} E(\hat f_n(t_0) - f(t_0))^2 \ge \mathrm{const} \cdot \log(n)^{r_0}$. (17) Theorem 3 [23]. Suppose we use a wavelet transform with $\min(R, D) > 1$. For each Hölder class $\Lambda^{\delta}(C)$ with $0 < \delta < \min(R, D)$, we have $\sup_{\Lambda^{\delta}(C)} E(\hat f_n(t_0) - f(t_0))^2 \le \log(n)^{r} \cdot B(\delta) \cdot (C^2)^{1-r} \cdot (\sigma^2/n)^{r}\,(1 + o(1))$, $n \to \infty$. (18) He...

12 | Minimax estimation of a normal mean subject to doing well at a point - Bickel - 1983 |

Citation context: ...$(t)|^p\,dt \le C^p\}$. (6) Second, one assumes a specific risk measure. Standard examples include risk at a point: $R_n(\hat f, f) = E(\hat f(t_0) - f(t_0))^2$; (7) global squared $L^2$ norm risk: $R_n(\hat f, f) = E\|\hat f - f\|^2_{L^2[0,1]}$; (8) and other measures, such as risk in estimating some derivative at a point, or estimating the function with global $L^p$ loss, or estimating some derivative of the function with global $L^p$ loss. Thi...

9 | Estimation of square-integrable [spectral] density based on a sequence of observations. Problems of Information Transmission - Efroimovich, Pinsker - 1982 |

9 | Estimation of linear functionals in Gaussian noise. Theory Probab - Ibragimov, Khasminski - 1987 |

9 | Asymptotically optimum kernels for density estimation at a point. The Annals of Statistics 9 - Sacks, Ylvisaker - 1981 |

8 | Minimax risk over $\ell_p$-balls - Donoho, Johnstone - 1990 |

7 | Superefficiency and lack of adaptability in functional estimation - Brown, Low - 1992 |

7 | Ondelettes - Meyer - 1990 |

Citation context: ... Suppose $f$ obeys a Hölder smoothness condition $f \in \Lambda^{\delta}(C)$, where, if $\delta$ is not an integer, $\Lambda^{\delta}(C) = \{f : |f^{(m)}(s) - f^{(m)}(t)| \le C|s - t|^{\delta'}\}$ (14) with $m = \lceil\delta\rceil - 1$ and $\delta' = \delta - m$. (If $\delta$ is an integer, we use Zygmund's definition [46].) Suppose, however, that we are not sure of $\delta$ and $C$. If we did know $\delta$ and $C$, then we could construct a linear minimax estimator $\hat f_n^{(\delta, C)} = \sum_i c_i y_i$ where the $(c_i)$ are the solution of a quadratic program...

7 | Minimax Estimates of Linear Functionals in a Hilbert Space, Unpublished Manuscript - Speckman - 1980 |