Results 1–10 of 37
Ideal spatial adaptation by wavelet shrinkage
 Biometrika
, 1994
Abstract
Cited by 1071 (4 self)
With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle offers dramatic advantages over traditional linear estimation by nonadaptive kernels; however, it is a priori unclear whether such performance can be obtained by a procedure relying on the data alone. We describe a new principle for spatially adaptive estimation: selective wavelet reconstruction. We show that variable-knot spline fits and piecewise-polynomial fits, when equipped with an oracle to select the knots, are not dramatically more powerful than selective wavelet reconstruction with an oracle. We develop a practical spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coefficients. RiskShrink mimics the performance of an oracle for selective wavelet reconstruction as well as it is possible to do so. A new inequality in multivariate normal decision theory, which we call the oracle inequality, shows that attained performance differs from ideal performance by at most a factor 2 log n, where n is the sample size. Moreover, no estimator can give a better guarantee than this. Within the class of spatially adaptive procedures, RiskShrink is essentially optimal. Relying only on the data, it comes within a factor log^2 n of the performance of piecewise-polynomial and variable-knot spline methods equipped with an oracle. In contrast, it is unknown how or if piecewise-polynomial methods could be made to function this well when denied access to an oracle and forced to rely on data alone.
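The oracle inequality described in this abstract can be checked numerically. The sketch below is illustrative only (it uses plain soft thresholding at the universal level σ√(2 log n), not the paper's RiskShrink thresholds, which are smaller minimax-derived values): it estimates a sparse Gaussian mean vector by thresholding and compares the Monte Carlo risk to the (2 log n + 1)(σ² + ideal risk) bound, where the ideal (oracle) risk is Σ min(θᵢ², σ²).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2048
sigma = 1.0
theta = np.zeros(n)
theta[:32] = 3.0                                # sparse mean vector
ideal = np.minimum(theta**2, sigma**2).sum()    # oracle "keep-or-kill" risk

t = sigma * np.sqrt(2 * np.log(n))              # universal threshold
reps = 200
risk = 0.0
for _ in range(reps):
    y = theta + rng.normal(0, sigma, n)
    est = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)   # soft thresholding
    risk += ((est - theta) ** 2).sum()
risk /= reps                                     # Monte Carlo risk estimate

# oracle-inequality-style bound: (2 log n + 1) * (sigma^2 + ideal risk)
bound = (2 * np.log(n) + 1) * (sigma**2 + ideal)
```

With these numbers the empirical risk sits well inside the bound, which is the content of the "at most a factor 2 log n" guarantee.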
Minimax Estimation via Wavelet Shrinkage
, 1992
Abstract
Cited by 289 (32 self)
We attempt to recover an unknown function from noisy, sampled data. Using orthonormal bases of compactly supported wavelets, we develop a nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients. The shrinkage can be tuned to be nearly minimax over any member of a wide range of Triebel- and Besov-type smoothness constraints, and asymptotically minimax over Besov bodies with p ≤ q. Linear estimates cannot achieve even the minimax rates over Triebel and Besov classes with p < 2, so our method can significantly outperform every linear method (kernel, smoothing spline, sieve, ...) in a minimax sense. Variants of our method based on simple threshold nonlinearities are nearly minimax. Our method possesses the interpretation of spatial adaptivity: it reconstructs using a kernel which may vary in shape and bandwidth from point to point, depending on the data. Least favorable distributions for certain of the Triebel and Besov scales generate objects with sparse wavelet transforms. Many real objects have similarly sparse transforms, which suggests that these minimax results are relevant for practical problems. Sequels to this paper discuss practical implementation, spatial adaptation properties and applications to inverse problems.
Wavelet shrinkage: asymptopia?
 Journal of the Royal Statistical Society, Ser. B
, 1995
Abstract
Cited by 273 (35 self)
Considerable effort has been directed recently to develop asymptotically minimax methods in problems of recovering infinite-dimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators being obtained for a variety of interesting problems. Unfortunately, the results have often not been translated into practice, for a variety of reasons: sometimes, similarity to known methods; sometimes, computational intractability; and sometimes, lack of spatial adaptivity. We discuss a method for curve estimation based on n noisy data; one translates the empirical wavelet coefficients towards the origin by an amount √(2 log n)/√n. The method is different from methods in common use today, is computationally practical, and is spatially adaptive; thus it avoids a number of previous objections to minimax estimators. At the same time, the method is nearly minimax for a wide variety of loss functions (e.g. pointwise error, global error measured in L^p norms, pointwise and global error in estimation of derivatives) and for a wide range of smoothness classes, including standard Hölder classes, Sobolev classes, and Bounded Variation. This is a much broader near-optimality than anything previously proposed in the minimax literature. Finally, the theory underlying the method is interesting, as it exploits a correspondence between statistical questions and questions of optimal recovery and information-based complexity.
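The "translate coefficients towards the origin" recipe can be sketched end to end with a hand-rolled orthonormal Haar transform. This is an illustrative sketch, not the authors' implementation: in the discrete setting with per-sample noise level σ, the translation amount √(2 log n)/√n on normalized coefficients corresponds to soft-thresholding the discrete detail coefficients at σ√(2 log n).

```python
import numpy as np

def haar_fwd(x):
    """Full orthonormal Haar decomposition of a length-2^J signal.
    Returns [details_level1, details_level2, ..., coarse]."""
    x = np.asarray(x, dtype=float).copy()
    out = []
    while x.size > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)
        dif = (x[0::2] - x[1::2]) / np.sqrt(2)
        out.append(dif)
        x = avg
    out.append(x)
    return out

def haar_inv(levels):
    """Invert haar_fwd."""
    x = levels[-1]
    for dif in reversed(levels[:-1]):
        up = np.empty(2 * x.size)
        up[0::2] = (x + dif) / np.sqrt(2)
        up[1::2] = (x - dif) / np.sqrt(2)
        x = up
    return x

def visu_shrink(y, sigma):
    """Translate every detail coefficient towards 0 by sigma*sqrt(2 log n)."""
    t = sigma * np.sqrt(2.0 * np.log(y.size))
    levels = haar_fwd(y)
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - t, 0.0) for d in levels[:-1]]
    shrunk.append(levels[-1])       # keep the coarse average untouched
    return haar_inv(shrunk)

rng = np.random.default_rng(0)
signal = np.repeat([0.0, 4.0], 128)            # a step function, n = 256
noisy = signal + rng.normal(0, 1, 256)
denoised = visu_shrink(noisy, sigma=1.0)
```

Because the step is sparse in the Haar basis, almost all detail coefficients are pure noise and get set exactly to zero, which is where the spatial adaptivity comes from.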
Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation
 J. Royal Statist. Soc. B
, 1995
Optimal pointwise adaptive methods in nonparametric estimation
 ANN. STATIST
, 1997
Abstract
Cited by 56 (9 self)
The problem of optimal adaptive estimation of a function at a given point from noisy data is considered. Two procedures are proved to be asymptotically optimal for different settings. First we study the problem of bandwidth selection for nonparametric pointwise kernel estimation with a given kernel. We propose a bandwidth selection procedure and prove its optimality in the asymptotic sense. Moreover, this optimality holds not only among kernel estimators with a variable bandwidth: the resulting estimator is asymptotically optimal among all feasible estimators. The important feature of this procedure is that it is fully adaptive and it “works” for a very wide class of functions obeying a mild regularity restriction. With it, the attainable accuracy of estimation depends on the function itself and is expressed in terms of the “ideal adaptive bandwidth” corresponding to this function and a given kernel. The second procedure can be considered as a specialization of the first one under the qualitative assumption that the function to be estimated belongs to some Hölder class Σ(β, L) with unknown parameters β, L. This assumption allows us to choose a family of kernels in an optimal way, and the resulting procedure appears to be asymptotically optimal in the adaptive sense in any range of adaptation with β ≤ 2.
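A generic Lepski-style rule conveys the flavor of this kind of pointwise bandwidth selector: scan bandwidths from small to large and keep the largest one whose estimate is still statistically consistent with all smaller-bandwidth (lower-bias, higher-variance) estimates. The function below is only a schematic sketch with a hypothetical name and a generic constant z; the paper's actual procedure and constants differ.

```python
def lepski_select(estimates, sds, z=2.0):
    """Lepski-style bandwidth index selection.

    estimates[k] -- pointwise estimate at the k-th bandwidth,
                    ordered from smallest to largest bandwidth.
    sds[k]       -- its standard deviation (decreasing in k, since
                    larger bandwidths average more data).
    Returns the index of the largest admissible bandwidth: the last k
    whose estimate stays within z * sds[j] of every smaller-bandwidth
    estimate j < k.
    """
    k_best = 0
    for k in range(len(estimates)):
        if all(abs(estimates[k] - estimates[j]) <= z * sds[j]
               for j in range(k)):
            k_best = k      # still consistent with all finer estimates
        else:
            break           # bias has started to dominate; stop
    return k_best
```

When the estimates agree across scales the rule picks the largest bandwidth (smallest variance); as soon as a larger bandwidth drifts away from the finer-scale estimates, the scan stops.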
Uniform in bandwidth consistency of kernel-type function estimators
 Ann. Stat
, 2005
Abstract
Cited by 48 (5 self)
We introduce a general method to prove uniform in bandwidth consistency of kernel-type function estimators. Examples include the kernel density estimator, the Nadaraya–Watson regression estimator and the conditional empirical process. Our results may be useful to establish uniform consistency of data-driven bandwidth kernel-type function estimators. 1. Introduction and statements of main results. Let X, X1, X2, ... be i.i.d. R^d-valued, d ≥ 1, random variables and assume that the common distribution function of these variables has a Lebesgue density function, which we shall denote by f. A kernel K will be any measurable function which ...
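The kernel density estimator named above, f_h(x) = (1/(n h)) Σ K((x − X_i)/h), is easy to write down directly; the sketch below (illustrative names, Gaussian kernel) evaluates it over a whole range of bandwidths, which is the regime the uniform-in-bandwidth results are about.

```python
import numpy as np

def kde(x_eval, data, h,
        kernel=lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)):
    """Kernel density estimate f_h(x) = (1/(n h)) * sum_i K((x - X_i)/h),
    evaluated at every point of x_eval."""
    u = (x_eval[:, None] - data[None, :]) / h
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(2)
data = rng.normal(0, 1, 5000)          # standard normal sample
grid = np.linspace(-3, 3, 61)

# evaluate over a range of bandwidths, as in uniform-in-bandwidth theory;
# a data-driven rule would pick one h from such a family afterwards
for h in [0.1, 0.2, 0.4]:
    fh = kde(grid, data, h)
```

The uniform-in-bandwidth results guarantee that the error sup over x stays controlled simultaneously for all h in such a range, which is what licenses plugging in a data-driven bandwidth.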
Nonlinear Black-Box Models in System Identification: Mathematical Foundations
, 1995
Abstract
Cited by 42 (6 self)
In this paper we discuss several aspects of the mathematical foundations of the nonlinear black-box identification problem. As we shall see, the quality of the identification procedure is always a result of a certain tradeoff between the expressive power of the model we try to identify (the larger the number of parameters used to describe the model, the more flexible the approximation) and the stochastic error (which is proportional to the number of parameters). A consequence of this tradeoff is the simple fact that a good approximation technique can be the basis of a good identification algorithm. From this point of view we consider different approximation methods, and pay special attention to spatially adaptive approximants. We introduce wavelet and "neuron" approximations and show that they are spatially adaptive. Then we apply the acquired approximation experience to estimation problems. Finally, we consider some implications of these theoretic developments for the practically...
Local Maximum Likelihood Estimation and Inference
 J. Royal Statist. Soc. B
, 1998
Abstract
Cited by 29 (4 self)
Local maximum likelihood estimation is a nonparametric counterpart of the widely used parametric maximum likelihood technique. It extends the scope of the parametric maximum likelihood method to a much wider class of parametric spaces. Associated with this nonparametric estimation scheme are the issues of bandwidth selection and bias and variance assessment. This article provides a unified approach to selecting a bandwidth and constructing confidence intervals in local maximum likelihood estimation. The approach is then applied to least-squares nonparametric regression and to nonparametric logistic regression. Our experience in these two settings shows that the general idea outlined here is powerful and encouraging.
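In the least-squares case mentioned in this abstract, local maximum likelihood reduces to locally weighted least squares: at each point x0 one maximizes a kernel-weighted Gaussian log-likelihood, i.e. solves a weighted regression on (1, x − x0). A minimal local-linear sketch (hypothetical helper name, Gaussian kernel weights):

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local-linear fit at x0: weighted least squares with Gaussian kernel
    weights, which is local maximum likelihood under Gaussian errors.
    Returns the fitted value (the intercept of the local model)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)       # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    # solve the weighted normal equations (X^T W X) beta = X^T W y
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]

x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x            # exactly linear data
fit = local_linear(0.5, x, y, h=0.2)   # -> 3.5 (local-linear is exact here)
```

The bandwidth h plays the same role as in kernel regression, which is why the bandwidth-selection and bias/variance questions of the abstract carry over unchanged.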
Bandwidth Selection and the Estimation of Treatment Effects with Non-Experimental Data. Working paper
, 2005
Interpolation Models with Multiple Hyperparameters
, 1997
Abstract
Cited by 13 (2 self)
A traditional interpolation model is characterized by the choice of regularizer applied to the interpolant, and the choice of noise model. Typically, the regularizer has a single regularization constant α, and the noise model has a single parameter β. The ratio α/β alone is responsible for determining globally all these attributes of the interpolant: its 'complexity', 'flexibility', 'smoothness', 'characteristic scale length', and 'characteristic amplitude'. We suggest that interpolation models should be able to capture more than just one flavour of simplicity and complexity. We describe Bayesian models in which the interpolant has a smoothness that varies spatially. We emphasize the importance, in practical implementation, of the concept of 'conditional convexity' when designing models with many hyperparameters.