Results 1-10 of 23
De-Noising by Soft-Thresholding
, 1992
Abstract

Cited by 1249 (14 self)
Donoho and Johnstone (1992a) proposed a method for reconstructing an unknown function $f$ on $[0, 1]$ from noisy data $d_i = f(t_i) + z_i$, $i = 0, \dots, n-1$, $t_i = i/n$, $z_i \overset{\text{iid}}{\sim} N(0, 1)$. The reconstruction $\hat{f}_n$ is defined in the wavelet domain by translating all the empirical wavelet coefficients of $d$ towards 0 by an amount $\sqrt{2\log(n)}/\sqrt{n}$. We prove two results about that estimator. [Smooth]: With high probability $\hat{f}_n$ is at least as smooth as $f$, in any of a wide variety of smoothness measures. [Adapt]: The estimator comes nearly as close in mean square to $f$ as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. Our proof of these results develops new facts about abstract statistical inference and its connection with an optimal recovery model.
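A minimal sketch of soft-thresholding with the universal threshold, under assumptions that are ours rather than the paper's: the Haar wavelet (the abstract does not fix a wavelet), a sample length that is a power of two, and an orthonormal transform applied directly to the samples. Under that normalization the noise coefficients are $N(0,1)$, so the shrinkage amount becomes $\sqrt{2\log(n)}$; we assume the extra $1/\sqrt{n}$ in the abstract reflects a $1/\sqrt{n}$ scaling of the empirical coefficients. The function names are hypothetical, and the sketch leaves the single coarsest scaling coefficient unshrunk, a common convention.

```python
import math
import random


def haar_forward(x):
    """Orthonormal Haar DWT of a list whose length is a power of two.

    Returns (coarse, details) where details[0] is the finest level.
    """
    approx, details = list(x), []
    while len(approx) > 1:
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest detail level."""
    a = list(approx)
    for d in reversed(details):
        nxt = []
        for s_k, d_k in zip(a, d):
            nxt.extend([(s_k + d_k) / math.sqrt(2), (s_k - d_k) / math.sqrt(2)])
        a = nxt
    return a


def soft(w, t):
    """Soft thresholding: move w toward 0 by t, setting it to 0 if |w| <= t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def soft_threshold_denoise(data, sigma=1.0):
    """Shrink every Haar detail coefficient toward 0 by sigma * sqrt(2 log n)."""
    n = len(data)
    t = sigma * math.sqrt(2 * math.log(n))
    coarse, details = haar_forward(data)
    shrunk = [[soft(w, t) for w in level] for level in details]
    return haar_inverse(coarse, shrunk)


if __name__ == "__main__":
    random.seed(0)
    n = 256
    f = [0.0 if i < n // 2 else 5.0 for i in range(n)]  # hypothetical step signal
    d = [fi + random.gauss(0.0, 1.0) for fi in f]
    f_hat = soft_threshold_denoise(d)
    noisy_mse = sum((di - fi) ** 2 for di, fi in zip(d, f)) / n
    denoised_mse = sum((gi - fi) ** 2 for gi, fi in zip(f_hat, f)) / n
    print(f"noisy MSE {noisy_mse:.3f}, denoised MSE {denoised_mse:.3f}")
```

Because the threshold grows like $\sqrt{2\log n}$, pure-noise coefficients are almost all set exactly to zero, which is the mechanism behind the [Smooth] property described above.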
Adapting to unknown smoothness via wavelet shrinkage
 Journal of the American Statistical Association
, 1995
Abstract

Cited by 990 (20 self)
We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein Unbiased Estimate of Risk (SURE) for threshold estimates. The computational effort of the overall procedure is of order $N \log(N)$ as a function of the sample size $N$. SureShrink is smoothness-adaptive: if the unknown function contains jumps, the reconstruction (essentially) does also; if the unknown function has a smooth piece, the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothness-adaptive: it is near-minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods (kernels, splines, and orthogonal series estimates), even with optimal choices of the smoothing parameter, would be unable to perform ...
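The per-level threshold choice can be sketched as a direct minimization of SURE for soft thresholding, assuming the coefficients at one level have already been divided by the noise level so that they have unit variance. For $d$ coefficients $x_i$, SURE at threshold $t$ is $d - 2\,\#\{i : |x_i| \le t\} + \sum_i \min(|x_i|, t)^2$. The function names and the cap at $\sqrt{2\log d}$ are assumptions of this sketch, and it omits any special safeguard for very sparse levels.

```python
import math


def sure(t, x):
    """Stein's unbiased risk estimate for soft thresholding x at level t.

    Assumes the entries of x have unit noise variance.
    """
    return (len(x)
            - 2 * sum(1 for xi in x if abs(xi) <= t)
            + sum(min(abs(xi), t) ** 2 for xi in x))


def sure_threshold(x):
    """Return the threshold in [0, sqrt(2 log d)] that minimizes SURE.

    Between consecutive sorted |x_i|, the count term is constant and the
    squared term increases with t, so the minimum is attained at 0 or at
    one of the |x_i|.
    """
    d = len(x)
    cap = math.sqrt(2 * math.log(d)) if d > 1 else 0.0
    candidates = [0.0] + sorted(a for a in map(abs, x) if a <= cap)
    return min(candidates, key=lambda t: sure(t, x))
```

A SureShrink-style pass would call `sure_threshold` once per dyadic level and soft-threshold that level at the result. This version is quadratic in $d$ for readability; sorting once and using cumulative sums is what brings the cost down to the $N \log(N)$ order quoted above.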
Wavelet shrinkage using cross-validation
, 1996
Abstract

Cited by 97 (14 self)
Wavelets are orthonormal basis functions with special properties that show potential in many areas of mathematics and statistics. This article concentrates on the estimation of functions and images from noisy data using wavelet shrinkage. A modified form of two-fold cross-validation is introduced to choose a threshold for wavelet shrinkage estimators operating on data sets whose length is a power of two. The cross-validation algorithm is then extended to data sets of any length and to multidimensional data sets. The algorithms are compared with established threshold choosers by simulation. An application to a real data set arising from anaesthesia is presented.
Optimal pointwise adaptive methods in nonparametric estimation
 Ann. Statist.
, 1997
Abstract

Cited by 67 (11 self)
The problem of optimal adaptive estimation of a function at a given point from noisy data is considered. Two procedures are proved to be asymptotically optimal for different settings. First we study the problem of bandwidth selection for nonparametric pointwise kernel estimation with a given kernel. We propose a bandwidth selection procedure and prove its optimality in the asymptotic sense. Moreover, this optimality holds not only among kernel estimators with a variable bandwidth: the resulting estimator is asymptotically optimal among all feasible estimators. An important feature of this procedure is that it is fully adaptive and it “works” for a very wide class of functions obeying a mild regularity restriction. With it, the attainable accuracy of estimation depends on the function itself and is expressed in terms of the “ideal adaptive bandwidth” corresponding to this function and the given kernel. The second procedure can be considered a specialization of the first one under the qualitative assumption that the function to be estimated belongs to some Hölder class $\Sigma(\beta, L)$ with unknown parameters $\beta, L$. This assumption allows us to choose a family of kernels in an optimal way, and the resulting procedure appears to be asymptotically optimal in the adaptive sense in any range of adaptation with $\beta \le 2$.
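A simplified Lepski-type rule conveys the idea behind pointwise bandwidth selection: take the largest bandwidth whose estimate at $x_0$ agrees with every estimate from a smaller bandwidth, up to a multiple of the smaller bandwidth's noise level. This is a sketch under our own assumptions (a box kernel, a known noise level `sigma`, a hypothetical tuning constant `kappa`, and a finite bandwidth grid); it is not the paper's exact procedure or thresholds, and it does not implement the second, Hölder-class procedure.

```python
import math


def box_estimate(ts, ys, x0, h):
    """Local average of ys over design points within distance h of x0.

    Assumes the window contains at least one design point.
    Returns (estimate, number of points used).
    """
    window = [y for t, y in zip(ts, ys) if abs(t - x0) <= h]
    return sum(window) / len(window), len(window)


def lepski_bandwidth(ts, ys, x0, bandwidths, sigma=1.0, kappa=2.0):
    """Largest bandwidth whose estimate agrees with all smaller-bandwidth ones.

    Agreement with a smaller bandwidth h' means the two estimates differ by
    at most kappa * sigma / sqrt(count(h')), i.e. kappa standard deviations
    of the noisier (smaller-window) estimate.
    """
    hs = sorted(bandwidths)
    fits = [box_estimate(ts, ys, x0, h) for h in hs]
    chosen = hs[0]
    for j in range(1, len(hs)):
        f_j = fits[j][0]
        if all(abs(f_j - fits[i][0]) <= kappa * sigma / math.sqrt(fits[i][1])
               for i in range(j)):
            chosen = hs[j]  # hs is ascending, so the last accepted is largest
    return chosen
```

Near a jump, large windows mix the two levels and fail the agreement test, so the rule falls back to a small bandwidth; where the function is flat, every test passes and the largest bandwidth is kept.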
Universal smoothing factor selection in density estimation: theory and practice (with discussion)
 Test
, 1997
Abstract

Cited by 32 (11 self)
In earlier work with Gabor Lugosi, we introduced a method to select a smoothing factor for kernel density estimation such that, for all densities in all dimensions, the $L_1$ error of the corresponding kernel estimate is not larger than $3+\varepsilon$ times the error of the estimate with the optimal smoothing factor plus a constant times $\sqrt{\log n / n}$, where $n$ is the sample size, and the constant depends only on the complexity of the kernel used in the estimate. The result is nonasymptotic; that is, the bound is valid for each $n$. The estimate uses ideas from the minimum distance estimation work of Yatracos. We present a practical implementation of this estimate, report some comparative results, and highlight some key properties of the new method.
Progressive Modeling
, 2002
Abstract

Cited by 27 (9 self)
Presently, inductive learning is still performed as a frustrating batch process. The user has little interaction with the system and no control over the final accuracy and training time. If the accuracy of the produced model is too low, all the computing resources are wasted. In this paper, we propose a progressive modeling framework. In progressive modeling, the learning algorithm estimates online both the accuracy of the final model and the remaining training time. If the estimated accuracy is far below expectation, the user can terminate training before completion without wasting further resources. If the user chooses to complete the learning process, progressive modeling will compute a model with the expected accuracy in the expected time. We describe one implementation of progressive modeling using an ensemble of classifiers.
Consistency of cross validation for comparing regression procedures
 Annals of Statistics, accepted paper
Abstract

Cited by 25 (3 self)
Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., the bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when it is applied to compare parametric with nonparametric methods, or nonparametric methods with one another. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominant in size. Furthermore, it can even be of a smaller order than the proportion used for estimation without affecting the consistency property.
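Comparing two regression procedures by data splitting can be sketched as follows: fit both procedures on an estimation subset, score them by squared prediction error on the held-out evaluation subset, and repeat over random splits. The two procedures (a least-squares line as the parametric method, $k$-nearest-neighbour averaging as the nonparametric one), the evaluation size `n_eval`, and the vote over splits are illustrative assumptions; the sketch does not encode the paper's conditions on the splitting ratio.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least-squares line; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def make_knn(k):
    """Return a fitter for k-nearest-neighbour regression in one dimension."""
    def fit(xs, ys):
        pts = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit


def cv_prefers_first(xs, ys, fit_a, fit_b, n_eval, n_splits=20, seed=0):
    """Fraction of random splits on which procedure A has lower evaluation error.

    Each split fits both procedures on len(xs) - n_eval points and scores
    them by squared prediction error on the remaining n_eval points.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    wins = 0
    for _ in range(n_splits):
        rng.shuffle(idx)
        ev, est = idx[:n_eval], idx[n_eval:]
        x_est, y_est = [xs[i] for i in est], [ys[i] for i in est]
        fa, fb = fit_a(x_est, y_est), fit_b(x_est, y_est)
        err_a = sum((fa(xs[i]) - ys[i]) ** 2 for i in ev)
        err_b = sum((fb(xs[i]) - ys[i]) ** 2 for i in ev)
        wins += err_a < err_b
    return wins / n_splits
```

A fraction above 1/2 would be read as "CV selects procedure A". Varying `n_eval` relative to the estimation size is the knob whose asymptotic role the abstract describes.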
Model Indexing and Smoothing Parameter Selection in Nonparametric Function Estimation
Abstract

Cited by 8 (0 self)
Smoothing parameter selection is among the most intensively studied subjects in nonparametric function estimation. A closely related issue, that of identifying a proper index for the smoothing parameter, is however largely neglected in the existing literature. Through heuristic arguments and simple simulations, we shall illustrate that most current working indices are conceptually "incorrect", in the sense that they are not interpretable across replicates in repeated experiments; as a consequence, a few popular working concepts, such as the expected mean square error and the "degrees of freedom", appear vulnerable under close scrutiny. Due to technical constraints, the arguments are developed mainly in the penalized likelihood setting, but conceptual parallels can be drawn to other settings as well. In light of our findings, simulations and discussion are also presented to compare the relative merits of the simple cross-validation method versus the more sophisticated plug-in met...
The Diverging Patterns of Profitability, Investment, and Growth of China and India, 1980–2003
 Australian National University Working Paper 22/2005
, 2005
Abstract

Cited by 7 (2 self)
Prediction error is critical to assessing the performance of statistical methods and selecting statistical models. We propose cross-validation and approximated cross-validation methods for estimating prediction error under a broad q-class of Bregman divergences for error measures, which embeds nearly all of the loss functions commonly used in the regression, classification, and machine learning literature. The approximated cross-validation formulas are derived analytically and facilitate fast estimation of prediction error under the Bregman divergence. We then study a data-driven optimal bandwidth selector for local-likelihood estimation that minimizes the overall prediction error or, equivalently, the covariance penalty. It is shown that the covariance penalty and cross-validation methods converge to the same mean-prediction-error criterion. We also propose a lower-bound scheme for computing the local logistic regression estimates and demonstrate that it is as simple and stable as local least-squares regression estimation. The algorithm monotonically increases the target local likelihood and converges. The idea and methods are extended to generalized varying-coefficient models and semiparametric models. Key words and phrases: Cross-validation; Exponential family; Generalized varying-coefficient ...
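One standard way to write a Bregman divergence generated by a concave function $q$ is $Q(\nu, \mu) = -q(\nu) + q(\mu) + (\nu - \mu)\,q'(\mu)$, which is non-negative. We assume this matches the abstract's q-class; the sketch checks that $q(\mu) = -\mu^2$ gives squared error and that $q(\mu) = -2[\mu\log\mu + (1-\mu)\log(1-\mu)]$ gives the Bernoulli deviance. The helper `loo_cv_error` is hypothetical: a plain leave-one-out estimate for the sample-mean predictor, not the paper's approximated cross-validation formulas.

```python
import math


def bregman(q, dq, nu, mu):
    """Bregman divergence Q(nu, mu) = -q(nu) + q(mu) + (nu - mu) q'(mu) for concave q."""
    return -q(nu) + q(mu) + (nu - mu) * dq(mu)


# q(mu) = -mu^2 generates squared-error loss (nu - mu)^2.
def q_sq(mu):
    return -mu * mu


def dq_sq(mu):
    return -2.0 * mu


# q(mu) = -2[mu log mu + (1 - mu) log(1 - mu)] generates the Bernoulli deviance.
def q_bern(mu):
    def xlogx(v):
        return 0.0 if v == 0.0 else v * math.log(v)
    return -2.0 * (xlogx(mu) + xlogx(1.0 - mu))


def dq_bern(mu):
    return -2.0 * math.log(mu / (1.0 - mu))


def loo_cv_error(ys, q, dq):
    """Leave-one-out CV estimate of prediction error under Q for the sample mean."""
    n, total = len(ys), sum(ys)
    return sum(bregman(q, dq, y, (total - y) / (n - 1)) for y in ys) / n
```

Swapping the pair `(q, dq)` changes the loss without touching the cross-validation loop, which is the practical appeal of working with a whole class of Bregman divergences.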
Bayesian fluorescence in situ hybridisation signal classification
 Artificial Intelligence in Medicine
, 2004