Results 1  10
of
209
DeNoising By SoftThresholding
, 1992
"... Donoho and Johnstone (1992a) proposed a method for reconstructing an unknown function f on [0; 1] from noisy data di = f(ti)+ zi, iid i =0;:::;n 1, ti = i=n, zi N(0; 1). The reconstruction fn ^ is de ned in the wavelet domain by translating all the empirical wavelet coe cients of d towards 0 by an a ..."
Abstract

Cited by 798 (13 self)
 Add to MetaCart
Donoho and Johnstone (1992a) proposed a method for reconstructing an unknown function f on [0; 1] from noisy data di = f(ti)+ zi, iid i =0;:::;n 1, ti = i=n, zi N(0; 1). The reconstruction fn ^ is de ned in the wavelet domain by translating all the empirical wavelet coe cients of d towards 0 by an amount p 2 log(n) = p n. We prove two results about that estimator. [Smooth]: With high probability ^ fn is at least as smooth as f, in any of a wide variety of smoothness measures. [Adapt]: The estimator comes nearly as close in mean square to f as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. Our proof of these results develops new facts about abstract statistical inference and its connection with an optimal recovery model.
Adapting to unknown smoothness via wavelet shrinkage
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 1995
"... We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the princip ..."
Abstract

Cited by 675 (19 self)
 Add to MetaCart
We attempt to recover a function of unknown smoothness from noisy, sampled data. We introduce a procedure, SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients. The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein Unbiased Estimate of Risk (Sure) for threshold estimates. The computational effort of the overall procedure is order N log(N) as a function of the sample size N. SureShrink is smoothnessadaptive: if the unknown function contains jumps, the reconstruction (essentially) does also; if the unknown function has a smooth piece, the reconstruction is (essentially) as smooth as the mother wavelet will allow. The procedure is in a sense optimally smoothnessadaptive: it is nearminimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet. We know from a previous paper by the authors that traditional smoothing methods  kernels, splines, and orthogonal series estimates  even with optimal choices of the smoothing parameter, would be unable to perform
Locally weighted learning
 ARTIFICIAL INTELLIGENCE REVIEW
, 1997
"... This paper surveys locally weighted learning, a form of lazy learning and memorybased learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, ass ..."
Abstract

Cited by 448 (52 self)
 Add to MetaCart
This paper surveys locally weighted learning, a form of lazy learning and memorybased learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning t parameters, interference between old and new data, implementing locally weighted learning e ciently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
Minimax Estimation via Wavelet Shrinkage
, 1992
"... We attempt to recover an unknown function from noisy, sampled data. Using orthonormal bases of compactly supported wavelets we develop a nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coe cients. The shrinkage can be tuned to be nearly minim ..."
Abstract

Cited by 246 (32 self)
 Add to MetaCart
We attempt to recover an unknown function from noisy, sampled data. Using orthonormal bases of compactly supported wavelets we develop a nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coe cients. The shrinkage can be tuned to be nearly minimax over any member of a wide range of Triebel and Besovtype smoothness constraints, and asymptotically minimax over Besov bodies with p q. Linear estimates cannot achieve even the minimax rates over Triebel and Besov classes with p <2, so our method can signi cantly outperform every linear method (kernel, smoothing spline, sieve,:::) in a minimax sense. Variants of our method based on simple threshold nonlinearities are nearly minimax. Our method possesses the interpretation of spatial adaptivity: it reconstructs using a kernel which mayvary in shape and bandwidth from point to point, depending on the data. Least favorable distributions for certain of the Triebel and Besov scales generate objects with sparse wavelet transforms. Many real objects have similarly sparse transforms, which suggests that these minimax results are relevant for practical problems. Sequels to this paper discuss practical implementation, spatial adaptation properties and applications to inverse problems.
Wavelet shrinkage: asymptopia
 Journal of the Royal Statistical Society, Ser. B
, 1995
"... Considerable e ort has been directed recently to develop asymptotically minimax methods in problems of recovering in nitedimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators bein ..."
Abstract

Cited by 239 (35 self)
 Add to MetaCart
Considerable e ort has been directed recently to develop asymptotically minimax methods in problems of recovering in nitedimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators being obtained for a variety of interesting problems. Unfortunately, the results have often not been translated into practice, for a variety of reasons { sometimes, similarity to known methods, sometimes, computational intractability, and sometimes, lack of spatial adaptivity. We discuss a method for curve estimation based on n noisy data; one translates the empirical wavelet coe cients towards the origin by an amount p p 2 log(n) = n. The method is di erent from methods in common use today, is computationally practical, and is spatially adaptive; thus it avoids a number of previous objections to minimax estimators. At the same time, the method is nearly minimax for a wide variety of loss functions { e.g. pointwise error, global error measured in L p norms, pointwise and global error in estimation of derivatives { and for a wide range of smoothness classes, including standard Holder classes, Sobolev classes, and Bounded Variation. This is amuch broader nearoptimality than anything previously proposed in the minimax literature. Finally, the theory underlying the method is interesting, as it exploits a correspondence between statistical questions and questions of optimal recovery and informationbased complexity.
A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
"... Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, t ..."
Abstract

Cited by 194 (24 self)
 Add to MetaCart
Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. Wedevelop a theoretical framework for approximation based on regularization techniques that leads to a class of threelayer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the wellknown Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods suchasParzen windows and potential functions and to several neural network algorithms, suchas Kanerva's associative memory,backpropagation and Kohonen's topology preserving map. They also haveaninteresting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
Nonlinear BlackBox Modeling in System Identification: a Unified Overview
 Automatica
, 1995
"... A nonlinear black box structure for a dynamical system is a model structure that is prepared to describe virtually any nonlinear dynamics. There has been considerable recent interest in this area with structures based on neural networks, radial basis networks, wavelet networks, hinging hyperplanes, ..."
Abstract

Cited by 136 (15 self)
 Add to MetaCart
A nonlinear black box structure for a dynamical system is a model structure that is prepared to describe virtually any nonlinear dynamics. There has been considerable recent interest in this area with structures based on neural networks, radial basis networks, wavelet networks, hinging hyperplanes, as well as wavelet transform based methods and models based on fuzzy sets and fuzzy rules. This paper describes all these approaches in a common framework, from a user's perspective. It focuses on what are the common features in the different approaches, the choices that have to be made and what considerations are relevant for a successful system identification application of these techniques. It is pointed out that the nonlinear structures can be seen as a concatenation of a mapping from observed data to a regression vector and a nonlinear mapping from the regressor space to the output space. These mappings are discussed separately. The latter mapping is usually formed as a basis function e...
Local Regression: Automatic Kernel Carpentry
 Statistical Science
, 1993
"... . A kernel smoother is an intuitive estimate of a regression function or conditional expectation; at each point x 0 the estimate of E(Y j x 0 ) is a weighted mean of the sample Y i , with observations close to x 0 receiving the largest weights. Unfortunately this simplicity has flaws. At the boundar ..."
Abstract

Cited by 107 (2 self)
 Add to MetaCart
. A kernel smoother is an intuitive estimate of a regression function or conditional expectation; at each point x 0 the estimate of E(Y j x 0 ) is a weighted mean of the sample Y i , with observations close to x 0 receiving the largest weights. Unfortunately this simplicity has flaws. At the boundary of the predictor space, the kernel neighborhood is asymmetric and the estimate may have substantial bias. Bias can be a problem in the interior as well if the predictors are nonuniform or if the regression function has substantial curvature. These problems are particularly severe when the predictors are multidimensional. A variety of kernel modifications have been proposed to provide approximate and asymptotic adjustment for these biases. Such methods generally place substantial restrictions on the regression problems that can be considered; in unfavorable situations, they can perform very poorly. Moreover, the necessary modifications are very difficult to implement in the multidimensional...
KernelBased Reinforcement Learning
 Machine Learning
, 1999
"... We present a kernelbased approach to reinforcement learning that overcomes the stability problems of temporaldifference learning in continuous statespaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the ..."
Abstract

Cited by 102 (1 self)
 Add to MetaCart
We present a kernelbased approach to reinforcement learning that overcomes the stability problems of temporaldifference learning in continuous statespaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernelbased approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the biasvariance tradeo in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or nonparametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
InformationTheoretic Determination of Minimax Rates of Convergence
 Ann. Stat
, 1997
"... In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain informationtheoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence. ..."
Abstract

Cited by 98 (18 self)
 Add to MetaCart
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain informationtheoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.