## Learning least squares estimators without assumed priors or supervision (2009)

Citations: 2 (1 self)

### BibTeX

@MISC{Raphan09learningleast,
  author = {Martin Raphan and Eero P. Simoncelli},
  title = {Learning least squares estimators without assumed priors or supervision},
  year = {2009}
}

### Abstract

The two standard methods of obtaining a least-squares optimal estimator are (1) Bayesian estimation, in which one assumes a prior distribution on the true values and combines this with a model of the measurement process to obtain an optimal estimator, and (2) supervised regression, in which one optimizes a parametric estimator over a training set containing pairs of corrupted measurements and their associated true values. But many real-world systems do not have access to either supervised training examples or a prior model. Here, we study the problem of obtaining an optimal estimator given a measurement process with known statistics, and a set of corrupted measurements of random values drawn from an unknown prior. We develop a general form of nonparametric empirical Bayesian estimator that is written as a direct function of the measurement density, with no explicit reference to the prior. We study the observation conditions under which such “prior-free” estimators may be obtained, and we derive specific forms for a variety of different corruption processes. Each of these prior-free estimators may also be used to express the mean squared estimation error as an expectation over the measurement density, thus generalizing Stein’s unbiased risk estimator (SURE), which provides such an expression for the additive Gaussian noise case. Minimizing this expression over measurement samples provides an “unsupervised regression” procedure for selecting an optimal parametric estimator directly from corrupted measurements.
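The additive-Gaussian special case referenced in the abstract admits a compact prior-free form: the classical Miyasawa/Tweedie identity, $E[X\,|\,Y=y] = y + \sigma^2 \frac{d}{dy}\log P_Y(y)$, writes the BLS estimator purely in terms of the measurement density. A minimal numerical sketch, assuming an illustrative Gaussian prior so that $P_Y$ (and hence its score) is known in closed form; in practice the score would be estimated from the corrupted samples themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
s2, sigma2 = 4.0, 1.0                  # illustrative prior and noise variances
x = rng.normal(0.0, np.sqrt(s2), 100_000)          # hidden true values
y = x + rng.normal(0.0, np.sqrt(sigma2), x.size)   # corrupted measurements

# Prior-free estimator for additive Gaussian noise (Miyasawa / Tweedie):
#   E[X | Y = y] = y + sigma2 * d/dy log P_Y(y)
# With a Gaussian prior, P_Y = N(0, s2 + sigma2), so the score is
# -y / (s2 + sigma2) and the formula reduces to linear (Wiener) shrinkage.
score = -y / (s2 + sigma2)
xhat = y + sigma2 * score

mse_prior_free = np.mean((xhat - x) ** 2)
mse_identity = np.mean((y - x) ** 2)   # baseline: use the raw measurement
```

Note that the estimator never touches the prior directly; only the measurement density enters, which is the point of the prior-free formulation.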

### Citations

2535 | A new approach to linear filtering and prediction problems - Kalman - 1960

Citation Context: ...e $\hat{x}_n(y_n)$. The diagram also includes an optional path for supervised data $x_n$, which may be used to improve the estimator. This incremental formulation may seem similar to the well-known Kalman filter [21], which provides an incremental estimate of a state variable that is observed through additive Gaussian measurements. But the resemblance is somewhat superficial: the Kalman filter is based on a state...

2219 | An Introduction to Probability Theory and Its Applications - Feller - 1967

Citation Context: ...$e^{ax} I_{\{ax<0\}}$, so that $E(X|Y=y) = \frac{(\mathrm{sgn}(a)\, e^{ax} I_{\{ax<0\}}) \star (y P_Y(y))}{P_Y(y)}$. A fourth special case is when X is a random positive value, W is an independent variable drawn from an α-stable distribution [29] with Fourier transform $\hat{P}_W(\omega) = e^{-\frac{1}{\alpha}|\omega|^{\alpha}}$, and $Y = X^{\frac{1}{\alpha}} W$. Generally, if $P_W$ is an infinitely divisible distribution and X is an arbitrary positive real number, then the right side of Eq. (42) wil...

1643 | Mean Shift: A Robust Approach Toward Feature Space Analysis - Comaniciu, Meer - 2002

Citation Context: ...e of negative one, resulting in essentially perfect recovery of the true value of x. Note that this optimal shrinkage is accomplished in a single step, unlike methods such as the mean-shift algorithm [18], which uses iterative gradient ascent on the logarithm of a density to perform nonparametric clustering. Technically, this is a form of nonparametric empirical Bayesian estimator [1]. We have intro...

669 | Multivariate density estimation: theory, practice, and visualization - Scott - 1992

Citation Context: ...the convolutional operator from observed samples $\{Y_n\}$: $K \star P_Y(y) \approx \frac{1}{N}\sum_{n=1}^{N} K(y - Y_n)$. Note that this has the form of a kernel density estimator. While such density estimators are generally biased [23], in our situation this approximation is unbiased and converges to the desired convolution $K \star P_Y$ as the number of samples (N) increases, since $E\!\left(\frac{1}{N}\sum_{n=1}^{N} K(y - Y_n)\right) = \int K(y - \tilde{y})\, P_Y(\tilde{y})\, d\tilde{y}$. Of c...

341 | Estimation of the mean of a multivariate normal distribution - Stein - 1981

160 | Scale mixtures of normal distributions - Andrews, Mallows - 1974

Citation Context: ...h can be verified by direct calculation. The second example arises when X is a positive random variable and Y is a zero-mean Gaussian with variance X, a case known as the Gaussian Scale Mixture (GSM) [28]. In this case Eq. (42) holds for $\hat{P}_W(\omega) = e^{-\frac{1}{2}\omega^2}$. In this case, the operator will be $\hat{m}(\omega) = \frac{-1}{i\omega}$ (47), which gives $E(X|Y=y) = \frac{-(H(y) - \frac{1}{2}) \star (y P_Y(y))}{P_Y(y)}$ (48), where H is the Heavis...

153 | All of Statistics: A Concise Course in Statistical Inference - Wasserman - 2004

Citation Context: ...contexts for which squared estimation error is not relevant, and with parametric families that cannot have arisen from an additive Gaussian noise process, when the measurements are drawn from $P_Y^{(\phi)}$ [20]. In fact, maximizing likelihood minimizes the Kullback-Leibler divergence between the true density and the parametric density. 3 General formulation: Prior-free BLS estimator. We now develop a gene...

88 | An empirical Bayes approach to statistics - Robbins - 1955

Citation Context: ...t, we develop a “prior-free” expression for the least squares estimator directly in terms of the density of noisy measurements, which generalizes several specialized examples from previous literature [1, 2, 3]. In addition to unifying these results, our framework allows us to provide a complete characterization of observation models for which a prior-free estimator exists, and to obtain specific solutions f...

80 | Updating the inverse of a matrix - Hager - 1989

Citation Context: ...Sec. 5 requires the inversion of a matrix, which can be expensive, depending on the number of parameters. As is common in the derivation of the Kalman filter, we can use the Woodbury matrix identity [26] to rewrite the incremental form directly in terms of the inverse matrix: $C_n^{-1} = \left(a_n C_{n-1} + (1 - a_n) h_n h_n^T\right)^{-1} = a_n^{-1}\left[C_{n-1}^{-1} - \frac{C_{n-1}^{-1} h_n h_n^T C_{n-1}^{-1}}{\frac{a_n}{1 - a_n} + h_n^T C_{n-1}^{-1} h_n}\right]$, where we have defined...
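The rank-one (Sherman-Morrison) form of the Woodbury identity lets the updated inverse be computed from the previous inverse in $O(d^2)$ instead of re-inverting in $O(d^3)$. A minimal sketch; the matrices, the rank-one direction `h`, and the forgetting factor `a` are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
C_prev = A @ A.T                     # symmetric positive-definite "previous" matrix
h = rng.standard_normal(d)           # rank-one update direction
a = 0.9                              # forgetting factor a_n (illustrative)

# Direct route: form a*C_prev + (1-a)*h h^T and invert from scratch, O(d^3).
C_new = a * C_prev + (1 - a) * np.outer(h, h)
direct_inv = np.linalg.inv(C_new)

# Sherman-Morrison route: reuse C_prev^{-1} (assumed carried incrementally), O(d^2):
#   (a C + (1-a) h h^T)^{-1}
#     = a^{-1} [ P - (P h)(P h)^T / (a/(1-a) + h^T P h) ],  where P = C^{-1}.
P = np.linalg.inv(C_prev)
Ph = P @ h
sm_inv = (P - np.outer(Ph, Ph) / (a / (1 - a) + h @ Ph)) / a
```

Both routes yield the same inverse; the incremental one only ever touches matrix-vector products.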

56 | Exponential operators and parameter differentiation in quantum physics - Wilcox - 1967

Citation Context: ...r, we wish to find E(X|Y), so we need to use the change of variables formula in Eq. (16). Since $X = e^{\ln(X)}$, we have $E(X|Z=z) = \frac{e^{(z + \sigma^2 D_z)}\{P_Z\}(z)}{P_Z(z)}$. By the Baker-Campbell-Hausdorff formula [27] we have that $e^{(z + \sigma^2 D_z)}\{f\}(z) = e^{z + \frac{1}{2}\sigma^2}\left(e^{\sigma^2 D_z}\{f\}(z)\right) = e^{z + \frac{1}{2}\sigma^2} f(z + \sigma^2)$, so that $E(X|Z=z) = e^{z + \frac{1}{2}\sigma^2}\, \frac{P_Z(z + \sigma^2)}{P_Z(z)}$. Next, using the fact that $\ln(Y) = Z$, we have by the change o...
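The key operator fact used here, that the exponential of the derivative operator acts as a shift ($e^{\sigma^2 D_z} f(z) = f(z + \sigma^2)$), follows from the Taylor expansion of $f$. For a polynomial the series terminates, so it can be verified exactly; a minimal sketch with an illustrative cubic:

```python
import numpy as np
from math import factorial

a = 0.7                                  # plays the role of sigma^2 (illustrative)
coeffs = [2.0, -1.0, 3.0, 0.5]           # f(z) = 2 - z + 3 z^2 + 0.5 z^3
f = np.polynomial.Polynomial(coeffs)
z0 = 1.3

# exp(a * D_z) f(z0) = sum_k a^k f^{(k)}(z0) / k!  -- exact for polynomials,
# since derivatives beyond the degree vanish.
shifted = f(z0) + sum(
    a**k / factorial(k) * f.deriv(k)(z0) for k in range(1, len(coeffs))
)
# This should equal the directly shifted value f(z0 + a).
```

The same expansion is what turns $e^{\sigma^2 D_z}\{P_Z\}(z)$ into $P_Z(z + \sigma^2)$ in the estimator above.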

35 | The SURE-LET Approach to Image Denoising - Blu, Luisier - 2007

31 | An introduction to empirical Bayes data analysis - Casella - 1995

Citation Context: ...ation problem. By comparison, most empirical Bayes procedures select parameters for the prior density by optimizing some other criterion (e.g., maximizing likelihood of the data, or matching moments) [19], which is inconsistent with the estimation goal. Outside of the estimation context, Eq. (9) provides an objective function that can be used for estimating the parameters of the density $P_Y^{(\phi)}(y)$...

26 | Building Robust Wavelet Estimators for Multicomponent Images Using Stein’s Principle - Benazza-Benyahia, Pesquet - 2005

Citation Context: ...revious examples, our framework ties them directly to the seemingly unrelated prior-free methodology. In practice, approximating the reformulated MSE with a sample average (as has been done with SURE [7, 8, 9, 10, 11, 12, 13]) allows one to select an optimal parametric estimator based entirely on a set of corrupted measurements, a procedure we refer to generally as “unsupervised regression”. For the special case of an est...

19 | Learning to be Bayesian without supervision - Raphan, Simoncelli - 2007

Citation Context: ...revious examples, our framework ties them directly to the seemingly unrelated prior-free methodology. In practice, approximating the reformulated MSE with a sample average (as has been done with SURE [7, 8, 9, 10, 11, 12, 13]) allows one to select an optimal parametric estimator based entirely on a set of corrupted measurements, a procedure we refer to generally as “unsupervised regression”. For the special case of an est...

17 | Estimation of non-normalized statistical models by score matching - Hyvarinen - 2005

Citation Context: ...supervised regression expression to yield an objective function for fitting a parametric density to observed data, which provides a generalization of the recently developed “score matching” procedure [14, 15]. Finally, we compare the empirical convergence of several example prior-free and unsupervised estimators with their Bayesian or supervised counterparts. Preliminary versions of the work in this artic...

14 | Improving on inadmissible estimators in continuous exponential families with applications to simultaneous estimation of gamma scale parameters - Berger - 1980

Citation Context: ...pecialized examples of this appear in the literature, including Stein’s unbiased risk estimator (SURE, which is derived for the case of additive Gaussian noise [4]), and a few other specific examples [5, 6]. In addition to unifying and generalizing these previous examples, our framework ties them directly to the seemingly unrelated prior-free methodology. In practice, approximating the reformulated MSE...

13 | A nonlinear Stein based estimator for multichannel image denoising - Chaux, Duval, et al.

Citation Context: ...revious examples, our framework ties them directly to the seemingly unrelated prior-free methodology. In practice, approximating the reformulated MSE with a sample average (as has been done with SURE [7, 8, 9, 10, 11, 12, 13]) allows one to select an optimal parametric estimator based entirely on a set of corrupted measurements, a procedure we refer to generally as “unsupervised regression”. For the special case of an est...

10 | Improving upon standard estimators in discrete exponential families with applications to Poisson and negative binomial cases, Ann - Hwang - 1982

Citation Context: ...pecialized examples of this appear in the literature, including Stein’s unbiased risk estimator (SURE, which is derived for the case of additive Gaussian noise [4]), and a few other specific examples [5, 6]. In addition to unifying and generalizing these previous examples, our framework ties them directly to the seemingly unrelated prior-free methodology. In practice, approximating the reformulated MSE...

10 | Optimal approximation of signal priors - Hyvärinen - 2008

Citation Context: ...supervised regression expression to yield an objective function for fitting a parametric density to observed data, which provides a generalization of the recently developed “score matching” procedure [14, 15]. Finally, we compare the empirical convergence of several example prior-free and unsupervised estimators with their Bayesian or supervised counterparts. Preliminary versions of the work in this artic...

8 | An empirical bayes estimator of the mean of a normal population - Miyasawa - 1961

Citation Context: ...ian estimator”, expresses the estimator in terms of a linear operator that depends only on the observation model. This unifies and generalizes several special cases found in the statistics literature [2, 1, 3]. We also showed that this form may be extended to estimate arbitrary statistics of the unknown variable, or the expected value of any polynomial combination of the unknown and measurement variables,...

8 | Optimal denoising in redundant representations - Raphan, Simoncelli

5 | SURE-based wavelet thresholding integrating inter-scale dependencies - Luisier, Blu, et al. - 2006

4 | Empirical Bayes least squares estimation without an explicit prior (manuscript in preparation) - Raphan, Simoncelli - 2006

Citation Context: ...the empirical convergence of several example prior-free and unsupervised estimators with their Bayesian or supervised counterparts. Preliminary versions of the work in this article were presented in [16, 10, 17]. 2 Introductory example: Additive Gaussian noise. We begin with a simple scalar example. Suppose random variable Y represents a noisy observation of an underlying random variable, X. It is well known...

4 | Local likelihood density estimation. The Annals of Statistics - Loader - 1996

Citation Context: ...ent 0.5, and the noisy SNR is 4.8 dB. In this case, we compute Eq. (14) using a more sophisticated approximation method, as described in [22]. We fit a local exponential model similar to that used in [24] to the data in bins, with binwidth adaptively selected so that the product of the number of points in the bin and the squared binwidth is constant. This binwidth selection procedure, analogous to ada...

2 | Optimal estimation: Prior free methods and physiological application. Unpublished doctoral dissertation
- Raphan
- 2007
(Show Context)
Citation Context ... the empirical convergence of several example prior-free and unsupervised estimators with their Bayesian or supervised counterparts. Preliminary versions of the work in this article were presented in =-=[16, 10, 17]-=-. 2 Introductory example: Additive Gaussian noise We begin with a simple scalar example. Suppose random variable Y represents a noisy observation of an underlying random variable, X. It is well known ... |