Results 1-10 of 10
Learning to be Bayesian without supervision
 in Adv. Neural Information Processing Systems (NIPS*06), 2007
Abstract

Cited by 19 (8 self)
Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from data since one does not have access to uncorrupted samples of the variable being estimated. We show that for a wide variety of observation models, the Bayes least squares (BLS) estimator may be formulated without explicit reference to the prior. Specifically, we derive a direct expression for the estimator, and a related expression for the mean squared estimation error, both in terms of the density of the observed measurements. Each of these prior-free formulations allows us to approximate the estimator given a sufficient amount of observed data. We use the first form to develop practical nonparametric approximations of BLS estimators for several different observation processes, and the second form to develop a parametric family of estimators for use in the additive Gaussian noise case. We examine the empirical performance of these estimators as a function of the amount of observed data.
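For the additive Gaussian noise case mentioned in the abstract, one concrete prior-free formulation is Miyasawa's classical identity, x̂(y) = y + σ²·(log p)′(y), where p is the density of the observed measurements. The sketch below is not the authors' code; it merely illustrates the idea by plugging a Gaussian kernel density estimate into that identity (the bandwidth and the bimodal test prior are arbitrary choices for the demonstration):

```python
import numpy as np

def prior_free_bls(y_obs, y_query, sigma, bandwidth=0.3):
    """Approximate the Bayes least squares estimate for additive Gaussian
    noise via Miyasawa's identity, x_hat(y) = y + sigma^2 * (log p)'(y),
    with a Gaussian kernel density estimate of the measurement density p."""
    diff = y_query[:, None] - y_obs[None, :]       # shape (queries, samples)
    k = np.exp(-0.5 * (diff / bandwidth) ** 2)     # unnormalized Gaussian kernels
    p = k.sum(axis=1)                              # proportional to p(y)
    dp = (-diff / bandwidth**2 * k).sum(axis=1)    # proportional to p'(y)
    return y_query + sigma**2 * dp / p

# Hidden prior (never seen by the estimator) and corrupted measurements.
rng = np.random.default_rng(0)
x_true = rng.choice([-2.0, 2.0], size=5000)        # bimodal prior at +/-2
y = x_true + 0.5 * rng.normal(size=x_true.size)    # additive Gaussian noise

y_test = np.array([-2.0, 0.0, 2.0])
x_hat = prior_free_bls(y, y_test, sigma=0.5)       # pulled toward the modes
```

Note that the estimator sees only the corrupted measurements `y`, never `x_true`, which is exactly the unsupervised setting the abstract describes.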
Generalized SURE for exponential families: Applications to regularization
 IEEE Trans. on Signal Processing
, 2009
Eaton's Markov chain, its conjugate partner and P-admissibility
 Annals of Statistics
, 1999
Abstract

Cited by 6 (5 self)
Suppose that X is a random variable with density f(x|θ) and that π(θ|x) is a proper posterior corresponding to an improper prior ν(θ). The prior is called P-admissible if the generalized Bayes estimator of every bounded function of θ is almost-admissible under squared error loss. Eaton (1992) showed that recurrence of the Markov chain with transition density R(η|θ) = ∫ π(η|x) f(x|θ) dx is a sufficient condition for P-admissibility of ν(θ). We show that Eaton's Markov chain is recurrent if and only if its conjugate partner, with transition density ...
Optimal estimation: Prior-free methods and physiological application
 Ph.D. dissertation, Courant Institute of Mathematical Sciences
, 2007
Abstract

Cited by 2 (2 self)
First and foremost, I would like to thank my advisors, Eero Simoncelli and Dan Tranchina. Dan supervised my work on cortical modeling, and his insight and advice were extremely helpful in carrying out the bulk of the work of Chapter 1. He also had many useful comments about the remainder of the material in the thesis. Over the years, I have learned a lot about computational neuroscience in general from discussions with him. Eero supervised my work on prior-free methods and applications, which make up the substance of Chapters 2-4. His intuition, insight and ideas were crucial in helping me progress in this line of research, and more importantly, in obtaining useful results. I also learned a lot from him about image processing, statistics and computational neuroscience, amongst other things. I would like to thank my third reader, Charlie Peskin, for his input to my thesis and defense and helpful discussions about the material. I would also like to thank Mehryar Mohri for being on my committee and for some useful discussions about VC-type bounds for regression. As well, I would like to thank Francesca Chiaromonte for being on my committee, and for helpful discussions and comments about the material in the thesis. It was good to have a statistician’s point of view on the work. I would like to thank Bob Shapley for his helpful input, and for information about contrast-dependent summation area. I would also like to thank him for letting me sit in on his “new view” class about visual cortex, where I read some very useful papers. I would like to thank members of the Laboratory for Computational Vision for helpful comments and discussions along the way. I would also like to thank LCV alumni Liam Paninski and Jonathan Pillow, who both had some particularly useful comments about the prior-free methods. I would also like to thank the various people at Courant, too numerous to mention, who have provided help along the way.
Smoothing spline estimation of variance functions
 Journal of Computational and Graphical Statistics
, 2006
Abstract

Cited by 2 (0 self)
This article considers spline smoothing of variance functions. We focus on selection of smoothing parameters and develop three direct data-driven methods: unbiased risk (UBR), generalized approximate cross validation (GACV) and generalized maximum likelihood (GML). In addition to guaranteed convergence, simulations show that these direct methods perform better than existing indirect UBR, generalized cross validation (GCV) and GML methods. The direct UBR and GML methods perform better than the GACV method. An application to array-based comparative genomic hybridization data illustrates the usefulness of the proposed methods. Key words: array-based comparative genomic hybridization; generalized approximate cross validation; generalized maximum likelihood; heteroscedasticity; smoothing parameter; unbiased risk.
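The basic pipeline the abstract presumes can be sketched in a few lines: fit the mean function, square the residuals (which are nearly unbiased for the variance), and smooth them. This is only an illustration of that pipeline, not the paper's method: the direct UBR/GACV/GML criteria are not in standard libraries, so the smoothing factors below are hand-tuned stand-ins for what those criteria would select automatically:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 400))
true_var = 0.05 + 0.5 * x**2                   # heteroscedastic variance function
y = np.sin(2 * np.pi * x) + np.sqrt(true_var) * rng.normal(size=x.size)

# Step 1: estimate the mean function and form squared residuals.
mean_fit = UnivariateSpline(x, y, s=100.0)     # smoothing factor hand-tuned here
r2 = (y - mean_fit(x)) ** 2

# Step 2: smooth the squared residuals (nearly unbiased for the variance,
# given an accurate mean fit) to obtain the variance-function estimate.
var_fit = UnivariateSpline(x, r2, s=70.0)      # again hand-tuned
var_hat = np.clip(var_fit(x), 1e-6, None)      # enforce positivity
```

The clipping step is a crude positivity fix; part of the point of dedicated variance-function smoothers is to handle positivity and smoothing-parameter selection in a principled way.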
Learning least squares estimators without assumed priors or supervision
, 2009
Abstract

Cited by 2 (1 self)
The two standard methods of obtaining a least-squares optimal estimator are (1) Bayesian estimation, in which one assumes a prior distribution on the true values and combines this with a model of the measurement process to obtain an optimal estimator, and (2) supervised regression, in which one optimizes a parametric estimator over a training set containing pairs of corrupted measurements and their associated true values. But many real-world systems do not have access to either supervised training examples or a prior model. Here, we study the problem of obtaining an optimal estimator given a measurement process with known statistics, and a set of corrupted measurements of random values drawn from an unknown prior. We develop a general form of nonparametric empirical Bayesian estimator that is written as a direct function of the measurement density, with no explicit reference to the prior. We study the observation conditions under which such “prior-free” estimators may be obtained, and we derive specific forms for a variety of different corruption processes. Each of these prior-free estimators may also be used to express the mean squared estimation error as an expectation over the measurement density, thus generalizing Stein’s unbiased risk estimator (SURE), which provides such an expression for the additive Gaussian noise case. Minimizing this expression over measurement samples provides an “unsupervised ...
MINIMAX ESTIMATION WITH THRESHOLDING AND ITS APPLICATION TO WAVELET ANALYSIS
, 2005
Abstract
Many statistical practices involve choosing between a full model and reduced models in which some coefficients are set to zero, with the data used both to select a model and to estimate its coefficients. Is it possible to do so and still come up with an estimator that is always better than the traditional estimator based on the full model? The James–Stein estimator is such an estimator, having a property called minimaxity. However, that estimator considers only one reduced model, namely the origin; hence it reduces either no coefficient estimator to zero or every coefficient estimator to zero. In many applications, including wavelet analysis, what is more desirable is to reduce to zero only the estimators smaller than a threshold, called thresholding in this paper. Is it possible to construct estimators of this kind which are minimax? In this paper, we construct such minimax estimators which perform thresholding. We apply our recommended estimator to wavelet analysis and show that it performs the best among the well-known estimators aiming simultaneously at estimation and model selection. Some of our estimators are also shown to be asymptotically optimal.
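The contrast the abstract draws, between James–Stein's global shrinkage (no exact zeros) and thresholding rules (exact zeros for small coefficients), can be made concrete with a small sketch. This is only an illustration of the two rules being compared, not the paper's minimax thresholding construction; the sparse test signal and the threshold value 2.0 are arbitrary choices:

```python
import numpy as np

def james_stein(y, sigma=1.0):
    """Positive-part James-Stein estimator: shrinks the whole observation
    vector toward the origin and dominates the MLE y whenever dim >= 3,
    but never sets an individual coordinate exactly to zero."""
    p = y.size
    shrink = 1.0 - (p - 2) * sigma**2 / np.dot(y, y)
    return max(shrink, 0.0) * y

def soft_threshold(y, lam):
    """Thresholding rule: coordinates smaller in magnitude than lam are set
    exactly to zero; the rest are shrunk toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(2)
theta = np.concatenate([np.zeros(90), np.full(10, 5.0)])   # sparse truth
y = theta + rng.normal(size=theta.size)                    # unit-variance noise

def mse(est):
    return np.mean((est - theta) ** 2)
```

On sparse signals like this one, both rules beat the raw observations in mean squared error, but only soft thresholding produces exact zeros, which is the behavior the paper seeks to combine with minimaxity.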
Least Squares Estimation Without Priors or Supervision
 Article communicated by Konrad Paul Kording
Abstract
Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values) or a prior probability model for the true values. Here, we consider the problem of obtaining a least squares estimator given a measurement process with known statistics (i.e., a likelihood function) and a set of unsupervised measurements, each arising from a corresponding true value drawn randomly from an unknown distribution. We develop a general expression for a nonparametric empirical Bayes least squares (NEBLS) estimator, which expresses the optimal least squares estimator in terms of the measurement density, with no explicit reference to the unknown (prior) density. We study the conditions under which such estimators exist and derive specific forms for a variety of different measurement processes. We further show that each of these NEBLS estimators may be used to express the mean squared estimation error as an expectation over the measurement density alone, thus generalizing Stein’s unbiased ...
Generalized SURE for Exponential Families: Applications to Regularization
, 804
Abstract
Stein’s unbiased risk estimate (SURE) was proposed by Stein for the independent, identically distributed (iid) Gaussian model in order to derive estimates that dominate least-squares (LS). In recent years, the SURE criterion has been employed in a variety of denoising problems for choosing regularization parameters that minimize an estimate of the mean-squared error (MSE). However, its use has been limited to the iid case, which precludes many important applications. In this paper we begin by deriving a SURE counterpart for general, not necessarily iid, distributions from the exponential family. This enables extending the SURE design technique to a much broader class of problems. Based on this generalization we suggest a new method for choosing regularization parameters in penalized LS estimators. We then demonstrate its superior performance over the conventional generalized cross validation approach and the discrepancy method in the context of image deblurring and deconvolution. The SURE technique can also be used to design estimates without predefining their structure. However, allowing for too many free parameters impairs the performance of the resulting estimates. To address this inherent tradeoff we propose a regularized SURE objective. Based on this design criterion, we derive a wavelet denoising strategy that is similar in spirit to the standard soft-threshold approach but can lead to improved MSE performance.
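For the iid Gaussian case that this paper generalizes, SURE for soft thresholding has the classical closed form SURE(t) = nσ² − 2σ²·#{i : |yᵢ| ≤ t} + Σᵢ min(|yᵢ|, t)², which can be minimized over t with no access to the true signal. A minimal sketch of that special case (the grid and the sparse test signal are illustrative choices, not from the paper):

```python
import numpy as np

def sure_soft(y, t, sigma=1.0):
    """Stein's unbiased risk estimate of the MSE of soft thresholding at t,
    for iid Gaussian noise with known sigma (the classical SureShrink form):
    SURE(t) = n*sigma^2 - 2*sigma^2 * #{|y_i| <= t} + sum_i min(|y_i|, t)^2."""
    small = np.abs(y) <= t
    return (y.size * sigma**2
            - 2.0 * sigma**2 * np.sum(small)
            + np.sum(np.minimum(np.abs(y), t) ** 2))

rng = np.random.default_rng(3)
theta = np.concatenate([np.zeros(200), np.full(20, 4.0)])  # sparse truth
y = theta + rng.normal(size=theta.size)                    # unit-variance noise

# Choose the threshold by minimizing SURE -- no access to theta is needed.
grid = np.linspace(0.0, 4.0, 81)
t_star = grid[np.argmin([sure_soft(y, t) for t in grid])]
est = np.sign(y) * np.maximum(np.abs(y) - t_star, 0.0)
```

The paper's contribution is replacing the iid Gaussian assumption behind this formula with a counterpart valid for general exponential-family distributions.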
unknown title
Abstract
and another linear subspace. However, it always chooses between the two. The nice idea of George (1986a, b) in multiple shrinkage does allow the data to choose among several models; it, however, does not do thresholding, as is the aim of the paper. Models based on wavelets are very important in many statistical applications. Using these models involves model selection among the full model or the models with smaller dimensions where some of the wavelet coefficients are zero. Is there a way to select a reduced model so that the estimator based on it does no worse in any case than the naive estimator based on the full model, but improves substantially upon the naive estimator when the reduced model is correct? Again, the James–Stein estimator provides such a solution. However, it selects either the origin or the full model. Furthermore, the ideal estimator should do thresholding; namely, it gives zero as an estimate for the components which are smaller than a threshold, and preserves (or shrinks) the other components. However, to the best knowledge of the authors, no such minimax estimators have been constructed. In this paper, ...