Results 1–10 of 10
Learning to be Bayesian without supervision
 in Advances in Neural Information Processing Systems (NIPS*06), 2007
"... Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from ..."
Abstract

Cited by 19 (8 self)
Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from data since one does not have access to uncorrupted samples of the variable being estimated. We show that for a wide variety of observation models, the Bayes least squares (BLS) estimator may be formulated without explicit reference to the prior. Specifically, we derive a direct expression for the estimator, and a related expression for the mean squared estimation error, both in terms of the density of the observed measurements. Each of these prior-free formulations allows us to approximate the estimator given a sufficient amount of observed data. We use the first form to develop practical nonparametric approximations of BLS estimators for several different observation processes, and the second form to develop a parametric family of estimators for use in the additive Gaussian noise case. We examine the empirical performance of these estimators as a function of the amount of observed data.
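For the additive Gaussian noise case mentioned in this abstract, the prior-free BLS estimator has a classical closed form (Miyasawa's result, often called Tweedie's formula): x_hat(y) = y + sigma^2 * d/dy log p(y), where p(y) is the density of the observed measurements. A minimal sketch of the idea, assuming toy Laplace-distributed true values and a hand-picked kernel bandwidth (both assumptions for illustration, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                                        # known noise standard deviation
x_true = rng.laplace(scale=1.0, size=2000)         # draws from an "unknown" prior
y = x_true + sigma * rng.normal(size=x_true.size)  # corrupted measurements

def bls_prior_free(y_eval, y_samples, sigma, bw=0.15):
    """Prior-free BLS estimate for additive Gaussian noise:
    x_hat(y) = y + sigma^2 * d/dy log p(y), with the measurement
    density p(y) estimated by a Gaussian kernel density estimator."""
    d = y_eval[:, None] - y_samples[None, :]
    k = np.exp(-0.5 * (d / bw) ** 2)      # unnormalized Gaussian kernels
    p = k.sum(axis=1)                     # proportional to p(y)
    dp = (-d / bw**2 * k).sum(axis=1)     # proportional to p'(y)
    return y_eval + sigma**2 * dp / p     # normalization cancels in dp/p

x_hat = bls_prior_free(y, y, sigma)
mse_raw = np.mean((y - x_true) ** 2)      # ~ sigma^2 = 0.25 for raw measurements
mse_bls = np.mean((x_hat - x_true) ** 2)
print(mse_raw, mse_bls)
```

Note that the estimator never touches the prior: only the empirical density of the measurements enters, which is exactly the point of the paper.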
Generalized SURE for exponential families: Applications to regularization
 IEEE Trans. on Signal Processing
, 2009
"... ..."
Eaton's Markov chain, its conjugate partner and P-admissibility
 Annals of Statistics
, 1999
"... Suppose that X is a random variable with density f(xj`) and that ï¿½ï¿½(`jx) is a proper posterior corresponding to an improper prior (`). The prior is called Padmissible if the generalized Bayes estimator of every bounded function of ` is almostadmissible under squared error loss. Eaton (1992) s ..."
Abstract

Cited by 6 (5 self)
Suppose that X is a random variable with density f(x | θ) and that π(θ | x) is a proper posterior corresponding to an improper prior ν(θ). The prior is called P-admissible if the generalized Bayes estimator of every bounded function of θ is almost-admissible under squared error loss. Eaton (1992) showed that recurrence of the Markov chain with transition density R(η | θ) = ∫ π(η | x) f(x | θ) dx is a sufficient condition for P-admissibility of ν(θ). We show that Eaton's Markov chain is recurrent if and only if its conjugate partner, with transition density
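A concrete instance of Eaton's chain (an illustration, not taken from the paper): for f(x | θ) = N(θ, 1) with a flat improper prior on the real line, the formal posterior is π(θ | x) = N(x, 1), so one transition composes the two Gaussian draws and the chain is a random walk with increment variance 2, which is recurrent in one dimension.

```python
import numpy as np

rng = np.random.default_rng(3)

def eaton_step(theta, rng):
    """One transition of Eaton's chain for f(x|theta) = N(theta, 1) with a
    flat improper prior: the formal posterior is pi(theta|x) = N(x, 1), so
    R(eta|theta) = integral of pi(eta|x) f(x|theta) dx composes the two draws."""
    x = rng.normal(theta, 1.0)   # draw a data point given the current theta
    return rng.normal(x, 1.0)    # draw eta from the formal posterior given x

# The composition is eta = theta + N(0, 2): a recurrent random walk in 1-D.
steps = np.array([eaton_step(0.0, rng) for _ in range(20000)])
print(steps.mean(), steps.var())   # increment mean near 0, variance near 2
```

The random-walk structure makes the recurrence condition tangible: in low dimensions the chain keeps returning near its start, matching the admissibility of the usual estimator there.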
Smoothing spline estimation of variance functions
 Journal of Computational and Graphical Statistics
, 2006
"... This article considers spline smoothing of variance functions. We focus on selection of smoothing parameters and develop three direct datadriven methods: unbiased risk (UBR), generalized approximate cross validation (GACV) and generalized maximum likelihood (GML). In addition to guaranteed converge ..."
Abstract

Cited by 2 (0 self)
This article considers spline smoothing of variance functions. We focus on selection of smoothing parameters and develop three direct data-driven methods: unbiased risk (UBR), generalized approximate cross-validation (GACV) and generalized maximum likelihood (GML). In addition to guaranteed convergence, simulations show that these direct methods perform better than existing indirect UBR, generalized cross-validation (GCV) and GML methods. The direct UBR and GML methods perform better than the GACV method. An application to array-based comparative genomic hybridization data illustrates the usefulness of the proposed methods. KEY WORDS: array-based comparative genomic hybridization; generalized approximate cross-validation; generalized maximum likelihood; heteroscedasticity; smoothing parameter; unbiased risk.
Learning least squares estimators without assumed priors or supervision
, 2009
"... The two standard methods of obtaining a leastsquares optimal estimator are (1) Bayesian estimation, in which one assumes a prior distribution on the true values and combines this with a model of the measurement process to obtain an optimal estimator, and (2) supervised regression, in which one opti ..."
Abstract

Cited by 2 (1 self)
The two standard methods of obtaining a least-squares optimal estimator are (1) Bayesian estimation, in which one assumes a prior distribution on the true values and combines this with a model of the measurement process to obtain an optimal estimator, and (2) supervised regression, in which one optimizes a parametric estimator over a training set containing pairs of corrupted measurements and their associated true values. But many real-world systems do not have access to either supervised training examples or a prior model. Here, we study the problem of obtaining an optimal estimator given a measurement process with known statistics, and a set of corrupted measurements of random values drawn from an unknown prior. We develop a general form of nonparametric empirical Bayesian estimator that is written as a direct function of the measurement density, with no explicit reference to the prior. We study the observation conditions under which such “prior-free” estimators may be obtained, and we derive specific forms for a variety of different corruption processes. Each of these prior-free estimators may also be used to express the mean squared estimation error as an expectation over the measurement density, thus generalizing Stein’s unbiased risk estimator (SURE), which provides such an expression for the additive Gaussian noise case. Minimizing this expression over measurement samples provides an “unsupervised
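The additive-Gaussian special case being generalized here is the classical SURE identity: for an estimator t(y) of theta with y = theta + N(0, sigma^2 I), the risk E||t(y) - theta||^2 equals E[-n sigma^2 + ||t(y) - y||^2 + 2 sigma^2 div t(y)], an expectation over measurements alone. A standard worked example (the soft-thresholding setting of Donoho and Johnstone's SureShrink, with simulated sparse coefficients as an assumption) shows the risk being estimated and minimized without ever seeing theta:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 10000, 1.0
theta = np.zeros(n)
theta[:300] = rng.normal(0.0, 5.0, 300)       # sparse "true" coefficients
y = theta + sigma * rng.normal(size=n)

def soft(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def sure_soft(y, t, sigma):
    """SURE for the soft-threshold estimator: n*sigma^2 + sum(min(|y|, t)^2)
    - 2*sigma^2 * #{i : |y_i| <= t}. Depends only on the measurements y."""
    return (y.size * sigma**2
            + np.sum(np.minimum(np.abs(y), t) ** 2)
            - 2 * sigma**2 * np.sum(np.abs(y) <= t))

ts = np.linspace(0.0, 5.0, 101)
sure = np.array([sure_soft(y, t, sigma) for t in ts])
risk = np.array([np.sum((soft(y, t) - theta) ** 2) for t in ts])
i = int(np.argmin(sure))
print(ts[i], risk[i], risk[0])   # SURE's minimizer nearly minimizes the true risk
```

Because SURE is unbiased for the true risk, the threshold chosen from the measurements alone performs far better than no thresholding (t = 0) on sparse signals. The abstract's contribution is to extend this kind of measurement-only risk expression beyond the additive Gaussian case.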
MINIMAX ESTIMATION WITH THRESHOLDING AND ITS APPLICATION TO WAVELET ANALYSIS
, 2005
"... Many statistical practices involve choosing between a full model and reduced models where some coefficients are reduced to zero. Data were used to select a model with estimated coefficients. Is it possible to do so and still come up with an estimator always better than the traditional estimator base ..."
Abstract
Many statistical practices involve choosing between a full model and reduced models where some coefficients are reduced to zero. Data were used to select a model with estimated coefficients. Is it possible to do so and still come up with an estimator always better than the traditional estimator based on the full model? The James–Stein estimator is such an estimator, having a property called minimaxity. However, the estimator considers only one reduced model, namely the origin. Hence it reduces either no coefficient estimator to zero or every coefficient estimator to zero. In many applications, including wavelet analysis, it is more desirable to reduce to zero only the estimators smaller than a threshold, called thresholding in this paper. Is it possible to construct estimators of this kind which are minimax? In this paper, we construct such minimax estimators which perform thresholding. We apply our recommended estimator to wavelet analysis and show that it performs best among the well-known estimators aiming simultaneously at estimation and model selection. Some of our estimators are also shown to be asymptotically optimal.
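For reference, the James–Stein estimator discussed in this abstract shrinks every coordinate toward the origin by a single data-dependent factor, and for dimension above two its total squared-error risk is strictly below that of the naive full-model estimator y itself. A quick simulation of that domination (standard material, not specific to this paper; the true mean below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
d, trials, sigma = 20, 4000, 1.0
theta = np.full(d, 3.0 / np.sqrt(d))          # fixed true mean with ||theta|| = 3
y = theta + sigma * rng.normal(size=(trials, d))

# James-Stein: shrink toward the origin by 1 - (d-2)*sigma^2 / ||y||^2.
shrink = 1.0 - (d - 2) * sigma**2 / np.sum(y**2, axis=1, keepdims=True)
js = shrink * y

risk_mle = np.mean(np.sum((y - theta) ** 2, axis=1))   # approx d = 20
risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly smaller
print(risk_mle, risk_js)
```

As the abstract notes, this estimator shrinks all coordinates or none: the shrink factor is shared, so no individual coefficient is set to zero while others survive, which is the gap the paper's thresholding estimators fill. (The positive-part variant, which clips the shrink factor at zero, dominates the plain version shown here.)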
unknown title
"... and another linear subspace. However, it always chooses between the two. The nice idea of George (1986a, b) in multiple shrinkage does allow the data to choose among several models; it, however, does not do thresholding, as is the aim of the paper. Models based on wavelets are very important in many ..."
Abstract
and another linear subspace. However, it always chooses between the two. The nice idea of George (1986a, b) in multiple shrinkage does allow the data to choose among several models; it, however, does not do thresholding, as is the aim of the paper. Models based on wavelets are very important in many statistical applications. Using these models involves model selection among the full model or the models with smaller dimensions where some of the wavelet coefficients are zero. Is there a way to select a reduced model so that the estimator based on it does no worse in any case than the naive estimator based on the full model, but improves substantially upon the naive estimator when the reduced model is correct? Again, the James–Stein estimator provides such a solution. However, it selects either the origin or the full model. Furthermore, the ideal estimator should do thresholding; namely, it gives zero as an estimate for the components which are smaller than a threshold, and preserves (or shrinks) the other components. However, to the best knowledge of the authors, no such minimax estimators have been constructed. In this paper,
MINIMAX ESTIMATION WITH THRESHOLDING AND ITS APPLICATION TO WAVELET ANALYSIS
Harrison H. Zhou* and
, 2004
"... Abstract. Many statistical practices involve choosing between a full model and reduced models where some coefficients are reduced to zero. Data were used to select a model with estimated coefficients. Is it possible to do so and still come up with an estimator always better than the traditional esti ..."
Abstract
Many statistical practices involve choosing between a full model and reduced models where some coefficients are reduced to zero. Data were used to select a model with estimated coefficients. Is it possible to do so and still come up with an estimator always better than the traditional estimator based on the full model? The James–Stein estimator is such an estimator, having a property called minimaxity. However, the estimator considers only one reduced model, namely the origin. Hence it reduces either no coefficient estimator to zero or every coefficient estimator to zero. In many applications, including wavelet analysis, it is more desirable to reduce to zero only the estimators smaller than a threshold, called thresholding in this paper. Is it possible to construct estimators of this kind which are minimax? In this paper, we construct such minimax estimators which perform thresholding. We apply our recommended estimator to wavelet analysis and show that it performs best among the well-known estimators aiming simultaneously at estimation and model selection. Some of our estimators are also shown to be asymptotically optimal. Key words and phrases: James–Stein estimator, model selection, VisuShrink,
Least Squares Estimation Without Priors or Supervision (article communicated by Konrad Paul Kording)
"... Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values) or a prior probability model for the true values. Here, we consider the problem of obtaining a least squares estimator given a measurement process with kn ..."
Abstract
Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values) or a prior probability model for the true values. Here, we consider the problem of obtaining a least squares estimator given a measurement process with known statistics (i.e., a likelihood function) and a set of unsupervised measurements, each arising from a corresponding true value drawn randomly from an unknown distribution. We develop a general expression for a nonparametric empirical Bayes least squares (NEBLS) estimator, which expresses the optimal least squares estimator in terms of the measurement density, with no explicit reference to the unknown (prior) density. We study the conditions under which such estimators exist and derive specific forms for a variety of different measurement processes. We further show that each of these NEBLS estimators may be used to express the mean squared estimation error as an expectation over the measurement density alone, thus generalizing Stein’s unbiased