Results 1–10 of 19
The Horseshoe Estimator for Sparse Signals
2008
Cited by 21 (6 self)
This paper proposes a new approach to sparsity called the horseshoe estimator. The horseshoe is a close cousin of other widely used Bayes rules arising from, for example, double-exponential and Cauchy priors, in that it is a member of the same family of multivariate scale mixtures of normals. But the horseshoe enjoys a number of advantages over existing approaches, including its robustness, its adaptivity to different sparsity patterns, and its analytical tractability. We prove two theorems that formally characterize both the horseshoe’s adeptness at handling large, outlying signals and its super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using a combination of real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers one would get by pursuing a full Bayesian model-averaging approach using a discrete mixture prior to model signals and noise.
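The scale-mixture representation this abstract refers to admits a very short sketch: each coefficient gets a normal prior whose local scale is half-Cauchy, and the implied shrinkage weight follows the Beta(1/2, 1/2) distribution whose horseshoe shape gives the estimator its name. The global scale τ = 1 below is an illustrative simplification, not the paper's full hierarchy.

```python
import numpy as np

# Horseshoe prior as a scale mixture of normals:
#   beta_i | lambda_i, tau ~ N(0, lambda_i^2 tau^2),  lambda_i ~ C+(0, 1).
# With tau = 1, the shrinkage weight kappa_i = 1 / (1 + lambda_i^2)
# follows Beta(1/2, 1/2) -- the horseshoe-shaped density on [0, 1].

rng = np.random.default_rng(0)
n = 200_000

# Half-Cauchy local scales: absolute value of standard Cauchy draws.
lam = np.abs(rng.standard_cauchy(n))

# Conditional horseshoe draws (tau = 1): normal scaled by lambda_i.
beta = rng.normal(0.0, 1.0, n) * lam

# Shrinkage weight: kappa near 1 pulls an estimate to zero, near 0 leaves it alone.
kappa = 1.0 / (1.0 + lam**2)

print(round(float(kappa.mean()), 2))  # Beta(1/2, 1/2) has mean 1/2
```

The U-shaped kappa distribution is the point: mass piles up near 0 (signals survive) and near 1 (noise is shrunk hard), with little in between.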
Semiparametric Stochastic Mixed Models for Longitudinal Data
Journal of the American Statistical Association, 1997
Cited by 18 (4 self)
We consider inference for a semiparametric stochastic mixed model for longitudinal data. This model uses parametric fixed effects to represent the covariate effects and an arbitrary smooth function to model the time effect. The within-subject correlation is modeled using random effects and a stationary or nonstationary stochastic process. We derive maximum penalized likelihood estimators of the regression coefficients and the nonparametric function. The resulting estimator of the nonparametric function is a smoothing spline. Frequentist and Bayesian inference on these model components are proposed and compared. Restricted maximum likelihood is used to estimate the smoothing parameter and the variance components simultaneously. We show that estimation of all model components of interest can proceed by fitting a modified linear mixed model. The proposed method is illustrated by analyzing a hormone data set and its performance is evaluated through simulations. KEY WORDS: Correlated data; ...
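The penalized-spline/mixed-model connection the abstract invokes can be sketched in a few lines: represent the smooth time effect with a spline basis, penalize the knot coefficients as if they were random effects, and fit by penalized least squares. The truncated-line basis, knot placement, and smoothing parameter below are illustrative assumptions, not the paper's specific estimator.

```python
import numpy as np

# Minimal sketch of the penalized-likelihood / mixed-model link: a smooth
# time effect f(t) is built from an intercept-and-slope part (unpenalized
# "fixed effects") plus truncated-line knot terms whose ridge penalty plays
# the role of a random-effect variance. Basis and lambda are illustrative.

rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(t) + rng.normal(0.0, 0.3, t.size)    # noisy longitudinal-style signal

knots = np.linspace(t[0], t[-1], 20)[1:-1]      # interior knots
X = np.column_stack([np.ones_like(t), t])       # unpenalized part
Z = np.maximum(t[:, None] - knots[None, :], 0)  # truncated-line basis
C = np.hstack([X, Z])

# Penalize only the knot coefficients: solve (C'C + lambda * D) theta = C'y.
lam = 10.0
D = np.diag([0.0, 0.0] + [1.0] * knots.size)
theta = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
fit = C @ theta

print(round(float(np.mean((fit - np.sin(t)) ** 2)), 3))
```

In the mixed-model formulation the smoothing parameter is a variance ratio, which is what lets REML estimate it alongside the variance components, as the abstract describes.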
Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction
2010
Cited by 16 (5 self)
We use Lévy processes to generate joint prior distributions for a location parameter β = (β1, ..., βp) as p grows large. This approach, which generalizes normal scale-mixture priors to an infinite-dimensional setting, has a number of connections with mathematical finance and Bayesian nonparametrics. We argue that it provides an intuitive framework for generating new regularization penalties and shrinkage rules; for performing asymptotic analysis on existing models; and for simplifying proofs of some classic results on normal scale mixtures.
Handling sparsity via the horseshoe
 Journal of Machine Learning Research, W&CP
Cited by 10 (1 self)
This paper presents a general, fully Bayesian framework for sparse supervised-learning problems based on the horseshoe prior. The horseshoe prior is a member of the family of multivariate scale mixtures of normals, and is therefore closely related to widely used approaches for sparse Bayesian learning, including, among others, Laplacian priors (e.g., the LASSO) and Student-t priors (e.g., the relevance vector machine). The advantages of the horseshoe are its robustness at handling unknown sparsity and large outlying signals. These properties are justified theoretically via a representation theorem and accompanied by comprehensive empirical experiments that compare its performance to benchmark alternatives.
Posterior propriety and admissibility of hyperpriors in normal hierarchical models
The Annals of Statistics, 2005
Cited by 6 (2 self)
Hierarchical modeling is wonderful and here to stay, but hyperparameter priors are often chosen in a casual fashion. Unfortunately, as the number of hyperparameters grows, the effects of casual choices can multiply, leading to considerably inferior performance. As an extreme, but not uncommon, example, use of the wrong hyperparameter priors can even lead to impropriety of the posterior. For exchangeable hierarchical multivariate normal models, we first determine when a standard class of hierarchical priors results in proper or improper posteriors. We next determine which elements of this class lead to admissible estimators of the mean under quadratic loss; such considerations provide one useful guideline for choice among hierarchical priors. Finally, computational issues with the resulting posterior distributions are addressed.
Bayesian generalized double Pareto shrinkage
2010
Cited by 5 (1 self)
We propose a generalized double Pareto prior for shrinkage estimation in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, while forming a bridge between the Laplace and normal-Jeffreys priors. While it has a spike at zero like the Laplace density, it also has Student-t-like tail behavior. We show strong consistency of the posterior in regression models with a diverging number of parameters, providing a template to be used for other priors in similar settings. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We also investigate the properties of the maximum a posteriori estimator and reveal connections with some well-established regularization procedures. The performance of the new prior is tested through simulations.
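The Laplace scale-mixture construction the abstract mentions can be sketched directly: mixing a Laplace (double-exponential) rate over a gamma distribution yields a generalized double Pareto marginal. The parameter values below (α = η = 1) are an illustrative choice, not the paper's recommendation.

```python
import numpy as np

# Generalized double Pareto via a Laplace-gamma scale mixture:
#   beta | lam ~ Laplace(rate = lam),  lam ~ Gamma(alpha, rate = eta),
# which integrates to the GDP density proportional to
#   (1 + |beta| / eta)^(-(alpha + 1))
# in this parameterization: a spike at zero with polynomial (Student-t-like) tails.

rng = np.random.default_rng(2)
n = 200_000
alpha, eta = 1.0, 1.0

lam = rng.gamma(shape=alpha, scale=1.0 / eta, size=n)  # gamma with rate eta
beta = rng.laplace(loc=0.0, scale=1.0 / lam)           # Laplace with rate lam

# For alpha = eta = 1, P(|beta| > t) = 1 / (1 + t), so median(|beta|) = 1.
print(round(float(np.median(np.abs(beta))), 2))
```

The two-stage draw is also the basis for the simple Gibbs sampler the abstract alludes to: conditionally on the latent scales, the model is an ordinary Laplace (or normal) regression.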
A NEW CLASS OF GENERALIZED BAYES MINIMAX RIDGE REGRESSION ESTIMATORS
2005
Cited by 3 (1 self)
Let y = Aβ + ε, where y is an N × 1 vector of observations, β is a p × 1 vector of unknown regression coefficients, A is an N × p design matrix, and ε is a spherically symmetric error term with unknown scale parameter σ. We consider estimation of β under general quadratic loss functions and, in particular, extend the work of Strawderman
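For the setup y = Aβ + ε above, the familiar baseline that such generalized Bayes estimators extend is the classical ridge rule. The sketch below shows only that baseline, with simulated data as an illustrative assumption; it is not the paper's minimax estimator.

```python
import numpy as np

# Classical ridge estimator for y = A beta + eps:
#   beta_hat(k) = (A'A + k I)^(-1) A'y,
# which shrinks the least-squares estimate toward zero as k grows.

rng = np.random.default_rng(3)
N, p = 100, 5
A = rng.normal(size=(N, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = A @ beta_true + rng.normal(0.0, 1.0, N)

def ridge(A, y, k):
    """Ridge estimate (A'A + k I)^(-1) A'y; k = 0 recovers least squares."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + k * np.eye(p), A.T @ y)

beta_ols = ridge(A, y, 0.0)
beta_k = ridge(A, y, 10.0)

# The norm of the ridge estimate decreases monotonically in k.
print(np.linalg.norm(beta_k) < np.linalg.norm(beta_ols))
```

Generalized Bayes constructions effectively replace the single fixed k with a prior-driven, data-adaptive amount of shrinkage, which is how minimaxity can be retained.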
On the half-Cauchy prior for a global scale parameter
2010
Cited by 3 (1 self)
This paper argues that the half-Cauchy distribution should replace the inverse-gamma distribution as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary. Our arguments involve a blend of Bayesian and frequentist reasoning, and are intended to complement the original case made by Gelman (2006) in support of the folded-t family of priors. First, we generalize the half-Cauchy prior to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. We go on to prove a proposition that, together with the results for moments and marginals, allows us to characterize the frequentist risk of the Bayes estimators under all global-shrinkage priors in the class. These theoretical results, in turn, allow us to study the frequentist properties of the half-Cauchy prior versus a wide class of alternatives. The half-Cauchy occupies a sensible “middle ground” within this class: it performs very well near the origin, but does not lead to drastic compromises in other parts of the parameter space. This provides an alternative, classical justification for the repeated, routine use of this prior. We also consider situations where the underlying mean vector is sparse, where we argue that the usual conjugate choice of an inverse-gamma prior is particularly inappropriate, and can lead to highly distorted posterior inferences. Finally, we briefly summarize some open issues in the specification of default priors for scale terms in hierarchical models.
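The half-Cauchy scale prior the abstract advocates is trivial to work with in practice: a C+(0, s) draw is just the absolute value of a Cauchy draw scaled by s. The sketch below illustrates that and one property behind the "middle ground" argument; the scale s = 1 is an illustrative assumption.

```python
import numpy as np

# Half-Cauchy prior C+(0, s) for a top-level scale parameter: a draw is
# the absolute value of a standard Cauchy draw times s. Its density is
# flat and positive at the origin (unlike the inverse-gamma, which forces
# the scale away from zero), while its heavy polynomial tail keeps it
# noncommittal about large values.

rng = np.random.default_rng(4)
s = 1.0
tau = s * np.abs(rng.standard_cauchy(200_000))

# The half-Cauchy median equals its scale s; the mean does not exist,
# so summaries should be quantile-based.
print(round(float(np.median(tau)), 2))
```

This is why the scale s acts as a soft, interpretable anchor: half the prior mass sits below s and half above, with no hard floor near zero.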
Alternative Global–Local Shrinkage Priors Using Hypergeometric–Beta Mixtures
2009
Cited by 1 (0 self)
This paper introduces an approach to estimation in possibly sparse data sets using shrinkage priors based upon the class of hypergeometric-beta distributions. These widely applicable priors turn out to be a four-parameter generalization of the beta family, and are pseudo-conjugate: they cannot themselves be expressed in closed form, but they do yield tractable moments and marginal likelihoods when used as priors for the mean of a normal distribution. These priors are useful in situations where standard priors are inappropriate or ill-behaved. Non-Bayesians will find these priors useful for generating easily computable shrinkage estimators that have excellent risk properties. Bayesians will find them useful for generating computationally tractable priors for a variance parameter. We illustrate the use of these priors on a variety of global and local shrinkage problems, and we prove a theorem that characterizes their risk properties when used for estimation of a normal mean under a quadratic loss function.