Results 1–10 of 18
Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
 Biometrika
, 1995
Abstract

Cited by 1079 (21 self)
Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They have therefore not been available for application to Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed. This article proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive. It should therefore have wide applicability in model determination problems. The methodology is illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.
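The mechanics of a trans-dimensional move — dimension matching with an auxiliary variable, a Jacobian term, and a reversible accept/reject step — can be illustrated on a deliberately tiny example. This is a hedged sketch, not the paper's sampler: the target places equal mass 0.5 on a one-dimensional model with theta1 ~ N(0, 1) and a two-dimensional model that adds theta2 ~ N(0, 2^2); the birth move sets theta2 = u with u ~ N(0, 1), so the dimension-matching map is the identity and the Jacobian is 1.

```python
import math
import random

random.seed(1)

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Target: prior mass 0.5 on each of two models.
#   k=1: theta1 ~ N(0, 1)
#   k=2: theta1 ~ N(0, 1), theta2 ~ N(0, 2^2)
k, th1, th2 = 1, 0.0, None
visits2 = 0
iters = 50_000
for _ in range(iters):
    # within-model random-walk Metropolis update on theta1
    prop = th1 + random.gauss(0, 0.5)
    if random.random() < norm_pdf(prop, 0, 1) / norm_pdf(th1, 0, 1):
        th1 = prop
    # reversible-jump move between dimensions (Jacobian = 1 here)
    if k == 1:
        u = random.gauss(0, 1)                            # proposal q(u) = N(0, 1)
        alpha = norm_pdf(u, 0, 2) / norm_pdf(u, 0, 1)     # target(theta2) / q(u)
        if random.random() < alpha:
            k, th2 = 2, u
    else:
        alpha = norm_pdf(th2, 0, 1) / norm_pdf(th2, 0, 2) # reverse move ratio
        if random.random() < alpha:
            k, th2 = 1, None
    if k == 2:
        visits2 += 1

frac = visits2 / iters   # both models are fully normalized, so this should be near 0.5
```

Because both models carry normalized densities and equal prior probability, the chain's long-run fraction of time in model 2 should be close to 0.5; the accept ratio is exactly the target-over-proposal density ratio because the Jacobian of the identity mapping is 1.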
Multiple Shrinkage and Subset Selection in Wavelets
, 1997
Abstract

Cited by 132 (16 self)
This paper discusses Bayesian methods for multiple shrinkage estimation in wavelets. Wavelets are used in applications for data denoising, via shrinkage of the coefficients towards zero, and for data compression, by shrinkage and setting small coefficients to zero. We approach wavelet shrinkage by using Bayesian hierarchical models, assigning a positive prior probability to the wavelet coefficients being zero. The resulting estimator for the wavelet coefficients is a multiple shrinkage estimator that exhibits a wide variety of nonlinear shrinkage patterns. We discuss fast computational implementations, with a focus on easy-to-compute analytic approximations as well as importance sampling and Markov chain Monte Carlo methods. Multiple shrinkage estimators prove to have excellent mean squared error performance in reconstructing standard test functions. We demonstrate this in simulated test examples, comparing various implementations of multiple shrinkage to commonly used shrinkage rules. Finally, we illustrate our approach with an application to the so-called "glint" data.
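The nonlinear shrinkage pattern the abstract describes falls out of a simple conjugate calculation. As a minimal sketch (assuming a single coefficient y = theta + N(0, sigma^2) noise and a spike-and-slab prior P(theta = 0) = 1 - p, theta | nonzero ~ N(0, tau^2); the hyperparameter values are illustrative, not the paper's):

```python
import math

def norm_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def shrink(y, sigma=1.0, tau=3.0, p=0.5):
    """Posterior mean of theta for y = theta + N(0, sigma^2) noise under the
    spike-and-slab prior: P(theta=0) = 1-p, theta|nonzero ~ N(0, tau^2)."""
    m_slab = norm_pdf(y, math.sqrt(sigma**2 + tau**2))   # marginal of y if theta nonzero
    m_spike = norm_pdf(y, sigma)                         # marginal of y if theta = 0
    post_nonzero = p * m_slab / (p * m_slab + (1 - p) * m_spike)
    # linear Bayes shrinkage, damped by the posterior probability of being nonzero
    return post_nonzero * (tau**2 / (sigma**2 + tau**2)) * y

# small observations are pulled almost to zero; large ones are shrunk only mildly
print(shrink(0.2), shrink(5.0))
```

The damping factor varies with y, which is exactly what produces the nonlinear, thresholding-like shrinkage curves rather than the single linear shrinkage of a plain normal prior.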
Prediction via Orthogonalized Model Mixing
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 1994
Abstract

Cited by 57 (9 self)
In this paper we introduce an approach and algorithms for model mixing in large prediction problems with correlated predictors. We focus on the choice of predictors in linear models, and mix over possible subsets of candidate predictors. Our approach is based on expressing the space of models in terms of an orthogonalization of the design matrix. Advantages are both statistical and computational. Statistically, orthogonalization often leads to a reduction in the number of competing models by eliminating correlations. Computationally, large model spaces cannot be enumerated; recent approaches are based on sampling models with high posterior probability via Markov chains. Based on orthogonalization of the space of candidate predictors, we can approximate the posterior probabilities of models by products of predictor-specific terms. This leads to an importance sampling function for sampling directly from the joint distribution over the model space, without resorting to Markov chains. Comp...
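The "products of predictor-specific terms" idea can be sketched concretely. Under assumptions the abstract does not fully spell out (known noise sd, a spike-and-slab prior with slab N(0, tau^2), and the QR decomposition as the orthogonalization), the regression on the orthonormal columns decomposes coordinate-wise, so each coordinate gets its own inclusion probability and models can be drawn as independent Bernoullis:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, sigma, tau, prior_incl = 200, 4, 1.0, 2.0, 0.5
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)    # strongly correlated pair
beta = np.array([2.0, 0.0, 0.0, 1.5])
y = X @ beta + sigma * rng.standard_normal(n)

# Orthogonalize the design: Q has orthonormal columns, so the regression
# of y on Q decomposes into independent scalar problems.
Q, R = np.linalg.qr(X)
z = Q.T @ y                        # z_j ~ N(gamma_j, sigma^2), independently

def log_npdf(x, var):
    return -0.5 * (np.log(2 * np.pi * var) + x**2 / var)

# Per-coordinate Bayes factor (slab vs spike) -> independent inclusion probabilities
log_bf = log_npdf(z, sigma**2 + tau**2) - log_npdf(z, sigma**2)
odds = (prior_incl / (1 - prior_incl)) * np.exp(log_bf)
incl_prob = odds / (1 + odds)

# The product of these Bernoullis is an importance sampler over all 2^p models
models = rng.random((1000, p)) < incl_prob
print(incl_prob.round(3))
```

Each sampled row of `models` is a subset of the orthogonalized predictors, drawn directly from the (product-form) approximate posterior over models rather than via a Markov chain.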
Frequentist model average estimators
 Journal of the American Statistical Association
, 2003
Abstract

Cited by 51 (2 self)
The traditional use of model selection methods in practice is to proceed as if the final selected model had been chosen in advance, without acknowledging the additional uncertainty introduced by model selection. This often means under-reporting of variability and overly optimistic confidence intervals. We build a general large-sample likelihood apparatus in which limiting distributions and risk properties of estimators post-selection as well as of model average estimators are precisely described, also explicitly taking modelling bias into account. This allows a drastic reduction of complexity, as competing model averaging schemes may be developed, discussed and compared inside a statistical prototype experiment where only a few crucial quantities matter. In particular we offer a frequentist view on Bayesian model averaging methods and give a link to generalised ridge estimators. Our work also leads to new model selection criteria. The methods are illustrated with real data applications. Key words: bias and variance balance, growing models, likelihood inference, model average estimators, model information criteria, moderate misspecification 1. Introduction and
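One widely used frequentist averaging scheme in this spirit is smoothed AIC: fit each candidate model, convert AIC scores to weights proportional to exp(-AIC/2), and average the models' predictions. This is a generic illustration of that idea, not the paper's estimator; the simulated data and the two nested candidate models are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.uniform(-1, 1, n)
y = 1.0 + 0.4 * x + 0.8 * rng.standard_normal(n)   # weak slope: model choice is genuinely uncertain

def fit_rss(X, y):
    coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, float(rss[0])

# Candidate models: intercept only, and intercept + slope
designs = [np.ones((n, 1)), np.column_stack([np.ones(n), x])]
aic, preds_at_half = [], []
for X in designs:
    coef, rss = fit_rss(X, y)
    k = X.shape[1]
    aic.append(n * np.log(rss / n) + 2 * k)          # Gaussian-likelihood AIC
    preds_at_half.append(coef[0] + (coef[1] * 0.5 if k == 2 else 0.0))

# Smoothed-AIC weights: w_k proportional to exp(-AIC_k / 2)
a = np.array(aic)
w = np.exp(-(a - a.min()) / 2)
w /= w.sum()
avg_pred = float(np.dot(w, preds_at_half))          # model-averaged prediction at x = 0.5
print(w.round(3), round(avg_pred, 3))
```

The averaged prediction sits between the two models' predictions, with the data (through AIC) deciding how much each model contributes rather than a single winner taking all.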
Exponential screening and optimal rates of sparse estimation. Available at ArXiv:1003.2654v3
, 2010
Abstract

Cited by 20 (3 self)
In high-dimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. One of the key assumptions for the success of any statistical procedure in this setup is to assume that the linear combination is sparse in some sense, for example, that it involves only a few covariates. We consider a general, not necessarily linear, regression with Gaussian noise and study a related question, that is, to find a linear combination of approximating functions, which is at the same time sparse and has small mean squared error (MSE). We introduce a new estimation procedure, called Exponential Screening, that shows remarkable adaptation properties. It adapts to the linear combination that optimally balances MSE and sparsity, whether the latter is measured in terms of the number of nonzero entries in the combination (ℓ0 norm) or in terms of the global weight of the combination (ℓ1 norm). The power of this adaptation result is illustrated by showing that Exponential Screening solves optimally and simultaneously all the problems of aggregation in Gaussian regression that have been discussed in the literature. Moreover, we show that the performance of the Exponential Screening estimator cannot be improved in a minimax sense, even if the optimal sparsity is known in advance. The theoretical and numerical superiority of Exponential Screening compared to state-of-the-art sparse procedures is also discussed. 1. Introduction. The
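The backbone of such procedures is exponential-weights aggregation: each candidate function in a dictionary gets a weight decaying exponentially in its residual sum of squares, and the aggregate is the weighted combination. This sketch shows only that backbone in its flat-prior special case (the 4·sigma^2 temperature is the one commonly used in the aggregation literature; the dictionary and data here are made up for illustration), not the full Exponential Screening procedure with its sparsity prior:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 0.5
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + sigma * rng.standard_normal(n)

# Dictionary of candidate functions evaluated on the design points
dictionary = {
    "const": np.ones(n),
    "linear": x,
    "sin": np.sin(2 * np.pi * x),
    "cos": np.cos(2 * np.pi * x),
}
names = list(dictionary)
F = np.column_stack([dictionary[k] for k in names])

# Exponential weights: w_j proportional to exp(-||y - f_j||^2 / (4 sigma^2))
resid = ((y[:, None] - F) ** 2).sum(axis=0)
logw = -resid / (4 * sigma**2)
w = np.exp(logw - logw.max())     # subtract max for numerical stability
w /= w.sum()
aggregate = F @ w                 # the exponentially weighted aggregate
print(dict(zip(names, w.round(3))))
```

With a clear best function in the dictionary the weights concentrate sharply on it; when several functions fit comparably, the aggregate blends them, which is what underlies the simultaneous-aggregation optimality claims.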
Distribution of eigenvalues and eigenvectors of Wishart matrix when the population eigenvalues are infinitely dispersed
, 2002
Abstract

Cited by 8 (5 self)
We consider the asymptotic joint distribution of the eigenvalues and eigenvectors of a Wishart matrix when the population eigenvalues become infinitely dispersed. We show that the normalized sample eigenvalues and the relevant elements of the sample eigenvectors are asymptotically all mutually independently distributed. The limiting distributions of the normalized sample eigenvalues are chi-squared distributions with varying degrees of freedom and the distribution of the relevant elements of the eigenvectors is the standard normal distribution. As an application of this result, we investigate tail minimaxity in the estimation of the population covariance matrix of Wishart distribution with respect to Stein's loss function and the quadratic loss function. Under mild regularity conditions, we show that the behavior of a broad class of minimax estimators is identical when the sample eigenvalues become infinitely dispersed. Keywords and phrases: asymptotic distribution, covariance matrix, minimax estimator, quadratic loss, singular parameter, Stein's loss, tail minimaxity.
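The dispersed-eigenvalue regime is easy to probe by simulation. A small sketch (the specific dimensions and eigenvalue spread are arbitrary choices for illustration): draw a Wishart matrix whose population eigenvalues differ by orders of magnitude and check that each sample eigenvector essentially lines up with a coordinate axis, which is the qualitative content of the limiting independence result.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
d = np.array([1e6, 1e3, 1.0])          # widely dispersed population eigenvalues

# Sample W ~ Wishart_p(n, diag(d)) as X^T X with Gaussian rows
X = rng.standard_normal((n, p)) * np.sqrt(d)
W = X.T @ X

evals, evecs = np.linalg.eigh(W)        # ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

# With well-separated population eigenvalues, the i-th sample eigenvector
# is nearly the i-th coordinate axis (up to sign).
alignment = np.abs(np.diag(evecs))
print(alignment.round(4))
```

The off-axis components shrink as the ratios between consecutive population eigenvalues grow, consistent with the claimed asymptotic factorization of eigenvalues and eigenvector elements.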
Improved minimax predictive densities under Kullback–Leibler loss
 Ann. Statist
, 2006
Abstract

Cited by 7 (0 self)
Let X|µ ∼ Np(µ, vxI) and Y|µ ∼ Np(µ, vyI) be independent p-dimensional multivariate normal vectors with common unknown mean µ. Based on only observing X = x, we consider the problem of obtaining a predictive density ˆp(y|x) for Y that is close to p(y|µ) as measured by expected Kullback–Leibler loss. A natural procedure for this problem is the (formal) Bayes predictive density ˆpU(y|x) under the uniform prior πU(µ) ≡ 1, which is best invariant and minimax. We show that any Bayes predictive density will be minimax if it is obtained by a prior yielding a marginal that is superharmonic or whose square root is superharmonic. This yields wide classes of minimax procedures that dominate ˆpU(y|x), including Bayes predictive densities under superharmonic priors. Fundamental similarities and differences with the parallel theory of estimating a multivariate normal mean under quadratic loss are described. 1. Introduction. Let X|µ ∼ Np(µ, vxI) and Y|µ ∼ Np(µ, vyI) be independent p-dimensional multivariate normal vectors with common unknown mean µ,
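For reference, the baseline procedure ˆpU(y|x) has a closed form: under the uniform prior the posterior is µ | x ∼ Np(x, vx I), and integrating the sampling density of Y against it gives a normal predictive density with inflated variance,

```latex
\hat p_U(y \mid x)
  = \int N_p(y;\, \mu,\, v_y I)\, N_p(\mu;\, x,\, v_x I)\, d\mu
  = N_p\!\bigl(y;\; x,\; (v_x + v_y) I\bigr),
```

so the procedures that dominate it must beat this specific normal density in expected Kullback–Leibler loss.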
Orthogonalizations and Prior Distributions for Orthogonalized Model Mixing
 In Modelling and Prediction
, 1996
Abstract

Cited by 7 (3 self)
Prediction methods based on mixing over a set of plausible models can help alleviate the sensitivity of inference and decisions to modeling assumptions. One important application area is prediction in linear models. Computing techniques for model mixing in linear models include Markov chain Monte Carlo methods as well as importance sampling. Clyde, DeSimone and Parmigiani (1996) developed an importance sampling strategy based on expressing the space of predictors in terms of an orthogonal basis. This leads both to a better identified problem and to simple approximations to the posterior model probabilities. Such approximations can be used to construct efficient importance samplers. For brevity, we call this strategy orthogonalized model mixing. Two key elements of orthogonalized model mixing are: a) the orthogonalization method and b) the prior probability distributions assigned to the models and the coefficients. In this paper we consider in further detail the specification of these t...
Bayesian semiparametric multiple shrinkage
Abstract

Cited by 1 (0 self)
High-dimensional and highly correlated data leading to non-identified or weakly identified effects are commonplace. Maximum likelihood will typically fail in such situations and a variety of shrinkage methods have been proposed. Standard techniques, such as ridge regression or the lasso, shrink estimates toward zero, with some approaches allowing coefficients to be selected out of the model by achieving a value of zero. When substantive information is available, estimates can be shrunk to non-null values; however, such information may not be available. We propose a Bayesian semiparametric approach that allows shrinkage to multiple locations. Coefficients are given a mixture of heavy-tailed double exponential priors, with location and scale parameters assigned Dirichlet process hyperpriors to allow groups of coefficients to be shrunk toward the same, possibly nonzero, mean. Our approach favors a sparse but flexible structure, by shrinking towards a small number of random locations. The methods are illustrated using a study of genetic polymorphisms and multiple myeloma.
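The building block "shrink toward a possibly nonzero location under a double-exponential prior" has a simple one-coefficient form. As a hedged sketch (the full method places Dirichlet process hyperpriors on the locations; here the location m is just fixed): for y ~ N(theta, sigma^2) and a double-exponential prior with location m and scale b, the MAP estimate is soft-thresholding of y toward m with threshold sigma^2/b:

```python
import math

def soft_to_location(y, m, sigma=1.0, scale=1.0):
    """MAP estimate of theta for y ~ N(theta, sigma^2) under a double-exponential
    prior with location m and scale b: soft-threshold y toward m, threshold sigma^2/b."""
    lam = sigma**2 / scale
    r = y - m
    return m + math.copysign(max(abs(r) - lam, 0.0), r)

# shrinkage toward 0 (the lasso case) versus toward a nonzero location m = 2
print(soft_to_location(2.3, 0.0), soft_to_location(2.3, 2.0))
```

An observation near 2 is left essentially at 2 when the prior is centered there, whereas lasso-style shrinkage toward zero would pull it down; letting the data choose a small set of such locations is the multiple-shrinkage idea.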