Results 1–10 of 68
Covariance regularization by thresholding
2007
Abstract
Cited by 69 (9 self)
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
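As a rough illustration of the hard-thresholding idea described in this abstract (a minimal sketch, not the authors' code; the function name, the choice to keep the diagonal intact, and the threshold value are all invented for the example):

```python
import numpy as np

def hard_threshold_cov(X, t):
    """Hard-threshold the sample covariance of X at level t.

    Entries with |s_ij| <= t are set to zero; the diagonal
    (the variances) is kept intact in this variant.
    """
    S = np.cov(X, rowvar=False)            # p x p sample covariance
    T = np.where(np.abs(S) > t, S, 0.0)    # zero out small entries
    np.fill_diagonal(T, np.diag(S))        # keep variances untouched
    return T

# Toy example: p = 5 variables, n = 200 observations
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
T = hard_threshold_cov(X, t=0.3)
```

In practice the threshold t would be chosen by the resampling scheme the abstract mentions; here it is fixed by hand.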
Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
Ann. Statist., 2002
Abstract
Cited by 56 (5 self)
An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavy-tailed density, with the mixing weight chosen by marginal maximum likelihood, in the hope of adapting between sparse and dense sequences. If estimation is then carried out using the posterior median, this is a random thresholding procedure. Other thresholding rules using the same threshold can also be used. Probability bounds on the threshold chosen by the marginal maximum likelihood approach lead to overall bounds on the risk of the method over the class of signal sequences of length n with normalized ℓ_p norm bounded by η, for η > 0 and 0 < p ≤ 2. Estimation error is measured by mean q-th power loss, for 0 < q ≤ 2. For all p and q in (0, 2], the method achieves the optimal estimation rate as n → ∞ and η → 0 at various rates, and in this sense adapts automatically to the sparseness or otherwise of the underlying signal. In addition the risk is uniformly bounded over all signals. If the posterior mean is used as the estimator, the results still hold for q > 1. Simulations show excellent performance. Computationally, the method is tractable and essentially of O(n) complexity, and software is available. The extension to a modified thresholding method relevant to the wavelet estimation of derivatives of functions is also considered.
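A simplified sketch of the spike-and-slab empirical Bayes scheme above, with a Gaussian slab standing in for the paper's heavy-tailed density and the posterior mean used as the estimator; the function name, the grid search, and the default tau2 are assumptions made for the example:

```python
import numpy as np

def eb_posterior_mean(x, tau2=4.0, grid=np.linspace(0.01, 0.99, 99)):
    """Empirical Bayes shrinkage for a sparse normal-means sequence.

    Prior: (1 - w)*delta_0 + w*N(0, tau2); the mixing weight w is
    chosen by marginal maximum likelihood on a grid, and the
    posterior mean of each coordinate is returned.
    """
    def norm_pdf(z, var):
        return np.exp(-z**2 / (2*var)) / np.sqrt(2*np.pi*var)

    f0 = norm_pdf(x, 1.0)          # spike component: x ~ N(0, 1)
    f1 = norm_pdf(x, 1.0 + tau2)   # slab component:  x ~ N(0, 1 + tau2)

    # marginal log-likelihood of the mixing weight w
    ll = [np.sum(np.log((1 - w)*f0 + w*f1)) for w in grid]
    w = grid[int(np.argmax(ll))]

    p1 = w*f1 / ((1 - w)*f0 + w*f1)      # posterior P(signal | x)
    return p1 * x * tau2 / (1.0 + tau2)  # posterior mean

# 10 true signals among 90 zeros, observed in unit-variance noise
rng = np.random.default_rng(1)
mu = np.concatenate([rng.normal(0.0, 3.0, 10), np.zeros(90)])
x = mu + rng.standard_normal(100)
est = eb_posterior_mean(x)
```

The posterior median of the paper's formulation would give an exact thresholding rule; the posterior mean above only shrinks toward zero without ever reaching it.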
Mixtures of g-priors for Bayesian variable selection
Journal of the American Statistical Association, 2008
Abstract
Cited by 36 (4 self)
Zellner’s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of g-priors as an alternative to default g-priors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the g-prior so popular. We present theoretical properties of the mixture g-priors and provide real and simulated examples to compare the mixture formulation with fixed g-priors, empirical Bayes approaches, and other default procedures.
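A numerical sketch of the fixed-g Bayes factor under Zellner's g-prior, together with a grid-based average over a hyper-g style density on g as a stand-in for the mixture-of-g-priors computation; the function names, the grid, and the demo data are all illustrative assumptions:

```python
import numpy as np

def log_bf_g(y, X, g):
    """Log Bayes factor of the model with predictors X against the
    intercept-only null under Zellner's g-prior, using the closed form
        BF = (1 + g)^((n - 1 - p)/2) * (1 + g*(1 - R^2))^(-(n - 1)/2).
    """
    n, p = X.shape
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta)**2) / np.sum(yc**2)
    return 0.5*(n - 1 - p)*np.log1p(g) - 0.5*(n - 1)*np.log1p(g*(1.0 - r2))

def log_bf_mixture(y, X, a=3.0):
    """Average the fixed-g Bayes factor over a density p(g)
    proportional to (1 + g)^(-a/2) on a discrete grid -- a crude
    numerical sketch of the mixture-of-g-priors idea."""
    gs = np.linspace(0.1, 1000.0, 4000)
    w = (1.0 + gs)**(-a/2)
    w /= w.sum()
    bf = np.exp(np.array([log_bf_g(y, X, g) for g in gs]))
    return np.log(np.sum(w * bf))

# Demo: one informative predictor vs. one pure-noise predictor
rng = np.random.default_rng(2)
x1 = rng.standard_normal(50)
y = 2.0*x1 + rng.standard_normal(50)
signal = log_bf_g(y, x1[:, None], g=50.0)
noise = log_bf_g(y, rng.standard_normal((50, 1)), g=50.0)
```

The point of the mixture is that g no longer has to be fixed by hand; the grid average above is only a stand-in for the integrals the paper treats analytically.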
Variable selection in data mining: Building a predictive model for bankruptcy
Journal of the American Statistical Association, 2004
Abstract
Cited by 34 (9 self)
We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our data set of 2.9 million months of credit-card activity. We use stepwise selection to find predictors from a mix of payment history, debt load, demographics, and their interactions. This combination of rare responses and over 67,000 possible predictors leads to a challenging modeling question: How does one separate coincidental from useful predictors? We show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Our version of stepwise regression (1) organizes calculations to accommodate interactions, (2) exploits modern decision-theoretic criteria to choose predictors, and (3) conservatively estimates p-values to handle sparse data and a binary response. Omitting any one of these leads to poor performance. A final step in our procedure calibrates regression predictions. With these modifications, stepwise regression predicts bankruptcy as well as, if not better than, recently developed data-mining tools. When sorted, the largest 14,000 resulting predictions hold 1,000 of the 1,800 bankruptcies hidden in a validation sample of 2.3 million observations. If the cost of missing a bankruptcy is 200 times that of a false positive, our predictions incur less than 2/3 of the costs of classification errors produced by the tree-based classifier C4.5. Key phrases: AIC, Cp, Bonferroni, calibration, hard thresholding, risk inflation criterion (RIC).
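A minimal sketch of forward stepwise selection with a Bonferroni-style entry rule, in the spirit of the conservative p-values the abstract describes (a normal-approximation p-value replaces the paper's careful estimates, and the function name and demo data are invented):

```python
import numpy as np
from math import erfc, sqrt

def forward_stepwise(X, y, alpha=0.05):
    """Forward stepwise selection with a Bonferroni entry rule:
    a predictor enters only if its two-sided normal-approximation
    p-value is below alpha / m, where m is the number of candidates.
    """
    n, m = X.shape
    selected = []
    residual = y - y.mean()
    while True:
        best_j, best_z = None, 0.0
        for j in range(m):
            if j in selected:
                continue
            xj = X[:, j] - X[:, j].mean()
            # z-statistic of regressing the current residual on xj
            z = (xj @ residual) / (np.linalg.norm(xj) * residual.std() + 1e-12)
            if abs(z) > abs(best_z):
                best_j, best_z = j, z
        if best_j is None:
            break
        pval = erfc(abs(best_z) / sqrt(2))   # two-sided normal p-value
        if pval >= alpha / m:                # Bonferroni: too weak, stop
            break
        selected.append(best_j)
        # refit on all selected columns and update the residual
        Xs = X[:, selected] - X[:, selected].mean(axis=0)
        beta, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
        residual = y - y.mean() - Xs @ beta
    return selected

# Demo: 20 candidates, only columns 0 and 1 carry signal
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 20))
y = 3.0*X[:, 0] - 2.0*X[:, 1] + rng.standard_normal(200)
chosen = forward_stepwise(X, y)
```

The Bonferroni divisor alpha/m is what keeps coincidental predictors out when m is huge; the paper's RIC-style criteria play an analogous role at a much larger scale.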
Efficient empirical Bayes variable selection and estimation in linear models
J. Amer. Statist. Assoc., 2005
Abstract
Cited by 23 (4 self)
We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows us to take advantage of the recently developed quick LASSO algorithm to compute the empirical Bayes estimate, and provides a new way to select the tuning parameter in the LASSO method. Unlike previous empirical Bayes variable selection methods, which in most practical situations can only be implemented through a greedy stepwise algorithm, our method gives a global solution efficiently. Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed when compared with other variable selection and estimation methods.
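To make the LASSO connection concrete, here is a plain coordinate-descent LASSO solver as a stand-in for the fast algorithm the abstract refers to (a sketch only: the function name is invented, columns are assumed standardized, and `lam` plays the role of the tuning parameter that the empirical Bayes connection is used to select):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent: each coordinate update is a
    soft-thresholding of the partial residual correlation."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X**2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j]*beta[j]        # partial residual
            z = X[:, j] @ r
            beta[j] = np.sign(z)*max(abs(z) - lam, 0.0) / col_sq[j]
    return beta

# Orthonormal design: the solution is exactly soft-thresholding of y
X = np.eye(5)
y = np.array([3.0, 0.1, -2.0, 0.0, 0.4])
beta = lasso_cd(X, y, lam=0.5)   # -> [2.5, 0, -1.5, 0, 0]
```

The orthonormal case makes the soft-thresholding character of the LASSO, and hence its kinship with thresholding-type empirical Bayes rules, directly visible.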
General empirical Bayes wavelet methods and exactly adaptive minimax estimation

2005
Abstract
Cited by 17 (1 self)
In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of the true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risks and exact minimax risks in broad collections of classes of signals. In particular, our estimators are uniformly adaptive to the minimum risk of separable estimators and the exact minimax risks simultaneously in Besov balls of all smoothness and shape indices, and they are uniformly superefficient in convergence rates in all compact sets in Besov spaces with a finite secondary shape parameter. Furthermore, in classes nested between Besov balls of the same smoothness index, our estimators dominate threshold and James–Stein estimators within an infinitesimal fraction of the minimax risks. More general block empirical Bayes estimators are developed. Both white noise with drift and nonparametric regression are considered.
Sparse Regression Learning by Aggregation and Langevin Monte Carlo
2009
Abstract
Cited by 15 (1 self)
We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter β of the EWA is larger than or equal to 4σ², where σ² is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions, and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions φ1,...,φM. We allow M to be much larger than the sample size n, but we assume that the true regression function can be well approximated by a sparse linear combination of functions φj. Under this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte Carlo algorithms to approximately compute such an EWA when the number M of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings.
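The EWA itself can be sketched in a few lines for a small dictionary, here with a uniform prior in place of the paper's heavy-tailed sparsity prior (the function name and the demo dictionary are assumptions; the abstract's condition β ≥ 4σ² motivates the choice of temperature):

```python
import numpy as np

def ewa(preds, y, beta):
    """Exponentially weighted aggregate of candidate predictors.

    `preds` is n x M: column j holds the j-th dictionary function
    evaluated at the n design points. Weights are proportional to
    exp(-RSS_j / beta) under a uniform prior.
    """
    rss = np.sum((preds - y[:, None])**2, axis=0)
    logw = -rss / beta
    logw -= logw.max()            # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum()
    return preds @ w, w

# Demo dictionary: the true function, a shifted copy, and zero
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2*np.pi*t)
preds = np.column_stack([y, y + 1.0, np.zeros_like(t)])
fhat, w = ewa(preds, y, beta=4.0)
```

With M very large, these weights cannot be enumerated, which is where the Langevin Monte Carlo algorithms of the paper come in.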
On optimality of Bayesian wavelet estimators
2004
Abstract
Cited by 11 (0 self)
We investigate the asymptotic optimality of several Bayesian wavelet estimators corresponding to different losses, namely the posterior mean, the posterior median, and the Bayes Factor. The considered prior is a mixture of a point mass at zero and a Gaussian density. We show that, in terms of the mean squared error, for properly chosen hyperparameters of the prior all three resulting Bayesian wavelet estimators achieve optimal minimax rates within any prescribed Besov ball B^s_{p,q} for p ≥ 2. For 1 ≤ p < 2, the Bayes Factor is still optimal for (2s+2)/(2s+1) ≤ p < 2 and always outperforms the posterior mean and the posterior median, which can achieve only the best possible rates for linear estimators in this case. Key words: Bayes Factor; Bayes model; Besov spaces; minimax estimation; nonlinear estimation; nonparametric regression; posterior mean; posterior median; wavelets.
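Under the point-mass-plus-Gaussian prior of this abstract, the Bayes Factor rule has a simple closed form: keep (and linearly shrink) a coefficient when the posterior odds of "signal" exceed 1, otherwise zero it. A sketch with illustrative hyperparameters (w, tau2, sigma2 are not the calibrated choices from the paper):

```python
import numpy as np

def bayes_factor_rule(x, w=0.2, tau2=4.0, sigma2=1.0):
    """Bayes-factor thresholding for coefficients x observed with
    N(0, sigma2) noise under the prior w*N(0, tau2) + (1-w)*delta_0."""
    def npdf(z, v):
        return np.exp(-z**2 / (2*v)) / np.sqrt(2*np.pi*v)
    # posterior odds of the signal component vs. the point mass
    odds = w*npdf(x, sigma2 + tau2) / ((1 - w)*npdf(x, sigma2))
    shrink = tau2 / (tau2 + sigma2)       # linear shrinkage if kept
    return np.where(odds > 1.0, shrink*x, 0.0)

out = bayes_factor_rule(np.array([0.1, 5.0]))  # small killed, large kept
```

The hard keep-or-kill behavior is what distinguishes the Bayes Factor rule from the posterior mean, which shrinks every coefficient but never sets one exactly to zero.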
Bayesian decision-theoretic scale-adaptive estimation of a log-spectral density
2004
Abstract
Cited by 8 (1 self)
The problem of estimating the log-spectrum of a stationary Gaussian time series by Bayesianly induced shrinkage of empirical wavelet coefficients is studied. A model in the wavelet domain is proposed that accounts for the distributional properties of the log-periodogram at levels of fine detail and for approximate normality at coarse levels of the wavelet decomposition. The smoothing procedure, called BAMS-LP (Bayesian Adaptive Multiscale Shrinker of the Log-Periodogram), ensures that the reconstructed log-spectrum is as noise-free as possible. It is also shown that the resulting Bayes estimators are asymptotically optimal (in the frequentist sense). Comparisons with non-wavelet and wavelet non-Bayesian methods are discussed. Key words and phrases: spectral density; log-spectral density; wavelets.