Results 1  10
of
117
Covariance regularization by thresholding
, 2007
"... This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGa ..."
Abstract

Cited by 148 (11 self)
 Add to MetaCart
(Show Context)
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general crossvalidation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data. 1. Introduction. Estimation
Mixtures of gpriors for Bayesian variable selection
 Journal of the American Statistical Association
, 2008
"... Zellner’s gprior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of gpriors as an alternative to default gpriors that resolve many of the problems with the original formulation, while mai ..."
Abstract

Cited by 87 (7 self)
 Add to MetaCart
(Show Context)
Zellner’s gprior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of gpriors as an alternative to default gpriors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the gprior so popular. We present theoretical properties of the mixture gpriors and provide real and simulated examples to compare the mixture formulation with fixed gpriors, Empirical Bayes approaches and other default procedures.
Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
 Ann. Statist
, 2002
"... An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavytailed density, with the mixing weight chosen by marginal maximum likelihood, in ..."
Abstract

Cited by 87 (5 self)
 Add to MetaCart
An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavytailed density, with the mixing weight chosen by marginal maximum likelihood, in the hope of adapting between sparse and dense sequences. If estimation is then carried out using the posterior median, this is a random thresholding procedure. Other thresholding rules using the same threshold can also be used. Probability bounds on the threshold chosen by the marginal maximum likelihood approach lead to overall bounds on the risk of the method over the class of signal sequences of length n with normalized ` p norm bounded by , for > 0 and 0 < p 2: Estimation error is measured by mean q loss, for 0 < q 2: For all p and q in (0; 2], the method achieves the optimal estimation rate as n ! 1 and ! 0 at various rates, and in this sense adapts automatically to the sparseness or otherwise of the underlying signal. In addition the risk is uniformly bounded over all signals. If the posterior mean is used as the estimator, the results still hold for q > 1: Simulations show excellent performance. Computationally, the method is tractable and essentially of O(n) complexity, and software is available. The extension to a modi ed thresholding method relevant to the wavelet estimation of derivatives of functions is also considered.
Mixtures of g priors for Bayesian Variable Selection,” ISDS working paper
, 2005
"... Zellner’s g prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this article we study mixtures of g priors as an alternative to default g priors that resolve many of the problems with the original formulation while mai ..."
Abstract

Cited by 85 (4 self)
 Add to MetaCart
(Show Context)
Zellner’s g prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this article we study mixtures of g priors as an alternative to default g priors that resolve many of the problems with the original formulation while maintaining the computational tractability that has made the g prior so popular. We present theoretical properties of the mixture g priors and provide real and simulated examples to compare the mixture formulation with fixed g priors, empirical Bayes approaches, and other default procedures.
Variable selection in data mining: Building a predictive model for bankruptcy
 Journal of the American Statistical Association
, 2004
"... We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our data set of 2.9 million months of creditcard activity. We use stepwise selection to find predictors from a mix of payment history, debt load, demographics, and ..."
Abstract

Cited by 52 (10 self)
 Add to MetaCart
(Show Context)
We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our data set of 2.9 million months of creditcard activity. We use stepwise selection to find predictors from a mix of payment history, debt load, demographics, and their interactions. This combination of rare responses and over 67,000 possible predictors leads to a challenging modeling question: How does one separate coincidental from useful predictors? We show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Our version of stepwise regression (1) organizes calculations to accommodate interactions, (2) exploits modern decision theoretic criteria to choose predictors, and (3) conservatively estimates pvalues to handle sparse data and a binary response. Omitting any one of these leads to poor performance. A final step in our procedure calibrates regression predictions. With these modifications, stepwise regression predicts bankruptcy as well, if not better, than recently developed datamining tools. When sorted, the largest 14,000 resulting predictions hold 1000 of the 1800 bankruptcies hidden in a validation sample of 2.3 million observations. If the cost of missing a bankruptcy is 200 times that of a false positive, our predictions incur less than 2/3 of the costs of classification errors produced by the treebased classifier C4.5. Key Phrases: AIC, Cp, Bonferroni, calibration, hard thresholding, risk inflation criterion (RIC),
Efficient empirical Bayes variable selection and estimation in linear models
 J. Amer. Statist. Assoc
, 2005
"... We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows u ..."
Abstract

Cited by 41 (5 self)
 Add to MetaCart
We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows us to take advantage of the recently developed quick LASSO algorithm to compute the empirical Bayes estimate, and provides a new way to select the tuning parameter in the LASSO method. Unlike previous empirical Bayes variable selection methods, which in most practical situations can only be implemented through a greedy stepwise algorithm, our method gives a global solution efficiently. Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed when compared with other variable selection and estimation methods.
Sparse regression learning by aggregation and Langevin MonteCarlo.
 Journal of Computer and System Science,
, 2012
"... Abstract We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PACBayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the ..."
Abstract

Cited by 33 (4 self)
 Add to MetaCart
(Show Context)
Abstract We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PACBayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter β of the EWA is larger than or equal to 4σ 2 , where σ 2 is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finitedimensional linear space spanned by a dictionary of functions φ 1 , . . . , φ M . We allow M to be much larger than the sample size n but we assume that the true regression function can be well approximated by a sparse linear combination of functions φ j . Under this sparsity scenario, we propose an EWA with a heavy tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin MonteCarlo algorithms to approximately compute such an EWA when the number M of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings.
General empirical Bayes wavelet methods and exactly adaptive minimax estimation

, 2005
"... In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risk ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risks and exact minimax risks in broad collections of classes of signals. In particular, our estimators are uniformly adaptive to the minimum risk of separable estimators and the exact minimax risks simultaneously in Besov balls of all smoothness and shape indices, and they are uniformly superefficient in convergence rates in all compact sets in Besov spaces with a finite secondary shape parameter. Furthermore, in classes nested between Besov balls of the same smoothness index, our estimators dominate threshold and James–Stein estimators within an infinitesimal fraction of the minimax risks. More general block empirical Bayes estimators are developed. Both white noise with drift and nonparametric regression are considered.
Multiscale methods for data on graphs and irregular multidimensional situations,
, 2008
"... Summary. For regularly spaced 1D data, wavelet shrinkage has proven to be a compelling method for nonparametric function estimation. We create three new multiscale methods that provide waveletlike transforms for both data arising on graphs and for irregularly spaced spatial data in more than 1D. T ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
(Show Context)
Summary. For regularly spaced 1D data, wavelet shrinkage has proven to be a compelling method for nonparametric function estimation. We create three new multiscale methods that provide waveletlike transforms for both data arising on graphs and for irregularly spaced spatial data in more than 1D. The concept of scale still exists within these transforms but as a continuous quantity rather than dyadic levels. Further, we adapt recent empirical Bayesian shrinkage techniques to enable us to perform multiscale shrinkage for function estimation both on graphs and for irregular spatial data. We demonstrate that our methods perform very well when compared to several other methods for spatial regression for both real and simulated data. Although our article concentrates on multiscale shrinkage (regression) we present our new 'wavelet transforms' as generic tools intended to be the basis of methods that might benefit from a multiscale representation of data either on graphs or for irregular spatial data.