Results 1–10 of 93
Model selection and estimation in regression with grouped variables
Journal of the Royal Statistical Society, Series B, 2006
Cited by 509 (7 self)
We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations, with the multifactor ANOVA problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on estimation accuracy and consider extensions of the LASSO, the LARS, and the nonnegative garrote for factor selection. The LASSO, the LARS, and the nonnegative garrote are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection, and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences among these methods. Simulations and real examples are used to illustrate the methods.
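The group extension of the LASSO sketched in this abstract penalizes whole coefficient blocks rather than individual coefficients. A common form of the objective (a sketch, with $X_j$ the design matrix for factor $j$, $\beta_j$ its coefficient block of size $p_j$, and $\sqrt{p_j}$ as group weights) is:

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \Big\| y - \sum_{j=1}^{J} X_j \beta_j \Big\|_2^2
  \;+\; \lambda \sum_{j=1}^{J} \sqrt{p_j}\,\|\beta_j\|_2
```

Because the within-group $\ell_2$ norm is not squared, it is non-differentiable at $\beta_j = 0$, so entire factors are set to zero at once rather than individual coefficients.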
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
Cited by 346 (27 self)
Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
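The best-known penalty of this kind is the SCAD penalty, usually specified through its derivative. A sketch of the rule (for $\theta > 0$, with tuning constant $a > 2$, often $a = 3.7$):

```latex
p_\lambda'(\theta) \;=\; \lambda \left\{ I(\theta \le \lambda)
  \;+\; \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}
```

The penalty behaves like the LASSO near the origin (producing sparsity), then tapers off and becomes constant for $\theta > a\lambda$, which is how the bias on large coefficients is kept bounded.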
Bayesian Model Averaging for Linear Regression Models
Journal of the American Statistical Association, 1997
Cited by 184 (13 self)
We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of interest …
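The model-averaging step described here has a standard form: the posterior for a quantity of interest $\Delta$ is a mixture over candidate models $M_1, \dots, M_K$, weighted by their posterior probabilities. A sketch:

```latex
p(\Delta \mid D) \;=\; \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) \;\propto\; p(D \mid M_k)\, p(M_k)
```

Conditioning on a single selected model amounts to replacing the mixture with one of its components and setting that model's weight to 1, which is exactly the source of the understated uncertainty.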
Model selection and estimation in the Gaussian graphical model
Biometrika (2007), pp. 1–17
Sure independence screening for ultrahigh dimensional feature space
, 2006
Cited by 90 (12 self)
Variable selection plays an important role in high-dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved in both speed and accuracy, and can then be ac …
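The correlation-learning step behind SIS is simple enough to sketch: rank features by absolute marginal correlation with the response and keep the top d. A minimal Python sketch (the function name `sis` and the toy data are illustrative, not from the paper):

```python
import numpy as np

def sis(X, y, d):
    """Sure Independence Screening (sketch): rank features by absolute
    marginal correlation with y and keep the indices of the top d."""
    Xc = X - X.mean(axis=0)          # center each column
    yc = y - y.mean()
    # componentwise correlations; guard against zero-variance columns
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(corr)[::-1][:d]

# toy example: feature 0 drives y, the other 99 features are noise
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(50)
selected = sis(X, y, 5)
```

After screening down to d < n features, any standard selector (LASSO, SCAD, Dantzig selector) can be run on the surviving columns, which is the two-stage strategy the abstract describes.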
Regularization of Wavelet Approximations
, 1999
Cited by 85 (7 self)
In this paper, we introduce nonlinear regularized wavelet estimators for estimating nonparametric regression functions when sampling points are not uniformly spaced. The approach can apply readily to many other statistical contexts. Various new penalty functions are proposed. The hard-thresholding and soft-thresholding estimators of Donoho and Johnstone (1994) are specific members of nonlinear regularized wavelet estimators. They correspond to the lower and upper bound of a class of the penalized least-squares estimators. Necessary conditions for penalty functions are given for regularized estimators to possess thresholding properties. Oracle inequalities and universal thresholding parameters are obtained for a large class of penalty functions. The sampling properties of nonlinear regularized wavelet estimators are established, and are shown to be adaptively minimax. To efficiently solve penalized least-squares problems, Nonlinear Regularized Sobolev Interpolators (NRSI) are proposed as initial estimators, which are shown to have good sampling properties. The NRSI is further ameliorated by Regularized One-Step Estimators (ROSE), which are the one-step estimators of the penalized least-squares problems using the NRSI as initial estimators. Two other approaches, the graduated nonconvexity algorithm and wavelet networks, are also introduced to handle penalized least-squares problems. The newly introduced approaches are also illustrated by a few numerical examples.
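The Donoho–Johnstone estimators mentioned in the abstract are scalar rules applied coordinatewise to wavelet coefficients. A minimal Python sketch (`lam` is the threshold):

```python
import math

def soft_threshold(x, lam):
    """Soft-thresholding: shrink |x| by lam, clipping at zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def hard_threshold(x, lam):
    """Hard-thresholding: keep x unchanged if |x| > lam, else zero."""
    return x if abs(x) > lam else 0.0
```

Both rules kill coefficients below the threshold; they differ on the survivors, where soft-thresholding shrinks by lam (introducing bias) and hard-thresholding leaves the value untouched (introducing discontinuity), which is why they bracket the class of penalized least-squares estimators discussed above.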
The composite absolute penalties family for grouped and hierarchical variable selection
 Ann. Statist
Cited by 70 (3 self)
Extracting useful information from high-dimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached …
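One way to write the CAP construction the abstract describes (a sketch, following its wording: groups $G_1,\dots,G_K$, a within-group norm index $\gamma_k$ for each group, and an across-group index $\gamma_0$):

```latex
T(\beta) \;=\; \sum_{k=1}^{K} \big\| \beta_{G_k} \big\|_{\gamma_k}^{\gamma_0}
```

Choosing $\gamma_k = 2$ and $\gamma_0 = 1$ recovers a group-lasso-type penalty on nonoverlapping groups, while overlapping, nested groups yield the hierarchical selection patterns mentioned at the end of the abstract.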
Wavelet Shrinkage Denoising Using the Non-Negative Garrote
, 1997
Cited by 52 (1 self)
In this paper, we combine Donoho and Johnstone's wavelet shrinkage denoising technique (known as WaveShrink) with Breiman's nonnegative garrote. We show that the nonnegative garrote shrinkage estimate enjoys the same asymptotic convergence rate as the hard and the soft shrinkage estimates. Simulations are used to demonstrate that garrote shrinkage offers advantages over both hard shrinkage (generally smaller mean-square error and less sensitivity to small perturbations in the data) and soft shrinkage (generally smaller bias and overall mean-square error). The minimax thresholds for the nonnegative garrote are derived and the threshold selection procedure based on Stein's Unbiased Risk Estimate (SURE) is studied. We also propose a threshold selection procedure based on combining Coifman and Donoho's cycle-spinning and SURE. The procedure is called SPINSURE. We use examples to show that SPINSURE is more stable than SURE: smaller standard deviation and smaller range.
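The nonnegative garrote shrinkage rule applied coordinatewise to wavelet coefficients has a simple closed form, sketched here in Python (`lam` is the threshold):

```python
def garrote_threshold(x, lam):
    """Nonnegative garrote shrinkage: zero when |x| <= lam,
    otherwise shrink x by lam**2 / x."""
    return 0.0 if abs(x) <= lam else x - lam**2 / x
```

The shrinkage amount `lam**2 / x` vanishes as |x| grows, so the rule interpolates between soft-thresholding (constant shrinkage, large bias) and hard-thresholding (no shrinkage, discontinuous at the threshold), which is the trade-off the abstract's simulation results reflect.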
Wavelet-Based Image Estimation: An Empirical Bayes Approach Using Jeffreys' Noninformative Prior
, 2001
Cited by 50 (11 self)
The sparseness and decorrelation properties of the discrete wavelet transform have been exploited to develop powerful denoising methods. However, most of these methods have free parameters which have to be adjusted or estimated. In this paper, we propose a waveletbased denoising technique without any free parameters; it is, in this sense, a "universal" method. Our approach uses empirical Bayes estimation based on a Jeffreys' noninformative prior; it is a step toward objective Bayesian waveletbased denoising. The result is a remarkably simple fixed nonlinear shrinkage/thresholding rule which performs better than other more computationally demanding methods.
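The "remarkably simple fixed nonlinear shrinkage/thresholding rule" the abstract refers to can be sketched as follows (an assumed form based on the empirical-Bayes derivation under the Jeffreys prior, with $y$ a noisy wavelet coefficient and $\sigma^2$ the noise variance):

```latex
\hat{\theta}(y) \;=\; \frac{\big(y^2 - 3\sigma^2\big)_+}{y}
```

The only quantity involved is the noise variance, which can be estimated from the data (e.g., from the finest-scale coefficients), so the rule indeed has no free tuning parameters, which is what makes the method "universal" in the abstract's sense.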
An Empirical Bayesian Strategy for Solving the Simultaneous Sparse Approximation Problem
IEEE Trans. Signal Process., 2007
Cited by 40 (8 self)
Given a large overcomplete dictionary of basis vectors, the goal is to simultaneously represent multiple signal vectors using coefficient expansions marked by a common sparsity profile. This generalizes the standard sparse representation problem to the case where multiple responses exist that were putatively generated by the same small subset of features. Ideally, the associated sparse generating weights should be recovered, which can have physical significance in many applications (e.g., source localization). The generic solution to this problem is intractable and, therefore, approximate procedures are sought. Based on the concept of automatic relevance determination, this paper uses an empirical Bayesian prior to estimate a convenient posterior distribution over candidate basis vectors. This particular approximation enforces a common sparsity profile and consistently places its prominent posterior mass on the appropriate region of weight space necessary for simultaneous sparse recovery. The resultant algorithm is then compared with multiple response extensions of matching pursuit, basis pursuit, FOCUSS, and Jeffreys prior-based Bayesian methods, finding that it often outperforms the others. Additional motivation for this particular choice of cost function is also provided, including the analysis of global and local minima and a variational derivation that highlights the similarities and differences between the proposed algorithm and previous approaches. Index Terms—Automatic relevance determination, empirical Bayes, multiple response models, simultaneous sparse approximation, sparse Bayesian learning, variable selection.
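The simultaneous sparse approximation (multiple measurement vector) model underlying this abstract, and the role of automatic relevance determination in it, can be sketched as follows (symbols are illustrative: $L$ responses, dictionary $\Phi$, weight matrix $W$):

```latex
Y = \Phi W + E, \qquad Y \in \mathbb{R}^{N \times L},\;
\Phi \in \mathbb{R}^{N \times M},\; W \in \mathbb{R}^{M \times L},
```

where the rows of $W$ share a common sparsity profile: a basis vector is either used by all $L$ responses or by none. ARD attaches a hyperparameter $\gamma_i \ge 0$ to each row, $w_{i\cdot} \sim \mathcal{N}(0, \gamma_i I)$, and maximizes the marginal likelihood (evidence) over the $\gamma_i$; rows whose $\gamma_i$ are driven to zero are pruned from the model, yielding the simultaneous sparse recovery described above.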