Model selection and estimation in regression with grouped variables
 Journal of the Royal Statistical Society, Series B
, 2006
"... We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor ANOVA problem as the most important and well known example. Instead of selecting factors by stepwise backward el ..."
Abstract

Cited by 509 (7 self)
We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor ANOVA problem as the most important and well known example. Instead of selecting factors by stepwise backward elimination, we focus on estimation accuracy and consider extensions of the LASSO, the LARS, and the nonnegative garrote for factor selection. The LASSO, the LARS, and the nonnegative garrote are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection, and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences among these methods. Simulations and real examples are used to illustrate the methods.
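For reference, the group penalty studied here acts on whole blocks of coefficients. In a common statement of the criterion (standard group-lasso notation, not necessarily the paper's exact display), with the predictors partitioned into groups X_1, ..., X_J of sizes p_1, ..., p_J,

\hat{\beta} = \arg\min_{\beta} \Big\| y - \sum_{j=1}^{J} X_j \beta_j \Big\|_2^2 + \lambda \sum_{j=1}^{J} \sqrt{p_j}\,\|\beta_j\|_2 ,

so each factor's coefficient block \beta_j is either set to zero as a whole or retained as a whole.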
Regularization and variable selection via the Elastic Net
 Journal of the Royal Statistical Society, Series B
, 2005
"... Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where ..."
Abstract

Cited by 360 (8 self)
Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
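For orientation, the (naive) elastic net criterion combines the ridge and lasso penalties; in common notation (which may differ slightly from the paper's display),

\hat{\beta} = \arg\min_{\beta} \| y - X\beta \|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1 ,

with the lasso and ridge regression recovered as the special cases \lambda_2 = 0 and \lambda_1 = 0.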
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
"... Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized ..."
Abstract

Cited by 346 (27 self)
Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
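The best-known member of this penalty family is SCAD; as a concrete example (usual parameterization, not necessarily the paper's notation), its derivative for \theta > 0 is

p'_\lambda(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a - 1)\lambda}\, I(\theta > \lambda) \right\}, \qquad a > 2,

so coefficients near zero receive lasso-like shrinkage while large coefficients are left essentially unpenalized, which is the source of the bias reduction and oracle behaviour described above (the paper suggests a = 3.7).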
Pathwise coordinate optimization
, 2007
"... We consider “oneatatime ” coordinatewise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1penalized regression (lasso) in the lterature, but it seems to have been largely ignored. Indeed, it seems that coordinatewise algorith ..."
Abstract

Cited by 166 (19 self)
We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso”, however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
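As a minimal sketch of the one-at-a-time update being described (an illustrative re-implementation of cyclic coordinate descent with soft-thresholding for the lasso, assuming centred and scaled columns; it is not the authors' code):

    import numpy as np

    def soft_threshold(z, gamma):
        # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
        return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

    def lasso_coordinate_descent(X, y, lam, n_iter=100):
        # Cyclic coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1.
        # Assumes the columns of X are centred and scaled.
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0) / n
        for _ in range(n_iter):
            for j in range(p):
                # Partial residual with coordinate j removed.
                r_j = y - X @ beta + X[:, j] * beta[j]
                rho = X[:, j] @ r_j / n
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
        return beta

Running this over a decreasing grid of lam values, warm-starting each fit from the previous solution, gives the pathwise scheme the abstract refers to.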
On the LASSO and Its Dual
 Journal of Computational and Graphical Statistics
, 1999
"... Proposed by Tibshirani (1996), the LASSO (least absolute shrinkage and selection operator) estimates a vector of regression coe#cients by minimising the residual sum of squares subject to a constraint on the l 1 norm of coe#cient vector. The LASSO estimator typically has one or more zero elements ..."
Abstract

Cited by 146 (2 self)
Proposed by Tibshirani (1996), the LASSO (least absolute shrinkage and selection operator) estimates a vector of regression coefficients by minimising the residual sum of squares subject to a constraint on the ℓ1 norm of the coefficient vector. The LASSO estimator typically has one or more zero elements and thus shares characteristics of both shrinkage estimation and variable selection. In this paper we treat the LASSO as a convex programming problem and derive its dual. Consideration of the primal and dual problems together leads to important new insights into the characteristics of the LASSO estimator and to an improved method for estimating its covariance matrix. Using these results we also develop an efficient algorithm for computing LASSO estimates which is usable even in cases where the number of regressors exceeds the number of observations. KEY WORDS AND PHRASES. Convex Programming, Dual Problem, Partial Least Squares, Quadratic Programming, Penalised Regression, Regression, Shrinkage ...
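For orientation, the primal problem treated here is the constrained form

\min_{\beta} \; \| y - X\beta \|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t .

One common statement of the dual (written for the equivalent penalized form \min_\beta \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1, and possibly differing in details from the paper's own derivation) is

\max_{u} \; u^\top y - \tfrac{1}{2}\|u\|_2^2 \quad \text{subject to} \quad \|X^\top u\|_\infty \le \lambda ,

with the optimal dual variable given by the residual, \hat{u} = y - X\hat{\beta}.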
Asymptotics for Lasso-type estimators
, 2000
"... this paper, we consider the asymptotic behaviour of regression estimators that minimize the residual sum of squares plus a penalty proportional to ..."
Abstract

Cited by 138 (3 self)
In this paper, we consider the asymptotic behaviour of regression estimators that minimize the residual sum of squares plus a penalty proportional to ...
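To make the truncated description concrete: in the usual bridge-penalty notation (an assumption about the exact form, since the snippet is cut off), the estimators considered minimize

\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 + \lambda_n \sum_{j=1}^{p} |\beta_j|^{\gamma}, \qquad \gamma > 0,

with \gamma = 1 giving the lasso, and the limiting distributions are studied as \lambda_n grows with n.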
Online learning for matrix factorization and sparse coding
"... Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it t ..."
Abstract

Cited by 97 (18 self)
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, nonnegative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large datasets.
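As a quick way to experiment with this kind of online dictionary learning, a sketch using scikit-learn's MiniBatchDictionaryLearning (a library implementation in this line of work, not the authors' own release; the data and parameter values below are placeholders):

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    # Toy data: 500 signals of dimension 64 (e.g. vectorized 8x8 image patches).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 64))

    # Learn the dictionary online, in mini-batches; alpha weights the L1
    # penalty on the sparse codes.
    dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                       batch_size=32, random_state=0)
    codes = dico.fit(X).transform(X)     # coefficients, shape (500, 100)
    dictionary = dico.components_        # learned basis, shape (100, 64)
    reconstruction = codes @ dictionary  # data approximated as sparse combinations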
Sparse Permutation Invariant Covariance Estimation
 Electronic Journal of Statistics
, 2008
"... The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in highdimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lassotype penalty. We establish a rate of convergence in the Fro ..."
Abstract

Cited by 83 (5 self)
The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. The estimator is required to be positive definite, but we avoid having to use semidefinite programming by reparameterizing the objective function.
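The penalized-likelihood criterion behind estimators of this kind can be written generically (the paper's estimator penalizes only the off-diagonal entries, and one variant works with the correlation matrix) as

\hat{\Omega} = \arg\min_{\Omega \succ 0} \; \operatorname{tr}(\hat{\Sigma}\,\Omega) - \log\det\Omega + \lambda \sum_{i \ne j} |\omega_{ij}| ,

where \hat{\Sigma} is the sample covariance matrix and \Omega the concentration (inverse covariance) matrix being estimated.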
One-step sparse estimates in nonconcave penalized likelihood models
 Annals of Statistics
, 2008
"... Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective funct ..."
Abstract

Cited by 58 (0 self)
Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. In this article, we propose a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguishing feature of the LLA algorithm is that at each LLA step, the LLA estimator can naturally adopt a sparse representation. Thus, we suggest using the one-step LLA estimator from the LLA algorithm as the final estimate. Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties with good initial estimators. Computationally, the one-step LLA estimation methods dramatically reduce the computational cost in maximizing the nonconcave penalized likelihood. We conduct Monte Carlo simulations to assess the finite-sample performance of the one-step sparse estimation methods. The results are very encouraging.
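The key device is the local linear approximation of the penalty around an initial estimate \beta^{(0)} (standard notation, which may differ from the paper's):

p_\lambda(|\beta_j|) \approx p_\lambda(|\beta_j^{(0)}|) + p'_\lambda(|\beta_j^{(0)}|)\,(|\beta_j| - |\beta_j^{(0)}|),

so that each LLA step reduces to a convex, weighted-lasso problem with weights w_j = p'_\lambda(|\beta_j^{(0)}|), which is what allows exact zeros in the one-step estimate.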