Results 21 - 30 of 948
Variable Selection for Cox's Proportional Hazards Model and Frailty Model
 Annals of Statistics, 2002
Abstract

Cited by 92 (14 self)
A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed in Fan and Li (2001a). It has been shown there that the resulting procedures perform as well as if the subset of significant variables were known in advance. Such a property is called an oracle property. The proposed procedures were illustrated in the context of linear regression, robust linear regression and generalized linear models. In this paper, the nonconcave penalized likelihood approach is extended further to the Cox proportional hazards model and the Cox proportional hazards frailty model, two commonly used semiparametric models in survival analysis. As a result, new variable selection procedures for these two commonly used models are proposed. It is demonstrated how the rates of convergence depend on the regularization parameter in the penalty function. Further, with a proper choice of the regularization parameter and the penalty function, the proposed estimators possess an oracle property. Standard error formulae are derived and their accuracies are empirically tested. Simulation studies show that the proposed procedures are more stable in prediction and more effective in computation than best subset variable selection, and they reduce model complexity as effectively as best subset selection. Compared with the LASSO, which is the penalized likelihood method with the L1 penalty proposed by Tibshirani, the newly proposed approaches have better theoretical properties and finite sample performance.
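The penalty behind these oracle procedures is SCAD (smoothly clipped absolute deviation), which Fan and Li specify through its derivative: constant near zero like the L1 penalty, then decaying to zero so large coefficients are left essentially unpenalized. A minimal sketch (function name and the conventional default a = 3.7 are illustrative):

```python
def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lam(theta) of the SCAD penalty for theta >= 0:
    lam on [0, lam], linearly decaying on (lam, a*lam], zero beyond."""
    theta = abs(theta)
    if theta <= lam:
        return lam
    if theta <= a * lam:
        return (a * lam - theta) / (a - 1.0)
    return 0.0
```

Because the derivative vanishes for |theta| > a*lam, SCAD shrinks small estimates like the Lasso but leaves large ones nearly unbiased, which is what makes the oracle property attainable.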
A note on the LASSO and related procedures in model selection
 Statistica Sinica, 2004
Abstract

Cited by 78 (12 self)
The Lasso, Forward Stagewise regression and LARS are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and variable selection. In practical implementations these algorithms are typically tuned to achieve optimal prediction accuracy. We show that, when prediction accuracy is used as the criterion to choose the tuning parameter, these procedures are in general not consistent in terms of variable selection: the sets of variables they select do not consistently recover the true set of important variables. In particular, we show that for any sample size n, when there are superfluous variables in the linear regression model and the design matrix is orthogonal, the probability of the procedures correctly identifying the true set of important variables is bounded above by a constant (smaller than one) not depending on n. This result is also shown to hold for two-dimensional problems with general correlated design matrices. The results indicate that in problems where
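Under an orthogonal design the Lasso decouples into coordinate-wise soft thresholding of the least-squares estimates, which makes the inconsistency easy to see: a prediction-optimal penalty tends to be small, and a small penalty leaves spurious coefficients nonzero. A hypothetical illustration (the numeric values are made up):

```python
def soft_threshold(z, lam):
    """Lasso estimate for one coefficient under an orthonormal design:
    shrink the least-squares estimate z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares estimates: the second variable is superfluous
# (its true coefficient is zero; 0.15 is pure noise).
ols = [3.0, 0.15]
small_lam = 0.05  # a prediction-tuned penalty is typically this small
fit = [soft_threshold(z, small_lam) for z in ols]
# The noise variable survives the small penalty, so the selected set is wrong;
# a larger lam would drop it, at some cost in prediction accuracy.
```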
SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION
 Submitted to the Annals of Statistics, 2010
Abstract

Cited by 78 (3 self)
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B* ∈ R^(p×K) of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union recovery, or recovery of the set of s rows for which B* is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter θ(n, p, s) := n/[2ψ(B*) log(p − s)]. Here n is the sample size, and ψ(B*) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θ_u, and fails for sequences such that θ(n, p, s) lies below a critical level θ_ℓ. For the special case of the standard Gaussian ensemble, we show that θ_ℓ = θ_u, so that the characterization is sharp. The sparsity-overlap function ψ(B*) reveals that, if the design is uncorrelated on the active rows, ℓ1/ℓ2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
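The ℓ1/ℓ2 block regularization acts on whole rows of B*: under an orthonormal design, each row of the least-squares estimate is either zeroed out or shrunk as a unit in Euclidean norm, so all K coefficients in a row enter or leave the model together. A minimal sketch of this row-wise update (function name is illustrative):

```python
import math

def block_soft_threshold(row, lam):
    """l1/l2 group-Lasso update for one row of B under an orthonormal
    design: zero the row if its Euclidean norm is at most lam, otherwise
    shrink the whole row toward zero by lam in Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in row))
    if norm <= lam:
        return [0.0] * len(row)
    scale = 1.0 - lam / norm
    return [scale * x for x in row]
```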
Tuning parameter selectors for the smoothly clipped absolute deviation method
 Biometrika, 2007
Abstract

Cited by 78 (11 self)
The penalised least squares approach with the smoothly clipped absolute deviation penalty has been consistently demonstrated to be an attractive regression shrinkage and selection method. It not only automatically and consistently selects the important variables, but also produces estimators which are as efficient as the oracle estimator. However, these attractive features depend on appropriately choosing the tuning parameter. We show that the commonly used generalised cross-validation cannot select the tuning parameter satisfactorily, with a nonignorable overfitting effect in the resulting model. In addition, we propose a BIC tuning parameter selector, which is shown to identify the true model consistently. Simulation studies are presented to support the theoretical findings, and an empirical example is given to illustrate its use on the Female Labor Supply data. Some key words: AIC; BIC; Generalised cross-validation; Least absolute shrinkage and selection operator; Smoothly clipped absolute deviation.
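A BIC-type selector compares fits along the tuning-parameter path by trading residual error against effective model size, picking the λ that minimises something of the shape log(SSE_λ/n) + df_λ · log(n)/n. A hypothetical sketch (the path values below are made up for illustration; see the paper for the exact selector):

```python
import math

def bic(sse, df, n):
    """BIC-type criterion for one fit on the solution path."""
    return math.log(sse / n) + df * math.log(n) / n

# Hypothetical (lam, SSE, df) triples along a SCAD path.
n = 100
path = [(0.0, 10.0, 8),   # no penalty: slightly smaller SSE, many variables
        (0.5, 11.0, 3),   # the "true" model size in this toy path
        (1.0, 20.0, 1)]   # too much penalty: underfit
best_lam = min(path, key=lambda t: bic(t[1], t[2], n))[0]
```

A GCV-style criterion would be drawn toward the prediction optimum (the small-λ, 8-variable fit here), which is the overfitting effect the paper identifies; the log(n) factor makes the BIC selector prefer the sparser model.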
Component selection and smoothing in multivariate nonparametric regression
Abstract

Cited by 76 (1 self)
We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft-thresholding-type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems, and we compare its performance with that of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies.
Variable Selection Using MM Algorithm
 Annals of Statistics, 2005
Abstract

Cited by 75 (7 self)
Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize–maximize (MM) algorithm. MM algorithms are useful extensions of the well-known class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton–Raphson-like aspect of these algorithms
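For a single coordinate under an orthonormal design, the perturb-then-majorize idea reduces to iteratively reweighted ridge updates: the nondifferentiable penalty is majorized at the current iterate by a quadratic whose curvature involves the penalty derivative, and a small ε keeps that weight finite at zero. A sketch assuming an L1 penalty (the one-coordinate setup and names are illustrative; the paper's algorithm handles general penalties and full likelihoods):

```python
def mm_penalized_ls(z, lam, eps=1e-8, iters=200):
    """Perturbed-penalty MM iteration for one coordinate of penalized
    least squares, min_b 0.5*(z - b)**2 + lam*|b|, orthonormal design.
    The |b| term is majorized at the current iterate by a quadratic with
    weight lam / (|b| + eps); eps is the differentiability perturbation."""
    b = z  # start from the unpenalized (least-squares) estimate
    for _ in range(iters):
        w = lam / (abs(b) + eps)  # curvature of the quadratic surrogate
        b = z / (1.0 + w)         # exact minimizer of the surrogate
    return b
```

The iterates converge to (essentially) the soft-threshold solution: z − λ for z well above λ, and a value of order ε when z is below λ, illustrating the "converge to a desirable point" guarantee in this simple case.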
Extended Bayesian information criteria for model selection with large model spaces
 Biometrika, 2008
Abstract

Cited by 72 (2 self)
The ordinary Bayes information criterion is too liberal for model selection when the model space is large. In this article, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayes information criteria. The new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayes information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayes information criteria are extremely useful for variable selection in problems with a moderate sample size but a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research.
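The extension adds a model-space term to the ordinary BIC: for a Gaussian linear model with k of p candidate covariates selected, a criterion of the form n·log(RSS/n) + k·log n + 2γ·log C(p, k), so that γ = 0 recovers the ordinary BIC and γ > 0 penalizes searching over a huge number of same-size models. A minimal sketch (function name is illustrative):

```python
import math

def ebic(rss, n, k, p, gamma=1.0):
    """Extended BIC: ordinary BIC plus 2*gamma*log(binom(p, k)),
    the term accounting for the size of the model space."""
    return (n * math.log(rss / n)
            + k * math.log(n)
            + 2.0 * gamma * math.log(math.comb(p, k)))
```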
SparseNet: Coordinate Descent with Non-Convex Penalties
 2009
Abstract

Cited by 71 (0 self)
We address the problem of sparse selection in linear models. A number of nonconvex penalties have been proposed for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this paper we pursue the coordinate-descent approach for optimization, and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty (Zhang 2010) is ideally suited to this task, and we use it to demonstrate the performance of our algorithm.
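The threshold function paired with the MC+ penalty is the firm-thresholding operator, which interpolates between the Lasso's soft threshold (as γ → ∞) and the hard threshold of best-subset selection (as γ → 1+). A minimal sketch of the coordinate-wise update under an orthonormal design (function name is illustrative):

```python
def mcplus_threshold(z, lam, gamma=3.0):
    """Firm-thresholding operator: the coordinate-wise minimizer of
    0.5*(z - b)**2 plus the MC+ penalty, orthonormal design, gamma > 1."""
    sign = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= lam:
        return 0.0                                  # fully thresholded
    if az <= gamma * lam:
        return sign * (az - lam) / (1.0 - 1.0 / gamma)  # partial shrinkage
    return z  # beyond gamma*lam the penalty is flat: no shrinkage at all
```

Leaving large coefficients unshrunk is what reduces the Lasso's bias, at the price of nonconvexity; the pathwise strategy in the paper manages that nonconvexity by warm-starting across a grid of (λ, γ) values.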
A SELECTIVE OVERVIEW OF VARIABLE SELECTION IN HIGH DIMENSIONAL FEATURE SPACE
 2010
Abstract

Cited by 70 (6 self)
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of recent developments in theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what role the penalty function plays, and what statistical properties the resulting estimators possess are rapidly driving advances in the field. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
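Independence screening handles ultra-high dimensionality with a crude but fast first pass: rank predictors by absolute marginal correlation with the response and keep only the top d (commonly on the order of n/log n), then run a refined penalized method on the survivors. A minimal pure-Python sketch (function names are illustrative):

```python
import math

def _abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)

def sis_screen(columns, y, d):
    """Independence screening: keep the indices of the d predictor
    columns most correlated (in absolute value) with the response y."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: -_abs_corr(columns[j], y))
    return sorted(ranked[:d])
```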