Sure independence screening for ultrahigh dimensional feature space
, 2006
Abstract

Cited by 90 (12 self)
Variable selection plays an important role in high-dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property even for exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved in both speed and accuracy, and can then be ac ...
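To make the screening step concrete, here is a minimal sketch of correlation-based screening in Python. It was written for this listing and is not taken from the paper.

The sketch makes four assumptions:
- It assumes a linear model.
- It standardizes every column.
- It ranks features by absolute sample correlation with the response.
- It keeps the top d features.

The function names (`standardize`, `sis`), the toy data, and the screening size d = [n/log n] are all illustrative assumptions. The iterative ISIS extension is not shown.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    sd = math.sqrt(sum(v * v for v in centered) / n)
    if sd == 0.0:
        return [0.0] * n  # a constant column carries no marginal signal
    return [v / sd for v in centered]


def sis(X, y, d):
    """Sketch of sure independence screening (hypothetical helper).

    X is a list of n rows with p features each; y has length n.
    Ranks features by |marginal correlation with y| and returns the
    indices of the d largest, strongest first.
    """
    n, p = len(X), len(X[0])
    ys = standardize(y)
    scores = []
    for j in range(p):
        xj = standardize([row[j] for row in X])
        # On standardized data, x_j^T y / n is the sample correlation of feature j with y.
        omega_j = sum(a * b for a, b in zip(xj, ys)) / n
        scores.append((abs(omega_j), j))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [j for _, j in scores[:d]]


# Hypothetical toy data: p = 500 features, n = 50 samples, only features 0 and 1 active.
random.seed(0)
n, p = 50, 500
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [4.0 * row[0] + 3.0 * row[1] + random.gauss(0.0, 1.0) for row in X]

d = int(n / math.log(n))  # screening size [n / log n], an assumed choice
selected = sis(X, y, d)
print(d, selected)
```

Here p is ten times n, yet the screen cuts the problem down to d = 12 candidates. A penalized method could then be run on that smaller set.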
One-step sparse estimates in nonconcave penalized likelihood models
 Ann. Statist., to appear
, 2008
Abstract

Cited by 58 (0 self)
Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging because the objective function is nondifferentiable and nonconcave. In this article, we propose a new unified algorithm based on local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguishing feature of the LLA algorithm is that at each LLA step, the LLA estimator naturally adopts a sparse representation. We therefore suggest using the one-step LLA estimator from the LLA algorithm as the final estimate. Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties given good initial estimators. Computationally, one-step LLA estimation dramatically reduces the cost of maximizing the nonconcave penalized likelihood. We conduct Monte Carlo simulations to assess the finite sample performance of the one-step sparse estimation methods. The results are very encouraging.
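A rough sketch of a single LLA step for least squares with the SCAD penalty follows.

The sketch makes three assumptions:
- The loss is scaled as (1/(2n))||y - Xb||^2.
- The SCAD derivative p'_lambda(|beta_init_j|) serves as the L1 weight for coefficient j.
- The resulting weighted lasso is solved by plain cyclic coordinate descent.

Coordinate descent is a swapped-in solver for illustration and is not necessarily the one used in the paper. All function names and the toy data are hypothetical.

```python
def scad_derivative(theta, lam, a=3.7):
    """SCAD derivative p'_lambda(theta) at theta >= 0 (a = 3.7 is the usual default)."""
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1.0)


def soft_threshold(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, weights, n_iter=500, tol=1e-10):
    """Minimize (1/(2n))||y - X b||^2 + sum_j weights[j] * |b_j| by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (partial residual that excludes feature j)
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(z, weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def one_step_lla(X, y, beta_init, lam, a=3.7):
    """One LLA step: linearize SCAD at beta_init, then solve the weighted L1 problem."""
    weights = [scad_derivative(abs(bj), lam, a) for bj in beta_init]
    return weighted_lasso_cd(X, y, weights)


# Toy usage with two orthogonal columns; beta_init stands in for a good initial estimator.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta_hat = one_step_lla(X, y, beta_init=[2.0, 0.05], lam=0.5)
print(beta_hat)  # [2.0, 0.0]
```

With lambda = 0.5, the large initial coefficient receives weight 0 and is left unpenalized. The small one receives weight lambda and is thresholded to exactly zero. This is the sparse one-step representation described above.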
Asymptotic properties of bridge estimators in sparse high-dimensional regression models
 Ann. Statist
, 2007
Abstract

Cited by 40 (9 self)
We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348–1360] and Fan and Peng [Ann. Statist. 32 (2004) 928–961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
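For intuition about the marginal bridge estimator, here is a sketch under stated assumptions. The marginal objective is taken to be sum_j sum_i (y_i - x_ij b_j)^2 + lambda sum_j |b_j|^gamma with 0 < gamma < 1. This objective separates into one scalar problem per covariate.

Because the penalty is nonconvex, the sketch simply grid-searches each scalar problem. The grid search, the function name and the toy data are illustrative choices, not the paper's procedure.

```python
def marginal_bridge(X, y, lam, gamma=0.5, grid_size=2001):
    """Marginal bridge sketch (hypothetical helper): for each feature j, minimize
        sum_i (y_i - x_ij * b)^2 + lam * |b|**gamma,  0 < gamma < 1,
    over a grid between 0 and the marginal least-squares coefficient.
    """
    p = len(X[0])
    estimates = []
    for j in range(p):
        xj = [row[j] for row in X]
        sxx = sum(v * v for v in xj)
        sxy = sum(a * b for a, b in zip(xj, y))
        if sxx == 0.0:
            estimates.append(0.0)
            continue
        b_ols = sxy / sxx

        def objective(b, sxx=sxx, sxy=sxy):
            # Expanded squared loss with the constant sum(y_i^2) dropped.
            return sxx * b * b - 2.0 * sxy * b + lam * abs(b) ** gamma

        # The minimizer lies between 0 and b_ols: beyond b_ols both terms grow,
        # and flipping the sign raises the loss without lowering the penalty.
        grid = [b_ols * k / (grid_size - 1) for k in range(grid_size)]
        estimates.append(min(grid, key=objective))
    return estimates


# Toy data: strong effect on feature 0, weak effect (0.1) on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0 * r[0] + 0.1 * r[1] for r in X]
est = marginal_bridge(X, y, lam=1.0, gamma=0.5)
print(est)
```

In this toy example the small coefficient (0.1) is thresholded to exactly zero, while the large one is only slightly shrunk. Exact zeros of this kind are what let bridge estimators perform selection.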
High-dimensional classification using features annealed independence rules
 Ann. Statist
, 2008
Abstract

Cited by 27 (8 self)
ABSTRACT. Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose using the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we further demonstrate that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The optimal number of features, or equivalently the threshold value of the test statistics, is chosen based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
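The screen-then-classify idea can be sketched as follows. The sketch assumes:
- two classes;
- the unequal-variance two-sample t-statistic for ranking features;
- a diagonal (independence) linear rule with pooled per-feature variances on the kept features.

The number of features m is fixed by hand here. The paper's data-driven choice of m via an upper bound on the classification error is not reproduced. Function names and data are hypothetical.

```python
import math


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def fair_fit(X1, X2, m):
    """Keep the m features with the largest |two-sample t-statistic|.

    X1, X2: lists of observations (each a list of p features) from classes 1 and 2.
    Returns (feature index, class-1 mean, class-2 mean, pooled variance) per kept feature.
    """
    n1, n2, p = len(X1), len(X2), len(X1[0])
    ranked = []
    for j in range(p):
        m1, v1 = mean_var([x[j] for x in X1])
        m2, v2 = mean_var([x[j] for x in X2])
        t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        ranked.append((abs(t), j, m1, m2, pooled))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return [(j, m1, m2, pooled) for _, j, m1, m2, pooled in ranked[:m]]


def fair_predict(model, x):
    """Independence (diagonal) rule on the kept features: class 1 if score > 0, else class 2."""
    score = 0.0
    for j, m1, m2, pooled in model:
        score += (x[j] - 0.5 * (m1 + m2)) * (m1 - m2) / pooled
    return 1 if score > 0.0 else 2


# Toy data: only feature 0 separates the classes.
X1 = [[2.1, 0.0, 5.0], [1.9, 1.0, 4.0], [2.0, 0.5, 6.0], [2.2, 0.2, 5.5]]
X2 = [[-2.0, 0.1, 5.2], [-1.8, 0.9, 4.1], [-2.2, 0.4, 5.9], [-2.1, 0.3, 5.4]]
model = fair_fit(X1, X2, m=1)
print([j for j, *_ in model], fair_predict(model, [1.5, 0.5, 5.0]))
```

Dropping the noise features before applying the independence rule is the point. Each extra noise feature adds estimation error to the centroid difference without adding signal.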
High dimensional statistical inference and random matrices
 In: Proceedings of the International Congress of Mathematicians
, 2006
Abstract

Cited by 25 (1 self)
Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of interdependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory (RMT) developed, initially within physics and more recently widely in mathematics. While some of the central objects of study in RMT are identical to those of multivariate statistics, statistical theory was slow to exploit the connection. However, with vast data collection ever more common, data sets now often have as many variables as, or more variables than, the number of individuals observed. In such contexts, the techniques and results of RMT have much to offer multivariate statistics. The paper reviews some of the progress to date.
A selective overview of variable selection in high dimensional feature space
, 2010
Abstract

Cited by 23 (4 self)
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of recent developments in the theory, methods, and implementations of high dimensional variable selection. Questions such as what limits on dimensionality these methods can handle, what role the penalty functions play, and what statistical properties the resulting estimators have are rapidly driving advances in the field. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
When do stepwise algorithms meet subset selection criteria?
, 2007
Abstract

Cited by 9 (3 self)
Recent results in homotopy and solution paths demonstrate that certain well-designed greedy algorithms, with a range of values of the algorithmic parameter, can provide solution paths to a sequence of convex optimization problems. On the other hand, in regression many existing criteria in subset selection (including Cp, AIC, BIC, MDL, RIC, etc.) involve optimizing an objective function that contains a counting measure. The two optimization problems are formulated as (P1) and (P0) in the present paper. The latter is generally combinatorial and has been proven to be NP-hard. We study the conditions under which the two optimization problems have common solutions. Hence, in these situations a stepwise algorithm can be used to solve the seemingly unsolvable problem. Our main result is motivated by recent work in sparse representation, while two others emerge from different angles: a direct analysis of sufficiency and necessity, and a condition on the most correlated covariates. An extreme example connected with least angle regression is of independent interest.
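To see why (P0)-type criteria are combinatorial, here is a brute-force sketch. It minimizes RSS(S) + lambda·|S| over every subset S of the p covariates.

Up to scaling by the noise variance, lambda = 2σ² corresponds to a Cp/AIC-type penalty and lambda = σ² log n to a BIC-type penalty. The exact criteria in the paper may be parameterized differently. The small linear solver and all names are illustrative.

```python
import itertools


def solve(A, rhs):
    """Solve the small linear system A x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    M = [list(A[r]) + [rhs[r]] for r in range(k)]  # augmented matrix
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][cc] * x[cc] for cc in range(r + 1, k))) / M[r][r]
    return x


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the given columns."""
    if not subset:
        return sum(v * v for v in y)
    A = [[sum(row[a] * row[b] for row in X) for b in subset] for a in subset]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in subset]
    beta = solve(A, rhs)
    return sum((yi - sum(row[a] * ba for a, ba in zip(subset, beta))) ** 2
               for row, yi in zip(X, y))


def best_subset_l0(X, y, lam):
    """Exhaustively minimize RSS(S) + lam * |S| over all 2**p subsets S."""
    p = len(X[0])
    best_val, best_set = rss(X, y, ()), ()
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            val = rss(X, y, subset) + lam * size
            if val < best_val:
                best_val, best_set = val, subset
    return best_set


X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [3.0, -3.0, 3.0, -3.0]  # y = 3 * (column 0)
print(best_subset_l0(X, y, lam=1.0))  # (0,)
```

The loop visits 2^p subsets, so it is only usable for very small p. That is the cost a stepwise path algorithm for (P1) avoids when the two problems share solutions.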
Profile-kernel likelihood inference with diverging number of parameters
, 2008
Abstract

Cited by 8 (2 self)
The generalized varying coefficient partially linear model with a growing number of predictors arises in many contemporary scientific endeavors. In this paper we address both the theoretical and practical sides of profile likelihood estimation and inference. When the number of parameters grows with sample size, the existence and asymptotic normality of the profile likelihood estimator are established under some regularity conditions. Profile likelihood ratio inference for the growing number of parameters is proposed and the Wilks phenomenon is demonstrated. A new algorithm, called the accelerated profile-kernel algorithm, for computing the profile-kernel estimator is proposed and investigated. Simulation studies show that the resulting estimates are as efficient as the fully iterative profile-kernel estimates. For moderate sample sizes, our proposed procedure saves considerable computational time over the fully iterative profile-kernel procedure and gives more stable estimates. A set of real data is analyzed using our proposed algorithm. Short Title: High-dimensional profile likelihood.
The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization
, 901
Abstract

Cited by 8 (1 self)
Let $X_1, \ldots, X_n$ be a random sample from a p-dimensional population distribution. Assume that $c_1 n^{\alpha} \le p \le c_2 n^{\alpha}$ for some positive constants $c_1, c_2$ and $\alpha$. In this paper we introduce a new statistic for testing independence of the p variates of the population and prove that its limiting distribution is the extreme value distribution of type I with a rate of convergence $O((\log n)^{5/2}/\sqrt{n})$. This is much faster than $O(1/\log n)$, a typical convergence rate for this type of extreme value distribution. A simulation study and an application to stochastic optimization are discussed.
To how many simultaneous hypothesis tests can normal, Student’s t or bootstrap calibration be applied?
, 2007
Abstract

Cited by 7 (2 self)
ABSTRACT. In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity for combining data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? In the present paper we answer this question in cases where the statistic under test is of Student’s t type. We show that if either the normal or Student’s t distribution is used for calibration, then the level of the simultaneous test is accurate provided $\log N$ increases at a strictly slower rate than $n^{1/3}$ as n diverges. On the other hand, if bootstrap methods are used for calibration, then we may choose $\log N$ almost as large as $n^{1/2}$ and still achieve asymptotic level accuracy. The implications of these results are explored both theoretically and numerically. KEYWORDS. Bonferroni’s inequality, Edgeworth expansion, genetic data, large-deviation expansion, level accuracy, microarray data, quantile estimation, skewness, Student’s t statistic.
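As a rough illustration of normal calibration with a Bonferroni split, here is a minimal Python sketch. Each of the N one-sample Student's t statistics is compared with the standard normal quantile at tail probability alpha/(2N). The function names and the toy samples are hypothetical, and bootstrap calibration is not shown.

```python
import math
from statistics import NormalDist, mean, stdev


def student_t(sample, mu0=0.0):
    """One-sample Student's t statistic sqrt(n) * (mean - mu0) / s."""
    return math.sqrt(len(sample)) * (mean(sample) - mu0) / stdev(sample)


def bonferroni_normal_critical(alpha, N):
    """Two-sided standard normal critical value giving each of N tests level alpha / N."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * N))


def simultaneous_test(samples, alpha=0.05):
    """Reject H0: mean = 0 for each sample whose |t| exceeds the Bonferroni normal cutoff."""
    crit = bonferroni_normal_critical(alpha, len(samples))
    return [abs(student_t(s)) > crit for s in samples]


print(round(bonferroni_normal_critical(0.05, 1), 4))  # 1.96
print(round(bonferroni_normal_critical(0.05, 10_000), 2))
print(simultaneous_test([[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]))  # [True, False]
```

The cutoff grows with N, so the normal approximation to each t statistic must stay accurate increasingly far into the tail. Roughly speaking, that is where the comparison between the growth of log N and $n^{1/3}$ enters.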