Results 1 - 10
of
23
Sure independence screening for ultra-high dimensional feature space
, 2006
"... Variable selection plays an important role in high dimensional statistical modeling which nowa-days appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, ..."
Abstract
-
Cited by 32 (3 self)
- Add to MetaCart
Variable selection plays an important role in high dimensional statistical modeling which nowa-days appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on a correlation learning, called the Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, the SIS is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be ac-
Asymptotic properties of bridge estimators in sparse high-dimensional regression models
- Ann. Statist
, 2006
"... Summary. We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estima-tors to distinguish between covariates whose ..."
Abstract
-
Cited by 19 (9 self)
- Add to MetaCart
Summary. We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estima-tors to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge es-timators have an oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004). In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the co-variates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distin-guish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size. Key Words and phrases. Penalized regression, high-dimensional data, variable selection, asymptotic normality, oracle property. Short title. Sparse high-dimensional regression
High dimensional statistical inference and random matrices
- IN: PROCEEDINGS OF INTERNATIONAL CONGRESS OF MATHEMATICIANS
, 2006
"... Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of inter-dependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of inter-dependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory (RMT) developed, initially within physics, and more recently widely in mathematics. While some of the central objects of study in RMT are identical to those of multivariate statistics, statistical theory was slow to exploit the connection. However, with vast data collection ever more common, data sets now often have as many or more variables than the number of individuals observed. In such contexts, the techniques and results of RMT have much to offer multivariate statistics. The paper reviews some of the progress to date.
High dimensional classification using features annealed independence rules
- Ann. Statist
, 2008
"... ABSTRACT. Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classifications is largely poorly understood. In a seminal paper, Bickel an ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
ABSTRACT. Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classifications is largely poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as the random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as bad as the random guessing. Thus, it is paramountly important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics are proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
When do stepwise algorithms meet subset selection criteria
- ISyE Statistics Techical Report, URL = http://www.isye.gatech.edu/statistics/papers
, 2005
"... Recent results in homotopy and solution paths demonstrate that certain well-designed greedy algorithms, with a range of values of the algorithmic parameter, can provide solution paths to a sequence of convex optimization problems. On the other hand, in regression many existing criteria in subset sel ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Recent results in homotopy and solution paths demonstrate that certain well-designed greedy algorithms, with a range of values of the algorithmic parameter, can provide solution paths to a sequence of convex optimization problems. On the other hand, in regression many existing criteria in subset selection (including Cp, AIC, BIC, MDL, RIC, etc.) involve optimizing an objective function that contains a counting measure. The two optimization problems are formulated as (P1) and(P0) in the present paper. The latter is generally combinatoric and has been proven to be NP-hard. We study the conditions under which the two optimization problems have common solutions. Hence, in these situations a stepwise algorithm can be used to solve the seemingly unsolvable problem. Our main result is motivated by recent work in sparse representation, while two others emerge from different angles: a direct analysis of sufficiency and necessity and a condition on the mostly correlated covariates. An extreme example connected with least angle regression is of independent interest. 1. Introduction. We
To how many simultaneous hypothesis tests the normal, Student’s t, or bootstrap calibration be applied
, 2007
"... ABSTRACT. In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to c ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
ABSTRACT. In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity for combining data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? In the present paper we answer this question in cases where the statistic under test is of Student’s t type. We show that if either Normal or Student’s t distribution is used for calibration then the level of the simultaneous test is accurate provided log N increases at a strictly slower rate than n 1/3 as n diverges. On the other hand, if bootstrap methods are used for calibration then we may choose log N almost as large as n 1/2 and still achieve asymptotic level accuracy. The implications of these results are explored both theoretically and numerically. KEYWORDS. Bonferroni’s inequality, Edgeworth expansion, genetic data, large-deviation expansion, level accuracy, microarray data, quantile estimation, skewness, Student’s t statistic.
Profile-Kernel likelihood inference with diverging number of parameters
, 2008
"... The generalized varying coefficient partially linear model with growing number of predictors arises in many contemporary scientific endeavor. In this paper we set foot on both theoretical and practical sides of profile likelihood estimation and inference. When the number of parameters grows with sam ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
The generalized varying coefficient partially linear model with growing number of predictors arises in many contemporary scientific endeavor. In this paper we set foot on both theoretical and practical sides of profile likelihood estimation and inference. When the number of parameters grows with sample size, the existence and asymptotic normality of the profile likelihood estimator are established under some regularity conditions. Profile likelihood ratio inference for the growing number of parameters is proposed and Wilk’s phenomenon is demonstrated. A new algorithm, called the accelerated profilekernel algorithm, for computing profile-kernel estimator is proposed and investigated. Simulation studies show that the resulting estimates are as efficient as the fully iterative profile-kernel estimates. For moderate sample sizes, our proposed procedure saves much computational time over the fully iterative profile-kernel one and gives stabler estimates. A set of real data is analyzed using our proposed algorithm. Short Title: High-dimensional profile likelihood.
SPLINE ESTIMATION OF SINGLE-INDEX MODELS
"... Abstract: For the past two decades, the single-index model, a special case of projection pursuit regression, has proven to be an efficient way of coping with the high-dimensional problem in nonparametric regression. In this paper, based on a weakly dependent sample, we investigate a robust single-in ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract: For the past two decades, the single-index model, a special case of projection pursuit regression, has proven to be an efficient way of coping with the high-dimensional problem in nonparametric regression. In this paper, based on a weakly dependent sample, we investigate a robust single-index model, where the single-index is identified by the best approximation to the multivariate prediction function of the response variable, regardless of whether the prediction function is a genuine single-index function. A polynomial spline estimator is proposed for the single-index coefficients, and is shown to be root-n consistent and asymptotically normal. An iterative optimization routine is used that is sufficiently fast for the user to analyze large data sets of high dimension within seconds. Simulation experiments have provided strong evidence corroborating the asymptotic theory. Application of the proposed procedure to the river flow data of Iceland has yielded superior out-of-sample rolling forecasts. Key words and phrases: B-spline, geometric mixing, knots, nonparametric regression, root-n rate, strong consistency. 1.
Approximate Dynamic Programming - II: Algorithms
- in R, J. C. Baltzer AG
, 2010
"... Approximate dynamic programming is a powerful class of algorithmic strategies for solving stochastic optimization problems where optimal decisions can be characterized using Bellman’s optimality equation, but where the characteristics of the problem make solving Bellman’s equation computationally in ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Approximate dynamic programming is a powerful class of algorithmic strategies for solving stochastic optimization problems where optimal decisions can be characterized using Bellman’s optimality equation, but where the characteristics of the problem make solving Bellman’s equation computationally intractable. This brief chapter provides an introduction to the basic concepts of approximate dynamic programming while building bridges between the different communities that have contributed to this field. We cover basic approximate value iteration (temporal difference learning), policy approximation, and a brief introduction to strategies for approximating value functions. We cover Q-learning, and the use of the post-decision state for solving problems with vector-valued decisions. The approximate linear programming method is introduced, along with a discussion of stepsize selection issues. The presentation closes with a discussion of some practical issues that arise in the Approximate dynamic programming represents a powerful modeling and algorithmic strategy that can address a wide range of optimization problems that involve making decisions sequentially in the presence of different types of uncertainty. A short list of applications, which illustrate different problem classes, include the following:

