Results 1-10 of 17
Confidence intervals and hypothesis testing for high-dimensional regression. arXiv:1306.3171
Abstract

Cited by 29 (2 self)
Fitting high-dimensional statistical models often requires the use of nonlinear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a 'debiased' version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, the proofs are remarkably simple. We test our method on a diabetes prediction problem.
High-dimensional Inference: Confidence intervals, p-values and R software hdi. arXiv:1408.4026v1
, 2014
Abstract

Cited by 5 (0 self)
Abstract. We present a (selective) review of recent frequentist high-dimensional inference methods for constructing p-values and confidence intervals in linear and generalized linear models. We include a broad, comparative empirical study which complements the viewpoint from statistical methodology and theory. Furthermore, we introduce and illustrate the R package hdi, which easily allows the use of different methods and supports reproducibility.
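One of the methods reviewed here (and implemented in hdi as multi sample-splitting) can be sketched outside R: split the data, screen variables on one half, then compute ordinary low-dimensional p-values on the held-out half. The sketch below is ours, with simplifications the package does not make: marginal-correlation screening stands in for a lasso fit, the noise level is treated as known, and only a single split is performed (hdi repeats the split and aggregates the p-values).

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)
n, p = 200, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 3.0                               # one truly active variable
y = X @ beta + rng.standard_normal(n)

# Split the sample: screen on the first half, test on the second.
half = n // 2
X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]

# Screening step: keep the 5 coordinates with largest marginal correlation
score = np.abs(X1.T @ y1) / half
S = np.argsort(score)[-5:]

# Low-dimensional OLS on the held-out half; two-sided normal p-values (sigma = 1)
XS = X2[:, S]
bh = np.linalg.lstsq(XS, y2, rcond=None)[0]
cov = np.linalg.inv(XS.T @ XS)
z = bh / np.sqrt(np.diag(cov))
pvals = [erfc(abs(zj) / sqrt(2)) for zj in z]
```

Because the screening and testing halves are independent, the p-values on the selected set are valid without any post-selection correction beyond a multiplicity adjustment over |S|.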
The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping
, 2013
Abstract

Cited by 1 (0 self)
In the high-dimensional regression setting, the elastic net produces a parsimonious model by shrinking all coefficients towards the origin. However, in certain settings, this behavior might not be desirable: if some features are highly correlated with each other and associated with the response, then we might wish to perform less shrinkage on the coefficients corresponding to that subset of features. We propose the cluster elastic net, which selectively shrinks the coefficients for such variables towards each other, rather than towards the origin. Instead of assuming that the clusters are known a priori, the cluster elastic net infers clusters of features from the data, on the basis of correlation among the variables as well as association with the response. These clusters are then used to perform more accurate regression. We demonstrate the theoretical advantages of our proposed approach, and explore its performance in a simulation study and in an application to HIV drug resistance data. Supplementary Materials are available online.
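The "shrink towards each other, not towards the origin" idea corresponds to adding a within-cluster fusion term to the usual ℓ1-penalized objective. A minimal sketch of such an objective, assuming the clusters are already known (the paper's harder contribution is inferring them from the data) and with our own function name and scaling:

```python
import numpy as np

def cluster_enet_objective(X, y, beta, clusters, lam1, lam2):
    """Least squares + l1 penalty + within-cluster coefficient fusion.

    `clusters` is a list of index arrays. The fusion term penalizes the
    spread of coefficients inside each cluster, so correlated, jointly
    relevant features are pulled toward a shared value rather than to 0.
    (Sketch only: the paper's exact penalty scaling may differ.)
    """
    rss = 0.5 * np.sum((y - X @ beta) ** 2)
    l1 = lam1 * np.sum(np.abs(beta))
    fusion = 0.0
    for c in clusters:
        bc = beta[c]
        # sum of squared deviations from the cluster mean; equivalent to
        # the pairwise squared differences within the cluster up to scaling
        fusion += np.sum((bc - bc.mean()) ** 2)
    return rss + l1 + lam2 * fusion
```

Note that the fusion term vanishes when all coefficients in a cluster are equal, so unlike ridge or elastic-net shrinkage it does not penalize large shared effects.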
Selecting the number of principal components: estimation of the true rank of a noisy matrix
Abstract

Cited by 1 (0 self)
Principal component analysis (PCA) is a well-known tool in multivariate statistics. One big challenge in using the method is the choice of the number of components. In this paper, we propose an exact distribution-based method for this purpose: our approach is related to the adaptive regression framework of Taylor et al. (2013). Assuming Gaussian noise, we use the conditional distribution of the eigenvalues of a Wishart matrix as our test statistic, and derive exact hypothesis tests and confidence intervals for the true singular values. In simulation studies we find that our proposed method compares well to the proposal of Kritchman & Nadler (2008), which uses the asymptotic distribution of singular values based on the Tracy-Widom laws.
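To make the rank-selection problem concrete, here is a crude simulation-based stand-in for the kind of test the abstract describes: compare each observed singular value against the null distribution of the top singular value of a pure-noise Gaussian matrix. This is not the paper's exact conditional Wishart test; the function name, the Monte Carlo calibration, and the known-σ assumption are ours.

```python
import numpy as np

def estimate_rank(M, sigma=1.0, n_null=200, alpha=0.05, seed=0):
    """Estimate the rank of a noisy matrix M = signal + sigma * noise.

    Keeps singular values exceeding the (1 - alpha)-quantile of the TOP
    singular value of pure-noise matrices of the same shape. A Monte Carlo
    stand-in for exact distribution-based tests; assumes sigma is known.
    """
    rng = np.random.default_rng(seed)
    n, p = M.shape
    null_top = np.array([
        np.linalg.svd(sigma * rng.standard_normal((n, p)), compute_uv=False)[0]
        for _ in range(n_null)
    ])
    thresh = np.quantile(null_top, 1 - alpha)
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > thresh))
```

Thresholding at the top noise singular value makes the test conservative for the smaller components, which is one reason the exact conditional approach of the paper is preferable in borderline cases.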
SLOPE is Adaptive to Unknown Sparsity and Asymptotically Minimax
, 2015
Abstract
We consider high-dimensional sparse regression problems in which we observe y = Xβ + z, where X is an n × p design matrix and z is an n-dimensional vector of independent Gaussian errors, each with variance σ². Our focus is on the recently introduced SLOPE estimator [15], which regularizes the least-squares estimates with the rank-dependent penalty ∑_{1≤i≤p} λ_i |β̂|_(i), where |β̂|_(i) is the i-th largest magnitude of the fitted coefficients. Under Gaussian designs, where the entries of X are i.i.d. N(0, 1/n), we show that SLOPE, with weights λ_i just about equal to σ · Φ⁻¹(1 − iq/(2p)) (Φ⁻¹(α) is the α-th quantile of a standard normal and q is a fixed number in (0, 1)), achieves a squared error of estimation obeying sup_{‖β‖_0 ≤ k} P ...
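The weight sequence λ_i = σ · Φ⁻¹(1 − iq/(2p)) quoted in the abstract can be computed with the Python standard library alone; the function name below is ours. Note how the weights mirror the Benjamini-Hochberg critical values: the i-th largest coefficient faces the threshold a BH procedure would apply to the i-th smallest p-value.

```python
from statistics import NormalDist

def slope_weights(p, q=0.05, sigma=1.0):
    """BH-style SLOPE weights: lambda_i = sigma * Phi^{-1}(1 - i*q/(2p)),
    for i = 1, ..., p. The sequence is strictly decreasing in i, so the
    largest fitted coefficient receives the largest penalty."""
    phi_inv = NormalDist().inv_cdf
    return [sigma * phi_inv(1 - i * q / (2 * p)) for i in range(1, p + 1)]

lam = slope_weights(10)   # 10 weights, from ~Phi^{-1}(0.9975) down to ~Phi^{-1}(0.975)
```

For large p the leading weights grow like σ√(2 log p), recovering the familiar universal threshold of the lasso as a special case.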
Goodness-of-fit tests for high-dimensional models
, 2015
Abstract
In this work we propose a framework for constructing goodness-of-fit tests in both low- and high-dimensional linear models. We advocate applying regression methods to the scaled residuals following either an ordinary least squares or Lasso fit to the data, and using some proxy for prediction error as the final test statistic. We call this family Residual Prediction (RP) tests. We show that simulation can be used to obtain the critical values for such tests in the low-dimensional setting, and demonstrate using both theoretical results and extensive numerical studies that some form of the parametric bootstrap can do the same when the high-dimensional linear model is under consideration. We show that RP tests can be used to test for significance of groups or individual variables as special cases, and here they compare favourably with state-of-the-art methods, but we also argue that they can be designed to test for model misspecifications as diverse as heteroscedasticity and different types of nonlinearity.
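The RP idea, fit the model, then ask whether anything else predicts the residuals, can be sketched in a few lines. In this sketch (ours, not the paper's), the calibration is done by permutation rather than the parametric bootstrap the paper analyzes, and the "prediction error proxy" is simply the explained sum of squares of a least-squares fit of the residuals on candidate variables Z.

```python
import numpy as np

def rp_test(X, y, Z, n_perm=500, seed=0):
    """Residual Prediction sketch: fit OLS of y on X, then test whether Z
    predicts the residuals better than chance. Permutation calibration
    stands in for the parametric bootstrap used in the high-dimensional
    setting; a small p-value suggests the model y ~ X is misspecified."""
    rng = np.random.default_rng(seed)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta                            # residuals from the null model

    def stat(res):
        g = np.linalg.lstsq(Z, res, rcond=None)[0]
        return np.sum((Z @ g) ** 2)             # explained sum of squares

    t_obs = stat(r)
    t_null = [stat(rng.permutation(r)) for _ in range(n_perm)]
    return (1 + sum(t >= t_obs for t in t_null)) / (1 + n_perm)
```

Swapping in different choices of Z (squared covariates, interactions, held-out predictors) yields tests against different misspecifications, which is exactly the flexibility the abstract claims for the RP family.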
Bayesian Feature Selection with Strongly Regularizing Priors Maps to the Ising Model (Letter, communicated by Ilya M. Nemenman)
Abstract
Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors, that is, priors that have a large effect on the posterior probability even in the infinite-data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic-regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.
Statistical Estimation and Testing via the Ordered ℓ1 Norm
, 2013
Abstract
We introduce a novel method for sparse regression and variable selection, which is inspired by modern ideas in multiple testing. Imagine we have observations from the linear model y = Xβ + z; then we suggest estimating the regression coefficients by means of a new estimator called the ordered lasso, which is the solution to minimize_b (1/2)‖y − Xb‖²_ℓ2 + λ1|b|_(1) + λ2|b|_(2) + ... + λp|b|_(p); here, λ1 ≥ λ2 ≥ ... ≥ λp and |b|_(1) ≥ |b|_(2) ≥ ... ≥ |b|_(p) are the order statistics of the magnitudes of b. In short, the regularizer is an ordered ℓ1 norm which penalizes the regression coefficients according to their rank: the higher the rank (the closer to the top), the larger the penalty. This is similar to the famous Benjamini-Hochberg procedure (BHq) [9], which compares the value of a test statistic taken from a family to a critical threshold that depends on its rank in the family. The ordered lasso is a convex program and we demonstrate an efficient algorithm ...
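The ordered-ℓ1 regularizer in the objective above is simple to evaluate directly: sort the coefficient magnitudes in decreasing order and take a weighted sum against the decreasing λ sequence. A minimal sketch (function names ours; solving the convex program itself requires the prox of this norm, which the paper's algorithm computes via isotonic regression):

```python
import numpy as np

def sorted_l1(b, lam):
    """Ordered-l1 penalty sum_i lam_i * |b|_(i): the i-th largest magnitude
    of b is matched with the i-th largest weight. `lam` must be sorted in
    non-increasing order for this to define a norm."""
    mags = np.sort(np.abs(b))[::-1]          # |b|_(1) >= |b|_(2) >= ...
    return float(np.dot(lam, mags))

def ordered_lasso_objective(X, y, b, lam):
    """Full ordered-lasso objective: (1/2)||y - Xb||^2 + sorted-l1 penalty."""
    return 0.5 * float(np.sum((y - X @ b) ** 2)) + sorted_l1(b, lam)
```

With all λ_i equal this reduces to the ordinary lasso penalty, so the ordered version strictly generalizes it by letting the penalty adapt to rank.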
Statistical Estimation and Testing via the Sorted ℓ1 Norm
, 2013
Abstract
We introduce a novel method for sparse regression and variable selection, which is inspired by modern ideas in multiple testing. Imagine we have observations from the linear model y = Xβ + z; then we suggest estimating the regression coefficients by means of a new estimator called SLOPE, which is the solution to minimize_b (1/2)‖y − Xb‖²_ℓ2 + λ1|b|_(1) + λ2|b|_(2) + ... + λp|b|_(p); here, λ1 ≥ λ2 ≥ ... ≥ λp and |b|_(1) ≥ |b|_(2) ≥ ... ≥ |b|_(p) are the order statistics of the magnitudes of b. In short, the regularizer is a sorted ℓ1 norm which penalizes the regression coefficients according to their rank: the higher the rank (the closer to the top), the larger the penalty. This is similar to the famous Benjamini-Hochberg procedure (BHq) [9], which compares the value of a test statistic taken from a family to a critical threshold that depends on its rank in the family. SLOPE is a convex program and we demonstrate an efficient algorithm ...