Results 1  10
of
183
Least angle regression
, 2004
"... The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to s ..."
Abstract

Cited by 1326 (37 self)
 Add to MetaCart
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit
, 2006
"... Finding the sparsest solution to underdetermined systems of linear equations y = Φx is NPhard in general. We show here that for systems with ‘typical’/‘random ’ Φ, a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our pr ..."
Abstract

Cited by 274 (22 self)
 Add to MetaCart
(Show Context)
Finding the sparsest solution to underdetermined systems of linear equations y = Φx is NPhard in general. We show here that for systems with ‘typical’/‘random ’ Φ, a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with initial residual r0 = y, at the sth stage it forms the ‘matched filter ’ Φ T rs−1, identifies all coordinates with amplitudes exceeding a speciallychosen threshold, solves a leastsquares problem using the selected coordinates, and subtracts the leastsquares fit, producing a new residual. After a fixed number of stages (e.g. 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP while only one enters per stage in OMP; and StOMP takes a fixed number of stages (e.g. 10), while OMP can take many (e.g. n). StOMP runs much faster than competing proposals for sparse solutions, such as ℓ1 minimization and OMP, and so is attractive for solving largescale problems. We use phase diagrams to compare algorithm performance. The problem of recovering a ksparse vector x0 from (y, Φ) where Φ is random n × N and y = Φx0 is represented by a point (n/N, k/n)
Identifying differentially expressed genes using false discovery rate controlling procedures
 BIOINFORMATICS 19: 368–375
, 2003
"... Motivation: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when th ..."
Abstract

Cited by 233 (2 self)
 Add to MetaCart
Motivation: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene coregulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resamplingbased FDR controlling procedures, that account for the test statistics distribution, and compare their performance to that of the naïve application of the linear stepup procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. Results: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power then the familywise error rate controlling procedures. In terms of power, using resampling of the marginal distribution of each test statistics substantially improves the performance over the naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resamplingbased procedures that resample the joint distribution of the test statistics and estimate the level of FDR control.
Sparsity oracle inequalities for the lasso
 Electronic Journal of Statistics
"... Abstract: This paper studies oracle properties of ℓ1penalized least squares in nonparametric regression setting with random design. We show that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of nonzero components of the oracle vec ..."
Abstract

Cited by 171 (12 self)
 Add to MetaCart
(Show Context)
Abstract: This paper studies oracle properties of ℓ1penalized least squares in nonparametric regression setting with random design. We show that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of nonzero components of the oracle vector. The results are valid even when the dimension of the model is (much) larger than the sample size and the regression matrix is not positive definite. They can be applied to highdimensional linear regression, to nonparametric adaptive regression estimation and to the problem of aggregation of arbitrary estimators.
Higher criticism for detecting sparse heterogeneous mixtures
 Ann. Statist
, 2004
"... Higher Criticism, or secondlevel significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested to compare the ..."
Abstract

Cited by 154 (22 self)
 Add to MetaCart
(Show Context)
Higher Criticism, or secondlevel significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested to compare the fraction of observed significances at a given αlevel to the expected fraction under the joint null, in fact he suggested to standardize the difference of the two quantities and form a zscore; the resulting zscore tests the significance of the body of significance tests. We consider a generalization, where we maximize this zscore over a range of significance levels 0 < α ≤ α0. We are able to show that the resulting Higher Criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this ‘sparse normal means ’ testing problem can be seen from work of Ingster (1999) and Jin (2002), who studied such problems in great detail. In their studies, they identified an interesting range of cases where the small fraction of nonzero means is so
Covariance regularization by thresholding
, 2007
"... This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGa ..."
Abstract

Cited by 148 (11 self)
 Add to MetaCart
(Show Context)
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general crossvalidation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data. 1. Introduction. Estimation
A stochastic process approach to False discovery rates
, 2001
"... This paper extends the theory of false discovery rates (FDR) pioneered by Benjamini and Hochberg (1995). We develop a framework in which the False Discovery Proportion (FDP) – the number of false rejections divided by the number of rejections – is treated as a stochastic process. After obtaining th ..."
Abstract

Cited by 132 (6 self)
 Add to MetaCart
This paper extends the theory of false discovery rates (FDR) pioneered by Benjamini and Hochberg (1995). We develop a framework in which the False Discovery Proportion (FDP) – the number of false rejections divided by the number of rejections – is treated as a stochastic process. After obtaining the limiting distribution of the process, we demonstrate the validitiy of a class of procedures for controlling the False Discovery Rate (the expected FDP). We construct a confidence envelope for the whole FDP process. From these envelopes we derive confidence thresholds, for controlling the quantiles of the distribution of the FDP as well as controlling the number of false discoveries. We also
EMPIRICAL BAYES SELECTION OF WAVELET THRESHOLDS
, 2005
"... This paper explores a class of empirical Bayes methods for leveldependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavytailed density. The mixing weight, or sparsity parameter, for each level of ..."
Abstract

Cited by 117 (5 self)
 Add to MetaCart
This paper explores a class of empirical Bayes methods for leveldependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavytailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold. Details of the calculations needed for implementing the procedure are included. In practice, the estimates are quick to compute and there is software available. Simulations on the standard model functions show excellent performance, and applications to data drawn from various fields of application are used to explore the practical performance of the approach. By using a general result on the risk of the corresponding marginal maximum likelihood approach for a single sequence, overall bounds on
Wavelet estimators in nonparametric regression: a comparative simulation study
 Journal of Statistical Software
, 2001
"... OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. ..."
Abstract

Cited by 114 (19 self)
 Add to MetaCart
OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
"... Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by ..."
Abstract

Cited by 92 (21 self)
 Add to MetaCart
Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of onedimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method’s theoretical properties, and show that it works well in many examples.