Results 1  10
of
115
Sure independence screening for ultrahigh dimensional feature space
, 2006
"... Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, ..."
Abstract

Cited by 279 (27 self)
 Add to MetaCart
Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on a correlation learning, called the Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, the SIS is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be ac
A unified framework for highdimensional analysis of Mestimators with decomposable regularizers
"... ..."
Sparse Permutation Invariant Covariance Estimation
 Electronic Journal of Statistics
, 2008
"... The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in highdimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lassotype penalty. We establish a rate of convergence in the Fro ..."
Abstract

Cited by 171 (7 self)
 Add to MetaCart
The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in highdimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lassotype penalty. We establish a rate of convergence in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlationbased version of the method exhibits better rates in the operator norm. The estimator is required to be positive definite, but we avoid having to use semidefinite programming by reparameterizing the objective function
Covariance regularization by thresholding
, 2007
"... This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGa ..."
Abstract

Cited by 154 (11 self)
 Add to MetaCart
(Show Context)
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general crossvalidation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data. 1. Introduction. Estimation
High dimensional covariance matrix estimation using a factor model
 Princeton Univ
, 2008
"... ar ..."
Optimal rates of convergence for covariance matrix estimation
 Ann. Statist
, 2010
"... Covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet been developed. In this paper we establish the optimal rates ..."
Abstract

Cited by 93 (18 self)
 Add to MetaCart
Covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet been developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and Frobenius norm. It is shown that optimal procedures under the two norms are different and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems. 1. Introduction. Suppose
Latent Variable Graphical Model Selection via Convex Optimization
, 2010
"... Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistic ..."
Abstract

Cited by 75 (5 self)
 Add to MetaCart
Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latentvariable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out ” over most of the observed variables. Next we propose a tractable convex program based on regularized maximumlikelihood for model selection in this latentvariable setting; the regularizer uses both the ℓ1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the highdimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of lowrank matrices play an important role in our analysis.
Highdimensional covariance estimation by minimizing ℓ1penalized logdeterminant divergence
, 2008
"... ..."
Generalized thresholding of large covariance matrices
 J. Amer. Statist. Assoc. (Theory and Methods
, 2009
"... We propose a new class of generalized thresholding operators which combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no compu ..."
Abstract

Cited by 67 (4 self)
 Add to MetaCart
We propose a new class of generalized thresholding operators which combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no computational burden. We obtain an explicit convergence rate in the operator norm that shows the tradeoff between the sparsity of the true model, dimension, and the sample size, and show that generalized thresholding is consistent over a large class of models as long as the dimension p and the sample size n satisfy log p/n → 0. In addition, we show
Highdimensional covariance estimation by minimizing 1penalized logdeterminant divergence
 Electron. J. Stat
, 2011
"... Given i.i.d. observations of a random vector X ∈ Rp, we study the problem of estimating both its covariance matrix Σ∗, and its inverse covariance or concentration matrix Θ ∗ = (Σ∗)−1. When X is multivariate Gaussian, the nonzero structure of Θ ∗ is specified by the graph of an associated Gaussian ..."
Abstract

Cited by 54 (6 self)
 Add to MetaCart
Given i.i.d. observations of a random vector X ∈ Rp, we study the problem of estimating both its covariance matrix Σ∗, and its inverse covariance or concentration matrix Θ ∗ = (Σ∗)−1. When X is multivariate Gaussian, the nonzero structure of Θ ∗ is specified by the graph of an associated Gaussian Markov random field; and a popular estimator for such sparse Θ ∗ is the `1regularized Gaussian MLE. This estimator is sensible even for for nonGaussian X, since it corresponds to minimizing an `1penalized logdeterminant Bregman divergence. We analyze its performance under highdimensional scaling, in which the number of nodes in the graph p, the number of edges s, and the maximum node degree d, are allowed to grow as a function of the sample size n. In addition to the parameters (p, s, d), our analysis identifies other key quantities that control rates: (a) the `∞operator norm of the true covariance matrix Σ∗; and (b) the ` ∞ operator norm of the submatrix Γ∗SS, where S indexes the graph edges, and Γ ∗ = (Θ∗)−1 ⊗ (Θ∗)−1; and (c) a mutual incoherence or irrepresentability measure on the matrix Γ ∗ and (d) the rate of decay 1/f(n, δ) on the probabilities {Σ̂nij − Σ∗ij > δ}, where Σ̂n is the sample covariance based on n samples. Our first result establishes consistency of our estimate Θ ̂ in the elementwise maximumnorm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees d = o( s). In our second result, we show that with probability converging to one, the estimate Θ ̂ correctly speci