Results 1 - 7 of 7
Sparsistency and rates of convergence in large covariance matrices estimation
, 2007
Cited by 43 (5 self)
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n / n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix and $n$ is the sample size. This makes explicit that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate with which the tuning parameter $\lambda_n$ goes to 0 have been made explicit and compared under different penalties. As a result, for the $L_1$ penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: $s_n' = O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where $s_n'$ is the number of nonzero off-diagonal entries. On the other hand, with the SCAD or hard-thresholding penalty functions, there is no such restriction.
Generalized thresholding of large covariance matrices
J. Amer. Statist. Assoc. (Theory and Methods), 2009
Cited by 21 (2 self)
We propose a new class of generalized thresholding operators which combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no computational burden. We obtain an explicit convergence rate in the operator norm that shows the tradeoff between the sparsity of the true model, dimension, and the sample size, and show that generalized thresholding is consistent over a large class of models as long as the dimension p and the sample size n satisfy log p/n → 0. In addition, we show
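As a rough illustration of what thresholding combined with shrinkage of a sample covariance matrix can look like, the sketch below applies soft thresholding to the off-diagonal entries. Soft thresholding is used here only as a familiar operator that both zeroes small entries and shrinks the rest; the function name, the threshold `lam`, and the choice to leave the diagonal untouched are assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def soft_threshold_covariance(X, lam):
    """Soft-threshold the off-diagonal entries of the sample covariance of X.

    X   : (n, p) data matrix, one observation per row.
    lam : nonnegative threshold (hypothetical tuning parameter).
    """
    S = np.cov(X, rowvar=False)  # (p, p) sample covariance
    # Entries with |s_ij| <= lam become 0; the others move toward 0 by lam.
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    # Assumption for this sketch: variances on the diagonal are not thresholded.
    np.fill_diagonal(T, np.diag(S))
    return T

# Usage: 50 observations of 4 independent standard normal variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
T = soft_threshold_covariance(X, lam=0.1)
```

The cost is a single elementwise pass over the sample covariance, which is consistent with the abstract's remark that thresholding carries almost no computational burden.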
Covariance Estimation: The GLM and Regularization Perspectives
Cited by 2 (0 self)
Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from the perspectives of generalized linear models (GLM) or parsimony and use of covariates in low dimensions, regularization (shrinkage, sparsity) for high-dimensional data, and the role of various matrix factorizations. A viable and emerging regression-based setup which is suitable for both the GLM and the regularization approaches is to link a covariance matrix, its inverse or their factors to certain regression models and then solve the relevant (penalized) least squares problems. We point out several instances of this regression-based setup in the literature. A notable case is in the Gaussian graphical models where linear regressions with LASSO penalty are used to estimate the neighborhood of one node at a time (Meinshausen and Bühlmann, 2006). Some advantages
REGULARIZED LEARNING WITH FEATURE NETWORKS
First and foremost, I would like to thank my academic advisor, Lyle Ungar. Lyle
Accelerated [KLH05].
, 2013
Version 1.21 Title word cross-reference (m) [KNN05]. 2 [MBGO11]. 2 × 2 [HVRA96]. 3 [Dri08, LTW+10]. $[0, 1]^d$ [WC94]. 1 [NST12]. R [JKD92]. $\ell_1$ [Kat10, Yua08].
Estimation of Large Precision Matrices
, 805
This paper focuses on exploring the sparsity of the inverse covariance matrix $\Sigma^{-1}$, or the precision matrix. We form blocks of parameters based on each off-diagonal band of the Cholesky factor from its modified Cholesky decomposition, and penalize each block of parameters using the $L_2$ norm instead of individual elements. We develop a one-step estimator, and prove an oracle property which consists of a notion of block sign-consistency and asymptotic normality. In particular, provided the initial estimator of the Cholesky factor is good enough and the true Cholesky factor has a finite number of nonzero off-diagonal bands, the oracle property holds for the one-step estimator even if $p_n \gg n$, and $p_n$ can even be as large as $\log p_n = o(n)$, where the data $y$ has mean zero and tail probability $P(y_j > x) \le K \exp(-C x^d)$, $d > 0$, and $p_n$ is the number of variables. We also prove an operator norm convergence result, showing the cost of dimensionality is just $\log p_n$. The advantage of this method over banding by Bickel and Levina (2008) or nested LASSO by Levina et al. (2007) is that it allows for elimination of weaker signals that precede stronger ones in the Cholesky factor. A method for obtaining an initial estimator for the Cholesky factor is discussed, and a gradient projection algorithm is developed for calculating the one-step estimate. Simulation results are in favor of the newly proposed method and a set of real data is analyzed using the new procedure and the banding method.
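For readers unfamiliar with the modified Cholesky decomposition mentioned in this abstract, the sketch below computes it for a known positive-definite matrix: a unit lower triangular $T$ and positive variances $d$ with $T \Sigma T^\top = \mathrm{diag}(d)$, so that $\Sigma^{-1} = T^\top \mathrm{diag}(d)^{-1} T$. It only illustrates the decomposition and how an off-diagonal band can be read off; it is not the paper's penalized one-step estimator, and the function name and example matrix are assumptions made for illustration.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T == diag(d)."""
    C = np.linalg.cholesky(sigma)  # sigma = C @ C.T, C lower triangular
    s = np.diag(C)                 # positive diagonal of C
    L = C / s                      # divide column j by C[j, j]: unit lower triangular
    T = np.linalg.inv(L)           # inverse of a unit lower triangular matrix
    d = s ** 2                     # innovation variances
    return T, d

# Usage on a small hypothetical covariance matrix.
sigma = np.array([[4.0, 2.0, 0.4],
                  [2.0, 3.0, 0.5],
                  [0.4, 0.5, 2.0]])
T, d = modified_cholesky(sigma)
precision = T.T @ np.diag(1.0 / d) @ T  # equals the inverse of sigma
first_band = np.diag(T, k=-1)           # entries on the first off-diagonal band of T
```

Grouping the entries of each band such as `first_band` and penalizing the group's norm as a whole is the kind of block structure the abstract describes.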
Large Covariance Matrices Estimation
, 711
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconcave penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n / n)^{1/2}$, where $s_n$ is the number of nonsparse elements, $p_n$ is the size of the covariance matrix and $n$ is the sample size. This makes explicit that the contribution of high dimensionality is merely a logarithmic factor. The biases of the estimators using different penalty functions are explicitly obtained. As a result, for the $L_1$ penalty, to guarantee the sparsistency and optimal rate of convergence, the nonsparsity rates should be low: $s_n' = O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where $s_n'$ is the number of nonsparse off-diagonal entries. On the other hand, with the SCAD or hard-thresholding penalty functions, there is no such restriction.