Regularized estimation of large covariance matrices
Ann. Statist., 2008
Cited by 89 (13 self)
This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (log p)/n → 0, and obtain explicit rates. The results are uniform over some fairly natural well-conditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show that if the population covariance is embeddable in that model and well-conditioned, then the banded approximations produce consistent estimates of the eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth versions of banding and to non-Gaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing the banding parameter in practice. This approach is illustrated numerically on both simulated and real data.
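The banding operation described above keeps only the entries within distance k of the diagonal and zeroes the rest. A minimal NumPy sketch, assuming the data arrive as an n × p array with observations in rows; the function names are hypothetical, the n divisor is an assumption, and choosing k (the abstract proposes resampling) is not shown.

```python
import numpy as np

def band(S: np.ndarray, k: int) -> np.ndarray:
    """Banding operator B_k(S): keep s_ij when |i - j| <= k, set it to 0 otherwise."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)

def banded_sample_covariance(X: np.ndarray, k: int) -> np.ndarray:
    """Band the sample covariance of an n x p data matrix X (rows are observations).

    Assumption: divisor n (bias=True); an n - 1 divisor would work the same way.
    """
    S = np.cov(X, rowvar=False, bias=True)
    return band(S, k)

# Usage on simulated data (hypothetical sizes): keep only the tridiagonal part.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
S1 = banded_sample_covariance(X, k=1)
```

With k = 1 only the main diagonal and the first off-diagonals survive; larger k trades variance for less bias.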
Sparse Permutation Invariant Covariance Estimation
Electronic Journal of Statistics, 2008
Cited by 83 (5 self)
The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. The estimator is required to be positive definite, but we avoid having to use semidefinite programming by reparameterizing the objective function
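A penalized normal likelihood with a lasso-type penalty can be written as tr(ΩS) − log det Ω + λ Σ_{i≠j} |ω_ij| over positive definite Ω. The sketch below only evaluates that objective for a candidate Ω; it assumes the diagonal is left unpenalized and does not reproduce the paper's reparameterization or optimizer. The function name is hypothetical.

```python
import numpy as np

def penalized_neg_loglik(Omega: np.ndarray, S: np.ndarray, lam: float) -> float:
    """Evaluate tr(Omega S) - log det(Omega) + lam * sum_{i != j} |omega_ij|.

    Assumptions: Omega is symmetric; only off-diagonal entries are penalized.
    Returns +inf when Omega is not positive definite (outside the feasible set).
    """
    try:
        chol = np.linalg.cholesky(Omega)  # raises LinAlgError if Omega is not PD
    except np.linalg.LinAlgError:
        return float("inf")
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))  # log det from the Cholesky factor
    off_diag = Omega - np.diag(np.diag(Omega))
    return float(np.trace(Omega @ S) - logdet + lam * np.abs(off_diag).sum())
```

Larger λ makes nonzero off-diagonal entries more costly, which is what drives the estimate toward a sparse concentration matrix.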
Covariance regularization by thresholding
2007
Cited by 69 (9 self)
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
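Hard thresholding keeps each entry whose magnitude is at least the threshold t and sets the rest to zero. A minimal NumPy sketch with a hypothetical function name; in practice t would come from the resampling scheme mentioned in the abstract, which is not shown here.

```python
import numpy as np

def hard_threshold(S: np.ndarray, t: float) -> np.ndarray:
    """Hard thresholding T_t(S): keep s_ij when |s_ij| >= t, set it to 0 otherwise."""
    return np.where(np.abs(S) >= t, S, 0.0)

# Usage: small entries are removed, large ones are kept unchanged.
S = np.array([[1.00, 0.05, 0.40],
              [0.05, 1.00, 0.02],
              [0.40, 0.02, 1.00]])
T = hard_threshold(S, t=0.1)
```

Unlike banding, thresholding does not rely on any ordering of the variables, and the surviving entries are not shrunk. Note that a thresholded matrix is not guaranteed to be positive definite.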
Statistical challenges with high dimensionality: Feature selection in knowledge discovery
Proceedings of the International Congress of Mathematicians, 2006
Cited by 35 (9 self)
Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high dimensionality arise in diverse fields of science and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model were known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high dimensionality are also discussed.
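The abstract does not name a specific penalty. As one illustrative, assumed example of a nonconcave penalty from this literature, the SCAD (smoothly clipped absolute deviation) penalty behaves like the lasso near zero but levels off for large |θ|, so large coefficients are not shrunk. A minimal NumPy sketch, with a hypothetical function name and the commonly used default a = 3.7:

```python
import numpy as np

def scad_penalty(theta, lam: float, a: float = 3.7) -> np.ndarray:
    """SCAD penalty evaluated elementwise at |theta| (requires a > 2)."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                               # |theta| <= lam: lasso-like
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    constant = (a + 1) * lam**2 / 2                                # |theta| > a*lam: flat
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))
```

The three pieces join continuously at |θ| = λ and |θ| = aλ; the flat tail is what distinguishes it from the lasso penalty λ|θ|.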
Sparse estimation of large covariance matrices via a nested Lasso penalty
Annals of Applied Statistics, 2007
Cited by 33 (8 self)
The paper proposes a new covariance estimator for large covariance matrices when the variables have a natural ordering. Using the Cholesky decomposition of the inverse, we impose a banded structure on the Cholesky factor, and select the bandwidth adaptively for each row of the Cholesky factor, using a novel penalty we call nested Lasso. This structure has more flexibility than regular banding, but, unlike regular Lasso applied to the entries of the Cholesky factor, results in a sparse estimator for the inverse of the covariance matrix. An iterative algorithm for solving the optimization problem is developed. The estimator is compared to a number of other covariance estimators and is shown to do best, both in simulations and on a real data example. Simulations show that the margin by which the estimator outperforms its competitors tends to increase with dimension.
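A banded Cholesky factor of the inverse corresponds to regressing each variable on a few of its immediate predecessors, giving Ω = (I − A)ᵀ D⁻¹ (I − A). The sketch below takes the per-row bandwidths k_j as a given, hypothetical input; it does not implement the nested Lasso penalty, which is what selects those bandwidths adaptively in the paper.

```python
import numpy as np

def banded_cholesky_precision(X: np.ndarray, bandwidths) -> np.ndarray:
    """Precision estimate Omega = (I - A)^T D^{-1} (I - A), where row j of A holds the
    least-squares coefficients of variable j on its k_j immediate predecessors.
    bandwidths[j] = k_j is supplied by the caller (hypothetical fixed choice)."""
    X = X - X.mean(axis=0)            # center columns (returns a new array)
    n, p = X.shape
    A = np.zeros((p, p))
    d = np.empty(p)
    for j in range(p):
        k = min(bandwidths[j], j)     # variable j has only j predecessors
        resid = X[:, j]
        if k > 0:
            Z = X[:, j - k:j]         # the k variables immediately before j
            coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
            A[j, j - k:j] = coef
            resid = X[:, j] - Z @ coef
        d[j] = resid @ resid / n      # residual variance of variable j
    T = np.eye(p) - A                 # unit lower-triangular, banded
    return T.T @ np.diag(1.0 / d) @ T
```

Because T is unit lower-triangular and every d_j is positive, the result is positive definite by construction; with all k_j = 1 the precision estimate is tridiagonal.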
Optimal rates of convergence for covariance matrix estimation
Ann. Statist., 2010
Cited by 27 (5 self)
The covariance matrix plays a central role in multivariate statistical analysis. Significant advances have recently been made in developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet to be developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and the Frobenius norm. It is shown that optimal procedures under the two norms are different and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems.
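A tapering estimator multiplies the sample covariance entrywise by weights that equal 1 near the diagonal and decay linearly to 0. The weights below, w_ij = ((k − |i−j|)₊ − (k/2 − |i−j|)₊)/(k/2), are our reading of that construction and should be treated as an assumption; the function names are hypothetical and the rate-optimal choice of k is not shown.

```python
import numpy as np

def tapering_weights(p: int, k: int) -> np.ndarray:
    """w_ij = 1 for |i-j| <= k/2, 2 - 2|i-j|/k for k/2 < |i-j| < k, 0 for |i-j| >= k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    i, j = np.indices((p, p))
    d = np.abs(i - j)
    kh = k / 2.0
    return (np.maximum(k - d, 0) - np.maximum(kh - d, 0)) / kh

def tapering_estimator(X: np.ndarray, k: int) -> np.ndarray:
    """Entrywise product of the tapering weights and the sample covariance (divisor n)."""
    S = np.cov(X, rowvar=False, bias=True)
    return tapering_weights(S.shape[0], k) * S
```

Compared with hard banding, the linear decay avoids an abrupt cutoff at the band edge.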
A Selective Overview of Variable Selection in High Dimensional Feature Space
2010
Cited by 23 (4 self)
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions such as what limits on dimensionality such methods can handle, what role penalty functions play, and what statistical properties the resulting estimators have are rapidly driving advances in the field. The properties of nonconcave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultrahigh dimensional variable selection, with emphasis on independence screening and two-scale methods.
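Independence screening ranks features by a marginal utility and keeps only the top d before a more refined fit. A minimal sketch, assuming a linear model and absolute marginal correlation as the ranking criterion; the function name is hypothetical, and the second (penalized refitting) stage of a two-scale method is not shown.

```python
import numpy as np

def sure_independence_screening(X: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Indices of the d features with the largest absolute marginal correlation with y.
    Assumes no constant columns in X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:d]

# Usage on simulated data (hypothetical): two active features out of 50.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 50))
y = 3 * X[:, 7] - 2 * X[:, 21] + 0.1 * rng.standard_normal(200)
kept = sure_independence_screening(X, y, d=5)
```

Screening costs one pass over the features, which is why it is attractive when p is far larger than n; a penalized method can then be run on the retained columns only.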
Latent Variable Graphical Model Selection via Convex Optimization
2010
Cited by 22 (2 self)
Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor about the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out” over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the ℓ1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
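One way to write a regularized maximum-likelihood program that uses both norms is to model the observed precision matrix as S − L (sparse minus low-rank) and minimize the Gaussian negative log-likelihood of S − L plus λ(γ‖S‖₁ + ‖L‖_*). That exact form, and the names below, are our assumptions. The sketch only evaluates the objective at a candidate pair; actually solving the convex program would need a semidefinite-programming solver, and the constraint L ⪰ 0 is not checked.

```python
import numpy as np

def latent_ggm_objective(S, L, Sigma_obs, lam, gamma):
    """-loglik(S - L; Sigma_obs) + lam * (gamma * ||S||_1 + ||L||_*), where
    loglik(K; Sigma) = log det K - tr(K Sigma). Returns +inf if S - L is not PD."""
    K = S - L                              # candidate precision of the observed variables
    try:
        chol = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return float("inf")
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    neg_loglik = -(logdet - np.trace(K @ Sigma_obs))
    l1 = np.abs(S).sum()                   # elementwise l1 norm: promotes sparsity in S
    nuclear = np.linalg.norm(L, "nuc")     # nuclear norm: promotes low rank in L
    return float(neg_loglik + lam * (gamma * l1 + nuclear))
```

The parameter γ balances the two regularizers: larger γ pushes structure out of the sparse component and into the low-rank one.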
Sparse inverse covariance matrix estimation via linear programming
2010
Cited by 12 (1 self)
This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem.
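One way to combine the regression connection with linear programming is to estimate each column of the inverse from a Dantzig-selector-type regression of one variable on the rest, written in terms of the sample covariance S. The sketch assumes that formulation and SciPy's `linprog` with the HiGHS solver; the tuning parameter `delta`, the function names, and the final symmetrization rule are illustrative choices, not necessarily those of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_column(S, j, delta):
    """Regress variable j on the others via min ||beta||_1 subject to
    ||S[-j, j] - S[-j, -j] @ beta||_inf <= delta, posed as a linear program."""
    p = S.shape[0]
    idx = [m for m in range(p) if m != j]
    A = S[np.ix_(idx, idx)]
    b = S[idx, j]
    q = p - 1
    # Split beta = u - v with u, v >= 0 so that ||beta||_1 = sum(u) + sum(v).
    c = np.ones(2 * q)
    M = np.hstack([A, -A])                  # M @ [u; v] == A @ beta
    A_ub = np.vstack([M, -M])               # A beta - b <= delta and b - A beta <= delta
    b_ub = np.concatenate([b + delta, delta - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return idx, res.x[:q] - res.x[q:]

def lp_precision(S, delta):
    """Column-by-column precision estimate. Symmetrization keeps the smaller-magnitude
    entry of each (i, j)/(j, i) pair (a swapped-in rule, not necessarily the paper's)."""
    p = S.shape[0]
    Omega = np.zeros((p, p))
    for j in range(p):
        idx, beta = dantzig_column(S, j, delta)
        A = S[np.ix_(idx, idx)]
        b = S[idx, j]
        resid_var = S[j, j] - 2 * beta @ b + beta @ A @ beta  # estimated Var(X_j | rest)
        Omega[j, j] = 1.0 / resid_var
        Omega[idx, j] = -Omega[j, j] * beta                   # omega_{-j,j} = -omega_jj * beta
    return np.where(np.abs(Omega) <= np.abs(Omega.T), Omega, Omega.T)
```

With `delta` near 0 the constraints force exact regression coefficients and the result approaches the inverse of S; larger `delta` permits sparser coefficient vectors.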
A path following algorithm for Sparse Pseudo-Likelihood Inverse Covariance Estimation (SPLICE)
2008
Cited by 11 (0 self)
Given n observations of a p-dimensional random vector, the covariance matrix and its inverse (precision matrix) are needed in a wide range of applications. The sample covariance (e.g., its eigenstructure) can misbehave when p is comparable to the sample size n. Regularization is often used to mitigate the problem. In this paper, we propose an ℓ1-penalized pseudo-likelihood estimate for the inverse covariance matrix. This estimate is sparse due to the ℓ1 penalty, and we term this method SPLICE. Its regularization path can be computed via an algorithm based on the homotopy/LARS-Lasso algorithm. Simulation studies are carried out for various inverse covariance structures for p = 15 and n = 20, 1000. We compare SPLICE with the ℓ1-penalized likelihood estimate and an ℓ1-penalized Cholesky decomposition based method. SPLICE gives the best overall performance in terms of three metrics on the precision matrix and ROC curve for model selection. Moreover, our simulation results demonstrate that the SPLICE estimates are positive definite for most of the regularization path even though the restriction is not enforced.
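Because positive definiteness is not enforced, one may want to check each estimate along a computed path. A small helper for that check, using the fact that a symmetric matrix is positive definite exactly when its Cholesky factorization exists; the names are hypothetical and computing the path itself is not shown.

```python
import numpy as np

def is_positive_definite(M: np.ndarray) -> bool:
    """True if the symmetric matrix M admits a Cholesky factorization (i.e., is PD)."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

def fraction_pd(path) -> float:
    """Fraction of estimates along a regularization path that are positive definite."""
    return sum(is_positive_definite(M) for M in path) / len(path)
```

A Cholesky attempt is cheaper than a full eigendecomposition and fails as soon as a non-positive pivot appears.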