A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Covariance regularization by thresholding
, 2007
Abstract

Cited by 148 (11 self)
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
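A minimal sketch of the hard-thresholding estimator this abstract describes (the function name is illustrative; the paper's theory suggests a threshold scaling like sqrt(log(p)/n), chosen here by the caller):

```python
import numpy as np

def threshold_covariance(X, lam):
    """Hard-threshold the sample covariance of X (n samples x p variables).

    Off-diagonal entries smaller than `lam` in absolute value are set to
    zero; the diagonal is kept intact. `lam` is the threshold level, which
    the theory suggests taking on the order of sqrt(log(p) / n).
    """
    S = np.cov(X, rowvar=False)                # p x p sample covariance
    T = np.where(np.abs(S) >= lam, S, 0.0)     # hard thresholding
    np.fill_diagonal(T, np.diag(S))            # restore diagonal entries
    return T
```

The paper's resampling scheme would then select `lam` by cross-validation rather than fixing it a priori.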
High-dimensional analysis of semidefinite relaxations for sparse principal component analysis
, 2008
Abstract

Cited by 85 (5 self)
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say with at most k nonzero components. We analyze two computationally tractable methods for recovering the support of this maximal eigenvector: (a) a simple diagonal cutoff method, which transitions from success to failure as a function of the order parameter θdia(n, p, k) = n/[k 2 log(p − k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the order parameter θsdp(n, p, k) = n/[k log(p − k)] is larger than a critical threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
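The simple diagonal cutoff method of the abstract can be sketched as follows (a hypothetical helper: it selects the k largest sample variances, whereas the paper's version thresholds the diagonal of the sample covariance):

```python
import numpy as np

def diagonal_cutoff_support(X, k):
    """Estimate the support of a sparse leading eigenvector by the
    diagonal cutoff method: pick the k variables whose sample variances
    (the diagonal of the sample covariance) are largest.
    """
    variances = X.var(axis=0)                   # diagonal of sample covariance
    return np.sort(np.argsort(variances)[-k:])  # indices of k largest variances
```

Under a spiked covariance model, the coordinates in the eigenvector's support carry inflated variance, which is what this selection rule exploits.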
Latent Variable Graphical Model Selection via Convex Optimization
, 2010
Abstract

Cited by 76 (4 self)
Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out” over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the ℓ1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
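One standard way to write a regularized maximum-likelihood program of the kind the abstract describes (notation is illustrative, not taken verbatim from the paper): with S the sparse conditional concentration matrix among the observed variables and L the low-rank contribution of the latent variables,

```latex
\min_{S,\,L}\;\; -\ell\bigl(S - L;\, \widehat{\Sigma}_n\bigr)
  \;+\; \lambda_n \bigl( \gamma \,\lVert S \rVert_1 + \operatorname{tr}(L) \bigr)
\qquad \text{s.t.} \quad S - L \succ 0,\; L \succeq 0,
```

where ℓ is the Gaussian log-likelihood evaluated at the sample covariance, and λn, γ trade off the two penalties. For a positive semidefinite L the trace equals the nuclear norm, so the regularizer combines the ℓ1 norm (promoting sparsity of S) with the nuclear norm (promoting low rank of L), as stated in the abstract.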
Optimal detection of sparse principal components in high dimension
, 2013
Abstract

Cited by 42 (4 self)
We perform a finite-sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
OPTIMAL RATES OF CONVERGENCE FOR SPARSE COVARIANCE MATRIX ESTIMATION
 SUBMITTED TO THE ANNALS OF STATISTICS
Abstract

Cited by 34 (10 self)
This paper considers estimation of sparse covariance matrices and establishes the optimal rate of convergence under a range of matrix operator norm and Bregman divergence losses. A major focus is on the derivation of a rate-sharp minimax lower bound. The problem exhibits new features that are significantly different from those that occur in the conventional nonparametric function estimation problems. Standard techniques fail to yield good results and new tools are thus needed. We first develop a lower bound technique that is particularly well suited for treating “two-directional” problems such as estimating sparse covariance matrices. The result can be viewed as a generalization of Le Cam’s method in one direction and Assouad’s Lemma in another. This lower bound technique is of independent interest and can be used for other matrix estimation problems. We then establish a rate-sharp minimax lower bound for estimating sparse covariance matrices under the spectral norm by applying the general lower bound technique. A thresholding estimator is shown to attain the optimal rate of convergence under the spectral norm. The results are then extended to the general matrix ℓw operator norms for 1 ≤ w ≤ ∞. In addition, we give a unified result on the minimax rate of convergence for sparse covariance matrix estimation under a class of Bregman divergence losses.
A constrained ℓ1-minimization approach to sparse precision matrix estimation
 J. Amer. Statist. Assoc
, 2011
Minimax rates of estimation for sparse PCA in high dimensions
, 2012
Abstract

Cited by 29 (3 self)
We study sparse principal components analysis in the high-dimensional setting, where p (the number of variables) can be much larger than n (the number of observations). We prove optimal, non-asymptotic lower and upper bounds on the minimax estimation error for the leading eigenvector when it belongs to an ℓq ball for q ∈ [0, 1]. Our bounds are sharp in p and n for all q ∈ [0, 1] over a wide class of distributions. The upper bound is obtained by analyzing the performance of ℓq-constrained PCA. In particular, our results provide convergence rates for ℓ1-constrained PCA.
Estimation of simultaneously sparse and low rank matrices
 In Proc. ICML
, 2012
Abstract

Cited by 27 (4 self)
The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves the ℓ1-norm and trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the optimization problem efficiently and evaluate performance on synthetic and real data sets.
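Both penalty terms in such a mixed ℓ1 + trace-norm objective admit closed-form proximal operators, which is what makes proximal descent strategies attractive here. A minimal sketch of the two operators (an alternating or incremental scheme applies them in turn; note the prox of the sum is not simply their composition):

```python
import numpy as np

def prox_l1(A, t):
    """Entrywise soft-thresholding: the proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def prox_trace(A, t):
    """Singular value thresholding: the proximal operator of t * ||A||_*
    (the trace / nuclear norm), obtained by soft-thresholding the
    singular values of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```

Soft-thresholding zeroes small entries (sparsity) while singular value thresholding zeroes small singular values (low rank), mirroring the two effects the oracle inequality trades off.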