Results 1–10 of 60
The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices
, 2011
Sparse principal component analysis and iterative thresholding, The Annals of Statistics 41
Optimal detection of sparse principal components in high dimension
, 2013
Abstract

Cited by 38 (5 self)
We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
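For small p, the sparse eigenvalue statistic underlying such a test can be computed by brute force over all k-subsets of variables, which makes the NP-completeness remark concrete. A minimal numpy sketch with toy sizes and a hypothetical spiked example (not the paper's actual test calibration):

```python
import itertools
import numpy as np

def k_sparse_lambda_max(S, k):
    """Brute-force k-sparse largest eigenvalue: the maximum, over all
    k-subsets of variables, of the top eigenvalue of the corresponding
    principal submatrix. Exponential in p, so only usable for small p."""
    p = S.shape[0]
    best = -np.inf
    for idx in itertools.combinations(range(p), k):
        sub = S[np.ix_(idx, idx)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])
    return best

# Toy comparison: spiked vs. null sample covariance (illustrative only).
rng = np.random.default_rng(0)
p, n, k = 12, 200, 3
v = np.zeros(p); v[:k] = 1 / np.sqrt(k)               # sparse spike direction
X_null = rng.normal(size=(n, p))                      # Sigma = I
X_spike = X_null + 2.0 * rng.normal(size=(n, 1)) * v  # Sigma = I + 4 v v^T
stat_null = k_sparse_lambda_max(X_null.T @ X_null / n, k)
stat_spike = k_sparse_lambda_max(X_spike.T @ X_spike / n, k)
```

Under the planted spike the statistic is markedly larger than under the null, which is what a calibrated threshold would exploit.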
Supplement to “Asymptotic power of sphericity tests for high-dimensional data.” DOI:10.1214/13-AOS1100SUPP
, 2013
Minimax rates of estimation for sparse PCA in high dimensions
, 2012
Abstract

Cited by 26 (2 self)
We study sparse principal components analysis in the high-dimensional setting, where p (the number of variables) can be much larger than n (the number of observations). We prove optimal, non-asymptotic lower and upper bounds on the minimax estimation error for the leading eigenvector when it belongs to an ℓq ball for q ∈ [0, 1]. Our bounds are sharp in p and n for all q ∈ [0, 1] over a wide class of distributions. The upper bound is obtained by analyzing the performance of ℓq-constrained PCA. In particular, our results provide convergence rates for ℓ1-constrained PCA.
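A common heuristic related to the ℓ0-constrained case of this problem is power iteration with hard thresholding. The sketch below, under a spiked covariance model, is an illustration of sparsity-constrained PCA, not the paper's ℓq-constrained estimator:

```python
import numpy as np

def truncated_power_pca(S, k, iters=100, seed=0):
    """Heuristic for a k-sparse leading eigenvector of covariance S:
    power iteration, hard-thresholding to the k largest-magnitude
    coordinates after every matrix-vector product."""
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    x = rng.normal(size=p)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = S @ x
        keep = np.argsort(np.abs(x))[-k:]   # indices of the k largest entries
        mask = np.zeros(p); mask[keep] = 1.0
        x *= mask
        x /= np.linalg.norm(x)
    return x

# Spiked model: Sigma = I + theta * v v^T with a k-sparse leading eigenvector v.
rng = np.random.default_rng(1)
p, n, k, theta = 50, 200, 5, 5.0
v = np.zeros(p); v[:k] = 1 / np.sqrt(k)
X = rng.normal(size=(n, p)) + np.sqrt(theta) * rng.normal(size=(n, 1)) * v
S = X.T @ X / n
v_hat = truncated_power_pca(S, k)
```

With a strong spike, the recovered direction correlates highly with the true sparse eigenvector.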
The singular values and vectors of low rank perturbations of large rectangular random matrices
 J. Multivariate Anal
Abstract

Cited by 25 (6 self)
Abstract. In this paper, we consider the singular values and singular vectors of finite, low rank perturbations of large rectangular random matrices. Specifically, we prove almost sure convergence of the extreme singular values and appropriate projections of the corresponding singular vectors of the perturbed matrix. As in the prequel, where we considered the eigenvalues of Hermitian matrices, the non-random limiting value is shown to depend explicitly on the limiting singular value distribution of the unperturbed matrix via an integral transform that linearizes rectangular additive convolution in free probability theory. The asymptotic position of the extreme singular values of the perturbed matrix differs from that of the original matrix if and only if the singular values of the perturbing matrix are above a certain critical threshold which depends on this same aforementioned integral transform. We examine the consequence of this singular value phase transition on the associated left and right singular vectors and discuss the fluctuations around these non-random limits.
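The phase transition described above can be sanity-checked numerically in the simplest Gaussian rank-one case, where the limiting top singular value takes the explicit form sqrt((θ² + 1)(θ² + c)) / θ above the threshold θ > c^{1/4}, versus the bulk edge 1 + √c below it. This sketch checks only that special case, not the paper's general integral-transform formulation:

```python
import numpy as np

# Rank-one perturbation theta * u v^T of an n x m matrix with i.i.d.
# N(0, 1/n) entries, aspect ratio c = m/n. For theta above c**0.25 the
# top singular value separates from the Marchenko-Pastur bulk edge.
rng = np.random.default_rng(0)
n, m, theta = 2000, 500, 2.0
c = m / n
u = rng.normal(size=n); u /= np.linalg.norm(u)
v = rng.normal(size=m); v /= np.linalg.norm(v)
X = rng.normal(size=(n, m)) / np.sqrt(n) + theta * np.outer(u, v)
top = np.linalg.svd(X, compute_uv=False)[0]       # largest singular value
predicted = np.sqrt((theta**2 + 1) * (theta**2 + c)) / theta
```

At these sizes the observed top singular value sits close to the predicted non-random limit.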
Non-Parametric Detection of Signals by Information Theoretic Criteria: Performance Analysis and an Improved Estimator
, 2009
Abstract

Cited by 23 (3 self)
Determining the number of sources is a fundamental problem in many scientific fields. In this paper we consider the non-parametric setting, and focus on the detection performance of two popular estimators based on information theoretic criteria, the Akaike information criterion (AIC) and minimum description length (MDL). We present three contributions on this subject. First, we derive a new expression for the detection performance of the MDL estimator, which exhibits a much closer fit to simulations in comparison to previous formulas. Second, we present a random matrix theory viewpoint of the performance of the AIC estimator, including approximate analytical formulas for its overestimation probability. Finally, we show that a small increase in the penalty term of AIC leads to an estimator with a very good detection performance and a negligible overestimation probability.
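The AIC/MDL estimators analyzed in this line of work have the classical Wax–Kailath form, comparing the geometric and arithmetic means of the trailing sample-covariance eigenvalues. A minimal sketch with a toy mixing model (the AIC penalty here is written up to an overall factor of 2, which does not change the argmin):

```python
import numpy as np

def info_criterion_num_sources(eigvals, N, penalty="MDL"):
    """Estimate the number of sources from the descending eigenvalues of a
    p x p sample covariance built from N snapshots. The likelihood term
    compares geometric and arithmetic means of the (p - k) smallest
    eigenvalues; they agree exactly when those eigenvalues are equal."""
    p = len(eigvals)
    scores = []
    for k in range(p):
        tail = np.asarray(eigvals[k:], dtype=float)
        g = np.exp(np.mean(np.log(tail)))   # geometric mean of noise eigenvalues
        a = np.mean(tail)                   # arithmetic mean
        loglik = -N * (p - k) * np.log(g / a)
        dof = k * (2 * p - k)               # free parameters at order k
        pen = dof if penalty == "AIC" else 0.5 * dof * np.log(N)
        scores.append(loglik + pen)
    return int(np.argmin(scores))

# Toy array-processing example: q = 2 sources observed at p = 8 sensors.
rng = np.random.default_rng(0)
p, q, N = 8, 2, 2000
A = rng.normal(size=(p, q))                       # mixing matrix
X = A @ (3.0 * rng.normal(size=(q, N))) + rng.normal(size=(p, N))
eigvals = np.linalg.eigvalsh(X @ X.T / N)[::-1]   # descending order
k_hat = info_criterion_num_sources(eigvals, N, penalty="MDL")
```

With strong sources and many snapshots, MDL recovers the true source count; the paper's analysis concerns exactly how these estimators behave nearer the detection threshold.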
Recursive robust PCA or recursive sparse recovery in large but structured noise, arXiv:1211.3754 [cs.IT]
, 2012
Abstract

Cited by 22 (17 self)
We study the recursive robust principal components' analysis (PCA) problem. Here, “robust” refers to robustness to both independent and correlated sparse outliers. If the outlier is the signal-of-interest, this problem can be interpreted as one of recursively recovering a time sequence of sparse vectors, St, in the presence of large but structured noise, Lt: the noise needs to lie in a “slowly changing” low dimensional subspace. We study a novel solution called Recursive Projected CS (ReProCS). Under mild assumptions, we show that, with high probability (w.h.p.), at all times, ReProCS can exactly recover the support set of St; and the reconstruction errors of both St and Lt are upper bounded by a time-invariant and small value. Index Terms — robust PCA, compressive sensing
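The projection idea at the core of this approach can be sketched in a few lines. Assume, for illustration only, that the subspace of Lt is known exactly and there is no extra noise: projecting it out removes Lt entirely and leaves a mildly attenuated sparse vector, which thresholding plus least squares on the estimated support recovers. This is an idealized one-step sketch, not the full ReProCS algorithm, which also tracks the subspace over time:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 100, 5
U, _ = np.linalg.qr(rng.normal(size=(n, r)))    # known subspace basis (assumption)
L = U @ rng.normal(size=r)                      # structured "noise" in span(U)
s = np.zeros(n)
s[[3, 40, 77]] = [10.0, -10.0, 10.0]            # sparse outliers (the signal)
m = L + s                                       # observed vector

# Project out the subspace: y = (I - U U^T) m = (I - U U^T) s, so L is
# removed exactly and the large entries of s are only mildly attenuated.
y = m - U @ (U.T @ m)
support = np.flatnonzero(np.abs(y) > 5.0)       # simple support estimate
Phi = np.eye(n) - U @ U.T
s_hat = np.zeros(n)
# Least squares restricted to the estimated support recovers s exactly
# here, since y = Phi[:, support] @ s[support] in this noiseless setup.
s_hat[support] = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
```

The real setting replaces the thresholding step with compressive-sensing recovery and must handle subspace error and measurement noise, which is where the paper's analysis lives.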
Treelets — An Adaptive Multi-Scale Basis for Sparse Unordered Data
Abstract

Cited by 16 (2 self)
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered — with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper, we present treelets — a novel construction of multi-scale bases that extends wavelets to non-smooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyze theoretically. Here we describe a variety of situations where treelets perform better than principal component analysis as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.
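One level of the treelet construction amounts to a Jacobi rotation on the most correlated pair of variables. A simplified sketch of that single step (the full method iterates it, building the hierarchical tree and retiring the lower-variance coordinate of each merged pair):

```python
import numpy as np

def treelet_step(C):
    """One level of the treelet construction (simplified): find the pair of
    variables with the highest absolute correlation and apply the Jacobi
    rotation that zeroes their covariance."""
    p = C.shape[0]
    d = np.sqrt(np.diag(C))
    corr = C / np.outer(d, d)
    np.fill_diagonal(corr, 0.0)
    i, j = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    # Rotation angle that diagonalizes the 2x2 block [[C_ii, C_ij], [C_ij, C_jj]].
    theta = 0.5 * np.arctan2(2 * C[i, j], C[i, i] - C[j, j])
    R = np.eye(p)
    cth, sth = np.cos(theta), np.sin(theta)
    R[i, i], R[j, j] = cth, cth
    R[i, j], R[j, i] = -sth, sth
    return R.T @ C @ R, R, (i, j)

# Blocked-covariance toy: variables 0 and 1 are strongly correlated.
C = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
C_new, R, (i, j) = treelet_step(C)
```

The rotation is orthogonal, so the spectrum of the covariance is preserved while the selected pair is decorrelated.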
Refined Perturbation Bounds for Eigenvalues of Hermitian and Non-Hermitian Matrices, to appear
 SIAM J. Matrix Analysis
, 2008
Abstract

Cited by 13 (1 self)
Abstract. We present eigenvalue bounds for perturbations of Hermitian matrices, and express the change in eigenvalues in terms of a projection of the perturbation onto a particular eigenspace, rather than in terms of the full perturbation. The perturbations we consider are Hermitian of rank one, and Hermitian or non-Hermitian with norm smaller than the spectral gap of a specific eigenvalue. Applications include principal component analysis under a spiked covariance model, and pseudo-arclength continuation methods for the solution of nonlinear systems.
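The first-order picture behind such projection-based bounds is easy to verify numerically: a small Hermitian perturbation E shifts an eigenvalue by approximately v^T E v, the projection of E onto the corresponding unit eigenvector v, rather than by the full norm of E. An illustrative sketch, not a reproduction of the paper's bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T   # well-separated spectrum 1..n
u = rng.normal(size=n); u /= np.linalg.norm(u)
eps = 1e-3
E = eps * np.outer(u, u)                       # small rank-one Hermitian perturbation

lam = np.linalg.eigvalsh(A)                    # ascending eigenvalues
lam_pert = np.linalg.eigvalsh(A + E)
v = Q[:, 0]                                    # unit eigenvector of the smallest eigenvalue
first_order = v @ E @ v                        # predicted shift of lambda_min
```

The actual shift of the smallest eigenvalue matches v^T E v up to second-order terms of size eps² divided by the spectral gap.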