Results 1  10
of
92
On the distribution of the largest eigenvalue in principal components analysis
 ANN. STATIST
, 2001
"... Let x �1 � denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x �1 � is the largest principal component variance of the covariance matrix X ′ X, or the largest eigenvalue of a pvariate Wishart distribu ..."
Abstract

Cited by 422 (4 self)
 Add to MetaCart
Let x �1 � denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x �1 � is the largest principal component variance of the covariance matrix X ′ X, or the largest eigenvalue of a pvariate Wishart distribution on n degrees of freedom with identity covariance. Consider the limit of large p and n with n/p = γ ≥ 1. When centered by µ p = � √ n − 1 + √ p � 2 and scaled by σ p = � √ n − 1 + √ p��1 / √ n − 1 + 1 / √ p � 1/3 � the distribution of x �1 � approaches the Tracy–Widom lawof order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations showthe approximation to be informative for n and p as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large p multivariate distribution theory may be easier to apply in practice than their fixed p counterparts.
Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices
, 2008
"... ..."
Eigenvalues of large sample covariance matrices of spiked population models
, 2006
"... We consider a spiked population model, proposed by Johnstone, whose population eigenvalues are all unit except for a few fixed eigenvalues. The question is to determine how the sample eigenvalues depend on the nonunit population ones when both sample size and population size become large. This pape ..."
Abstract

Cited by 163 (8 self)
 Add to MetaCart
(Show Context)
We consider a spiked population model, proposed by Johnstone, whose population eigenvalues are all unit except for a few fixed eigenvalues. The question is to determine how the sample eigenvalues depend on the nonunit population ones when both sample size and population size become large. This paper completely determines the almost sure limits for a general class of samples. 1
The LittlewoodOfford problem and invertibility of random matrices.
 Adv. Math.
, 2008
"... Abstract We prove two basic conjectures on the distribution of the smallest singular value of random n×n matrices with independent entries. Under minimal moment assumptions, we show that the smallest singular value is of order n −1/2 , which is optimal for Gaussian matrices. Moreover, we give a opt ..."
Abstract

Cited by 105 (18 self)
 Add to MetaCart
(Show Context)
Abstract We prove two basic conjectures on the distribution of the smallest singular value of random n×n matrices with independent entries. Under minimal moment assumptions, we show that the smallest singular value is of order n −1/2 , which is optimal for Gaussian matrices. Moreover, we give a optimal estimate on the tail probability. This comes as a consequence of a new and essentially sharp estimate in the LittlewoodOfford problem: for i.i.d. random variables X k and real numbers a k , determine the probability p that the sum k a k X k lies near some number v. For arbitrary coefficients a k of the same order of magnitude, we show that they essentially lie in an arithmetic progression of length 1/p. Published by Elsevier Inc.
Nonasymptotic theory of random matrices: extreme singular values
 PROCEEDINGS OF THE INTERNATIONAL CONGRESS OF MATHEMATICIANS
, 2010
"... ..."
On Spectral Learning of Mixtures of Distributions
"... We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixturesample, let i , C i , w i denote the empirical mean, covariance matrix, and mixing weight of the ith component. We ..."
Abstract

Cited by 77 (0 self)
 Add to MetaCart
We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixturesample, let i , C i , w i denote the empirical mean, covariance matrix, and mixing weight of the ith component. We prove that a very simple algorithm, namely spectral projection followed by singlelinkage clustering, properly classifies every point in the sample when each i is separated from all j by 2 (1/w i +1/w j ) plus a term that depends on the concentration properties of the distributions in the mixture. This second term is very small for many distributions, including Gaussians, Logconcave, and many others. As a result, we get the best known bounds for learning mixtures of arbitrary Gaussians in terms of the required mean separation. On the other hand, we prove that given any k means i and mixing weights w i , there are (many) sets of matrices C i such that each i is separated from all j by 2 (1/w i + 1/w j ) , but applying spectral projection to the corresponding Gaussian mixture causes it to collapse completely, i.e., all means and covariance matrices in the projected mixture are identical.
TracyWidom limit for the largest eigenvalue of a large class of complex sample covariance matrices
 ANN. PROBAB
, 2007
"... We consider the asymptotic fluctuation behavior of the largest eigenvalue of certain sample covariance matrices in the asymptotic regime where both dimensions of the corresponding data matrix go to infinity. More precisely, let X be an n × p matrix, and let its rows be i.i.d. complex normal vectors ..."
Abstract

Cited by 68 (7 self)
 Add to MetaCart
(Show Context)
We consider the asymptotic fluctuation behavior of the largest eigenvalue of certain sample covariance matrices in the asymptotic regime where both dimensions of the corresponding data matrix go to infinity. More precisely, let X be an n × p matrix, and let its rows be i.i.d. complex normal vectors with mean 0 and covariance �p. We show that for a large class of covariance matrices �p, the largest eigenvalue of X ∗ X is asymptotically distributed (after recentering and rescaling) as the Tracy–Widom distribution that appears in the study of the Gaussian unitary ensemble. We give explicit formulas for the centering and scaling sequences that are easy to implement and involve only the spectral distribution of the population covariance, n and p. The main theorem applies to a number of covariance models found in applications. For example, wellbehaved Toeplitz matrices as well as covariance matrices whose spectral distribution is a sum of atoms (under some conditions on the mass of the atoms) are among the models the theorem can handle. Generalizations of the theorem to certain spiked versions of our models and a.s. results about the largest eigenvalue are given. We also discuss a simple corollary that does not require normality of the entries of the data matrix and some consequences for applications in multivariate statistics.
High dimensional statistical inference and random matrices
 IN: PROCEEDINGS OF INTERNATIONAL CONGRESS OF MATHEMATICIANS
, 2006
"... Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of interdependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory ..."
Abstract

Cited by 49 (1 self)
 Add to MetaCart
Multivariate statistical analysis is concerned with observations on several variables which are thought to possess some degree of interdependence. Driven by problems in genetics and the social sciences, it first flowered in the earlier half of the last century. Subsequently, random matrix theory (RMT) developed, initially within physics, and more recently widely in mathematics. While some of the central objects of study in RMT are identical to those of multivariate statistics, statistical theory was slow to exploit the connection. However, with vast data collection ever more common, data sets now often have as many or more variables than the number of individuals observed. In such contexts, the techniques and results of RMT have much to offer multivariate statistics. The paper reviews some of the progress to date.