Consistency of spectral clustering
, 2004
Cited by 170 (11 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
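To make the two classes mentioned in this abstract concrete, here is a minimal sketch of a two-way spectral split using either the unnormalized Laplacian L = D - W or the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}. The Gaussian similarity kernel, the `bandwidth` parameter, the sign-based threshold, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the paper's experimental setup.

```python
import numpy as np


def gaussian_similarity(points, bandwidth=1.0):
    """Dense similarity matrix W from a Gaussian kernel (an assumed choice)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def graph_laplacians(similarity):
    """Return (unnormalized, symmetric normalized) graph Laplacians of W."""
    degrees = similarity.sum(axis=1)
    unnormalized = np.diag(degrees) - similarity
    inv_sqrt = 1.0 / np.sqrt(degrees)
    normalized = np.eye(len(degrees)) - inv_sqrt[:, None] * similarity * inv_sqrt[None, :]
    return unnormalized, normalized


def two_way_split(points, bandwidth=1.0, normalized=True):
    """Split the points in two by the sign of the second-smallest eigenvector."""
    similarity = gaussian_similarity(points, bandwidth)
    unnorm, norm = graph_laplacians(similarity)
    laplacian = norm if normalized else unnorm
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vectors = np.linalg.eigh(laplacian)
    return (vectors[:, 1] > 0).astype(int)
```

On well-separated data both variants typically give the same split; the abstract's claim concerns their limiting behaviour as the sample grows, which a small example like this cannot demonstrate.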
On the distribution of the largest eigenvalue in principal components analysis
- Ann. Statist
, 2001
Cited by 119 (1 self)
Let $x_{(1)}$ denote the square of the largest singular value of an $n \times p$ matrix $X$, all of whose entries are independent standard Gaussian variates. Equivalently, $x_{(1)}$ is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a $p$-variate Wishart distribution on $n$ degrees of freedom with identity covariance. Consider the limit of large $p$ and $n$ with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of $x_{(1)}$ approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for $n$ and $p$ as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large $p$ multivariate distribution theory may be easier to apply in practice than their fixed $p$ counterparts.
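The centering and scaling constants in this abstract are easy to compute directly. The sketch below evaluates them and standardizes simulated largest eigenvalues of X'X for Gaussian X; the function names, the seed, and the simulation sizes are illustrative assumptions, and no Tracy–Widom quantiles are computed here.

```python
import numpy as np


def tw_centering_scaling(n, p):
    """Centering and scaling constants as stated in the abstract."""
    a = np.sqrt(n - 1.0)
    b = np.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def largest_wishart_eigenvalue(n, p, rng):
    """Square of the largest singular value of an n x p standard Gaussian matrix."""
    x = rng.standard_normal((n, p))
    singular_values = np.linalg.svd(x, compute_uv=False)
    return singular_values[0] ** 2


def standardized_samples(n, p, draws, seed=0):
    """Draw (x_(1) - mu) / sigma repeatedly; seed and draw count are arbitrary."""
    rng = np.random.default_rng(seed)
    mu, sigma = tw_centering_scaling(n, p)
    values = np.array([largest_wishart_eigenvalue(n, p, rng) for _ in range(draws)])
    return (values - mu) / sigma
```

A histogram of `standardized_samples(100, 50, 2000)` could then be compared against tabulated Tracy–Widom values from whatever software the reader has available.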
On Model Selection Consistency of Lasso
, 2006
Cited by 112 (9 self)
Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection.
Concentration of the Spectral Measure for Large Matrices
, 2000
Cited by 44 (8 self)
We derive concentration inequalities for functions of the empirical measure of eigenvalues for large, random, self-adjoint matrices, with not necessarily Gaussian entries. The results presented apply in particular to non-Gaussian Wigner and Wishart matrices. We also provide concentration bounds for non-commutative functionals of random matrices. 1 Introduction and statement of results. Consider a random $N \times N$ Hermitian matrix $X$ with i.i.d. complex entries (except for the symmetry constraint) satisfying a moment condition. It is well known since Wigner [28] that the spectral measure of $N^{-1/2} X$ converges to the semicircle law. This observation has been generalized to a large class of matrices, e.g. sample covariance matrices of the form $XRX^*$ where $R$ is a deterministic diagonal matrix ([19]), band matrices (see [5, 16, 20]), etc. For the Wigner case, this convergence has been supplemented by Central Limit Theorems, see [15] for the case of Gaussian entries and [17], [22] for the gen...
High-SNR power offset in multiantenna communication
- IEEE Transactions on Information Theory
, 2005
Cited by 43 (10 self)
The analysis of the multiple-antenna capacity in the high-SNR regime has hitherto focused on the high-SNR slope (or maximum multiplexing gain), which quantifies the multiplicative increase as a function of the number of antennas. This traditional characterization is unable to assess the impact of prominent channel features since, for a majority of channels, the slope equals the minimum of the number of transmit and receive antennas. Furthermore, a characterization based solely on the slope captures only the scaling but has no notion of the power required for a certain capacity. This paper advocates a more refined characterization whereby, as a function of $\mathrm{SNR}|_{\mathrm{dB}}$, the high-SNR capacity is expanded as an affine function where the impact of channel features such as antenna correlation, unfaded components, etc., resides in the zero-order term or power offset. The power offset, for which we find insightful closed-form expressions, is shown to play a chief role for SNR levels of practical interest. Index Terms: Antenna correlation, channel capacity, coherent communication, fading channels, high-SNR analysis, multiantenna arrays, Ricean channels.
A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices
- J. Statist. Phys
, 2002
Cited by 43 (3 self)
Recently Johansson (21) and Johnstone (16) proved that the distribution of the (properly rescaled) largest principal component of the complex (real) Wishart matrix $X^*X$ ($X^tX$) converges to the Tracy–Widom law as $n, p$ (the dimensions of $X$) tend to $\infty$ in some ratio $n/p \to c > 0$. We extend these results in two directions. First of all, we prove that the joint distribution of the first, second, third, etc. eigenvalues of a Wishart matrix converges (after a proper rescaling) to the Tracy–Widom distribution. Second of all, we explain how the combinatorial machinery developed for Wigner random matrices in refs. 27, 38, and 39 allows one to extend the results by Johansson and Johnstone to the case of $X$ with non-Gaussian entries, provided $n - p = O(p^{1/3})$. We also prove that $\lambda_{\max} \le (n^{1/2} + p^{1/2})^2 + O(p^{1/2} \log p)$ (a.e.) for general $c > 0$. KEY WORDS: Sample covariance matrices; principal component; Tracy–Widom distribution.
Sure independence screening for ultra-high dimensional feature space
, 2006
Cited by 32 (3 self)
Variable selection plays an important role in high dimensional statistical modeling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization and show that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on correlation learning, called Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, SIS is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be ac-
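As a rough illustration of screening by correlation learning, the sketch below ranks features by the absolute marginal correlation between each standardized column and the centered response, then keeps the top few. The function name, the default cutoff of n / log(n) features, and the assumption that no column is constant are illustrative choices, not details taken from the abstract.

```python
import numpy as np


def sure_independence_screening(X, y, keep=None):
    """Return sorted indices of the `keep` features most correlated with y.

    Assumes every column of X has nonzero variance. The default cutoff
    int(n / log(n)) is an assumed choice that stays below the sample size.
    """
    n, p = X.shape
    if keep is None:
        keep = max(1, int(n / np.log(n)))
    keep = min(keep, p)
    # Standardize columns so scores are proportional to marginal correlations.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    y_centered = y - y.mean()
    scores = np.abs(X_std.T @ y_centered)
    ranked = np.argsort(scores)[::-1]
    return np.sort(ranked[:keep])
```

The retained columns `X[:, selected]` could then be passed to any lower-dimensional variable selection method, which is the two-stage use the abstract describes.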
On the Distribution of the Largest Principal Component
- ANN. STATIST
, 2000
Cited by 31 (0 self)
Let $x_{(1)}$ denote the square of the largest singular value of an $n \times p$ matrix $X$, all of whose entries are independent standard Gaussian variates. Equivalently, $x_{(1)}$ is the largest principal component of the covariance matrix $X'X$, or the largest eigenvalue of a $p$-variate Wishart distribution on $n$ degrees of freedom with identity covariance. Consider the limit of large $p$ and $n$ with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of $x_{(1)}$ approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation, and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for $n$ and $p$ as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large $p$ multivariate distribution theory may be easier to ...
CLT for Linear Spectral Statistics of Large Dimensional Sample Covariance Matrices
, 2003
Cited by 22 (0 self)
This paper shows their rate of convergence to be $1/n$ by proving that, after proper scaling, they form a tight sequence. Moreover, if $E X_{11}^2 = 0$ and $E|X_{11}|^4 = 2$, or if $X_{11}$ and $T_n$ are real and $E X_{11}^4 = 3$, they are shown to have Gaussian limits.
Capacity Scaling and Spectral Efficiency in Wideband Correlated MIMO Channels
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 2002
Cited by 19 (2 self)
The dramatic linear increase in ergodic capacity with the number of antennas promised by MIMO wireless communication systems is based on idealized channel models representing a rich scattering environment. Is such scaling sustainable in realistic scattering scenarios? Existing physical models, although realistic, are intractable for addressing this problem analytically due to their complicated nonlinear dependence on propagation path parameters, such as the angles of arrival and delays. In this paper, we leverage a recently introduced virtual representation of physical models that is essentially a Fourier series representation of wideband MIMO channels in terms of fixed virtual angles and delays. Motivated by physical considerations, we propose a D-connected model for correlated channels defined by a virtual spatial channel matrix consisting of D non-vanishing diagonals with i.i.d. Gaussian entries. The parameter D provides a meaningful and tractable measure of the richness of scattering. We derive general bounds for the coherent ergodic capacity and investigate capacity scaling with the number of antennas and bandwidth. In the large antenna regime, we show that linear capacity scaling is possible if D scales linearly with the number of antennas. This, in turn, is possible if the number of resolvable paths grows quadratically with the number of antennas. The capacity saturates for linear growth in the number of paths (fixed D). The ergodic capacity does not depend on frequency selectivity of the channel in the wideband case. Increasing bandwidth tightens the bounds and hastens the convergence of scaling behavior. For large bandwidth, the capacity scales linearly with SNR as well. We also provide an explicit characterization of the wideband slope recently proposed by Verdú. Nume...

