## Discovering Sparse Covariance Structures with the Isomap

Venue: Journal of Computational and Graphical Statistics

Citations: 6 (1 self)

### BibTeX

@ARTICLE{Wagaman_discoveringsparse,

author = {A. S. Wagaman and E. Levina},

title = {Discovering Sparse Covariance Structures with the Isomap},

journal = {Journal of Computational and Graphical Statistics},

year = {2009}

}

### Abstract

Regularization of covariance matrices in high dimensions is usually either based on a known ordering of variables or ignores the ordering entirely. This paper proposes a method for discovering meaningful orderings of variables based on their correlations using the Isomap, a non-linear dimension reduction technique designed for manifold embeddings. These orderings are then used to construct a sparse covariance estimator, which is block-diagonal and/or banded. Finding an ordering to which banding can be applied is desirable because banded estimators have been shown to be consistent in high dimensions. We show that in situations where the variables do have such a structure, the Isomap does very well at discovering it, and the resulting regularized estimator performs better for covariance estimation than other regularization methods that ignore variable order, such as thresholding. We also propose a bootstrap approach to constructing the neighborhood graph used by the Isomap, and show it leads to better estimation. We illustrate our method on data on protein consumption, where the variables (food types) have a structure but it cannot be easily described a priori, and on a gene expression data set.
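The pipeline described in the abstract can be sketched in a few lines. This is a hypothetical illustration (not the authors' implementation) using scikit-learn's Isomap and a fixed bandwidth; the paper's bootstrap neighborhood-graph construction and data-driven tuning are omitted, and the correlation-based distance is one plausible choice.

```python
# Sketch: order variables by a 1-D Isomap embedding of correlation-based
# distances, then band the sample covariance in that ordering.
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p)).cumsum(axis=1)  # toy data with a latent variable ordering

S = np.cov(X, rowvar=False)                     # sample covariance
R = np.corrcoef(X, rowvar=False)
D = np.sqrt(2.0 * (1.0 - np.abs(R)))            # distance from correlations (assumed form)

# Embed the p variables in one dimension; sorting the embedding coordinates
# gives a data-driven ordering of the variables.
emb = Isomap(n_neighbors=5, n_components=1, metric="precomputed").fit_transform(D)
order = np.argsort(emb[:, 0])

# Band the reordered covariance: zero all entries more than k off the diagonal.
k = 3
S_ord = S[np.ix_(order, order)]
i, j = np.indices(S_ord.shape)
S_banded = np.where(np.abs(i - j) <= k, S_ord, 0.0)
```

The resulting estimator is sparse and symmetric by construction; in practice the bandwidth `k` would be chosen by cross-validation rather than fixed.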

### Citations

2588 | Normalized Cuts and Image Segmentation
- Shi, Malik
- 2000
Citation Context ...er of blocks B, apply a clustering method to correlations to obtain B clusters, and construct a block-diagonal estimator with a block per cluster. A graph partitioning algorithm like normalized cuts (Shi and Malik, 2000) or any other such algorithm could be applied as well. We do not pursue this approach here, and note that in our case, the number of blocks B is determined from the data rather than supplied by the u...

1687 | A global geometric framework for nonlinear dimensionality reduction - Tenenbaum, Silva, et al. |

1613 | Nonlinear dimensionality reduction by locally linear embedding - Roweis, Saul |

1327 | Finding Groups in Data: An Introduction to Cluster Analysis
- Kaufman, Rousseeuw
- 1990
Citation Context ...er illustrate the differences between the principal components, we projected the data onto the first two principal components and applied agglomerative clustering (bottom-up) via the agnes algorithm (Kaufman and Rousseeuw, 1990) using Euclidean distances between projected data points as dissimilarities. Results for three clusters are shown in Figure 5. Agnes solutions are hierarchical; at the first split, it divides the dat...

424 | Laplacian eigenmaps and spectral techniques for embedding and clustering - Belkin, Niyogi |

380 | Modern Multidimensional Scaling - Theory and Applications - Borg, Groenen - 1997 |

261 | Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks
- Khan
- 2001
Citation Context ...ful reordering of the variables resulting in more meaningful principal components. 5.2 Gene expression data on tumors These data come from a small round blue-cell tumors (SRBC) microarray experiment (Khan et al., 2001). The experiment had 64 training tissue samples, and 2308 gene expression values recorded for each sample. The original dataset had data on 6567 genes and was filtered down by requiring that each gen...

219 | Correlation clustering
- Bansal, Blum, et al.
- 2004
Citation Context ...y negatively correlated variables placed as far apart as possible, and positively correlated variables closer together. This case is related to the correlation clustering problem in computer science (Bansal et al., 2002; Demaine and Immorlica, 2003), which aims to partition a weighted graph with positive and negative edge weights so that negative edges are broken up and positive edges are kept together. However, the...

196 | On the distribution of the largest eigenvalue in principal components analysis
- Johnstone
- 2001
Citation Context ...sion or concentration matrix. Advances in random matrix theory – from the classical results of Marčenko and Pastur (1967) to the recent work on the theory of the largest eigenvalues and eigenvectors (Johnstone, 2001; Johnstone and Lu, 2004; Paul, 2007; El Karoui, 2007b), and many others – allowed in-depth theoretical studies of the traditional estimator, the sample (empirical)...

182 | Hessian eigenmaps: locally linear embedding techniques for high-dimensional data - Donoho, Grimes - 2003 |

178 | Distributions of eigenvalues for some sets of random matrices - Marcenko, Pastur - 1967 |

130 | A well-conditioned estimator for large-dimensional covariance matrices
- Ledoit, Wolf
Citation Context ...ent shrinkage estimator replaces the sample covariance with a linear combination of the sample covariance and the identity matrix, with optimal (in a suitable sense) coefficients estimated from data (Ledoit and Wolf, 2003). However, shrinking towards the identity matrix does not affect the eigenvectors, and hence cannot be relied on to improve PCA. Shrinkage estimators also do not create sparsity in any sense, and thu...
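The shrinkage estimator described in the context above can be illustrated concretely; this is a sketch on synthetic data using scikit-learn's `LedoitWolf`. Because the shrunk matrix is a convex combination of the sample covariance S and a scaled identity, it commutes with S and shares its eigenvectors, which is exactly why it cannot improve PCA.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))       # toy data, n = 50, p = 10

lw = LedoitWolf().fit(X)
a = lw.shrinkage_                       # estimated shrinkage intensity in [0, 1]
S_shrunk = lw.covariance_               # (1 - a) * S + a * mu * I

# Shrinking toward the identity leaves the eigenvectors unchanged,
# so the shrunk estimator commutes exactly with the sample covariance S.
S = np.cov(X, rowvar=False, bias=True)
print(np.allclose(S_shrunk @ S, S @ S_shrunk))  # True
```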

97 | Maximum likelihood estimation of intrinsic dimension - Levina, Bickel |

89 | Regularized estimation of large covariance matrices
- Bickel, Levina
- 2008
Citation Context ... apart are only weakly correlated, and therefore one can improve on the sample covariance by taking advantage of the ordering (Wu and Pourahmadi, 2003; Huang et al., 2006; Furrer and Bengtsson, 2007; Bickel and Levina, 2008; Levina et al., 2008). Consistency and convergence rates have been established for some of these estimators in the high-dimensional setting under the normal assumption; see more on this in Section 2. ...

75 | Sparse permutation invariant covariance estimation - ROTHMAN, BICKEL, et al. - 2008 |

73 | Some theory for Fisher’s linear discriminant, ‘naive Bayes’, and some alternatives when there are many more variables than observations
- Bickel, Levina
- 2004
Citation Context ...he sample covariance eigenvalues are over-dispersed and the eigenvectors are not consistent. It has also been shown that classification by LDA breaks down and reduces to random guessing when p/n → ∞ (Bickel and Levina, 2004). These results have demonstrated that alternative ways of estimating the covariance matrix are needed in high dimensions. Regularized covariance estimators proposed as alternatives to the sample cov...

62 | Covariance regularization by thresholding
- BICKEL, LEVINA
- 2008
Citation Context ...ion λmin(Σ) ≥ ε > 0. It is easy to check that U ⊂ Uτ with appropriately chosen constants. It is also easy to check that on the subclass U, the rate of banding is better than the rate of thresholding (Bickel and Levina, 2007). Thus, if there is an ordering of the variables that can make the matrix approximately bandable, the theory suggests we can expect to do better than thresholding if we discover that ordering. While ...
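The two regularizers being compared, banding and thresholding, can be written down directly. This sketch (our notation, not the paper's code) makes the key structural difference explicit: banding depends on the variable ordering, while thresholding is invariant to permutations of the variables.

```python
import numpy as np

def band(S, k):
    """Keep entries within k of the diagonal; zero the rest (needs an ordering)."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)

def threshold(S, t):
    """Keep entries with |s_ij| > t (plus the diagonal); order-invariant."""
    keep = (np.abs(S) > t) | np.eye(S.shape[0], dtype=bool)
    return np.where(keep, S, 0.0)

S = np.array([[2.0, 0.8, 0.1],
              [0.8, 2.0, 0.7],
              [0.1, 0.7, 2.0]])

# Banding with k = 1 zeros the (0, 2) entry because it is far from the
# diagonal; thresholding at t = 0.5 zeros it because it is small.
B = band(S, 1)
T = threshold(S, 0.5)

# Permuting the variables changes the banded estimate, but commutes with
# thresholding, which is why thresholding needs no ordering at all.
perm = [2, 0, 1]
Sp = S[np.ix_(perm, perm)]
print(np.allclose(threshold(Sp, 0.5), threshold(S, 0.5)[np.ix_(perm, perm)]))  # True
```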

52 | First-order methods for sparse covariance selection - d’Aspremont, Banerjee, El Ghaoui |

51 | Covariance matrix selection and estimation via penalised normal likelihood
- Huang, Liu, et al.
- 2006
Citation Context ...n underlying these methods is that variables far apart are only weakly correlated, and therefore one can improve on the sample covariance by taking advantage of the ordering (Wu and Pourahmadi, 2003; Huang et al., 2006; Furrer and Bengtsson, 2007; Bickel and Levina, 2008; Levina et al., 2008). Consistency and convergence rates have been established for some of these estimators in the high-dimensional setting under t...

51 | Asymptotics of sample eigenstructure for a large dimensional spiked covariance model
- Paul
- 2007
Citation Context ... random matrix theory – from the classical results of Marčenko and Pastur (1967) to the recent work on the theory of the largest eigenvalues and eigenvectors (Johnstone, 2001; Johnstone and Lu, 2004; Paul, 2007; El Karoui, 2007b), and many others – allowed in-depth theoretical studies of the traditional estimator, the sample (empirical)...

47 | Correlation clustering with partial information
- Demaine, Immorlica
Citation Context ...ed variables placed as far apart as possible, and positively correlated variables closer together. This case is related to the correlation clustering problem in computer science (Bansal et al., 2002; Demaine and Immorlica, 2003), which aims to partition a weighted graph with positive and negative edge weights so that negative edges are broken up and positive edges are kept together. However, the correlation clustering algor...

47 | Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants
- Furrer, Bengtsson
- 2007
Citation Context ...ethods is that variables far apart are only weakly correlated, and therefore one can improve on the sample covariance by taking advantage of the ordering (Wu and Pourahmadi, 2003; Huang et al., 2006; Furrer and Bengtsson, 2007; Bickel and Levina, 2008; Levina et al., 2008). Consistency and convergence rates have been established for some of these estimators in the high-dimensional setting under the normal assumption; see mo...

44 | Tracy-Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. The Annals of Probability, 35(2):663–714 - Karoui - 2007 |

40 | High dimensional covariance matrix estimation using a factor model - Fan, Y, et al. - 2007 |

36 | Nonparametric Estimation of Large Covariance Matrices of Longitudinal Data
- Wu, Pourahmadi
- 2003
Citation Context ...it regularizing assumption underlying these methods is that variables far apart are only weakly correlated, and therefore one can improve on the sample covariance by taking advantage of the ordering (Wu and Pourahmadi, 2003; Huang et al., 2006; Furrer and Bengtsson, 2007; Bickel and Levina, 2008; Levina et al., 2008). Consistency and convergence rates have been established for some of these estimators in the high-dimensi...

31 | Operator Norm Consistent Estimation of Large Dimensional Sparse Covariance Matrices - Karoui - 2007 |

31 | Sparse Estimation of Large Covariance Matrices via a Nested Lasso Penalty, Ann
- Levina, Rothman, et al.
- 2008
Citation Context ...rrelated, and therefore one can improve on the sample covariance by taking advantage of the ordering (Wu and Pourahmadi, 2003; Huang et al., 2006; Furrer and Bengtsson, 2007; Bickel and Levina, 2008; Levina et al., 2008). Consistency and convergence rates have been established for some of these estimators in the high-dimensional setting under the normal assumption; see more on this in Section 2. These methods are non...

29 | Robustness of community structure in networks - Karrer, Levina, et al. - 2008 |

27 | Between-groups comparison of principal components - Krzanowski - 1979 |

26 | Proximity graphs for clustering and manifold learning - Carreira-Perpiñán, Zemel - 2004 |

25 | Estimation of a covariance matrix under Stein’s loss - Dey, Srinivasan - 1985 |

25 | Biplot display of multivariate matrices for inspection of data and diagnosis - Gabriel - 1981 |

23 | Empirical bayes estimation of the multivariate normal covariance matrix - Haff - 1980 |

18 | An introduction to the bootstrap - Efron, Tibshirani - 1993 |

4 | Sparse principal components analysis. Unpublished manuscript
- Johnstone, Lu
- 2004
Citation Context ...tion matrix. Advances in random matrix theory – from the classical results of Marčenko and Pastur (1967) to the recent work on the theory of the largest eigenvalues and eigenvectors (Johnstone, 2001; Johnstone and Lu, 2004; Paul, 2007; El Karoui, 2007b), and many others – allowed in-depth theoretical studies of the traditional estimator, the sample (empirical)...

1 | Topics in High Dimensional Inference with Applications to Raman Spectroscopy - Wagaman - 2008 |