## A path following algorithm for Sparse Pseudo-Likelihood Inverse Covariance Estimation (SPLICE) (2008)

Citations: 11 (0 self)

### BibTeX

@MISC{Rocha08apath,
  author = {Guilherme V. Rocha and Peng Zhao and Bin Yu},
  title = {A path following algorithm for Sparse Pseudo-Likelihood Inverse Covariance Estimation (SPLICE)},
  year = {2008}
}

### Abstract

Given n observations of a p-dimensional random vector, the covariance matrix and its inverse (the precision matrix) are needed in a wide range of applications. The sample covariance (e.g., its eigenstructure) can misbehave when p is comparable to the sample size n, and regularization is often used to mitigate the problem. In this paper, we propose an ℓ1-penalized pseudo-likelihood estimate of the inverse covariance matrix. The estimate is sparse due to the ℓ1 penalty, and we term the method SPLICE. Its regularization path can be computed via an algorithm based on the homotopy/LARS-Lasso algorithm. Simulation studies are carried out for various inverse covariance structures with p = 15 and n = 20, 1000. We compare SPLICE with the ℓ1-penalized likelihood estimate and an ℓ1-penalized Cholesky decomposition based method. SPLICE gives the best overall performance in terms of three metrics on the precision matrix and the ROC curve for model selection. Moreover, our simulation results demonstrate that the SPLICE estimates are positive-definite over most of the regularization path even though this restriction is not enforced.
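
The regression view behind such precision-matrix estimators can be sketched in code: lasso-regress each coordinate on the remaining ones, then map the coefficients and residual variances into precision entries. The sketch below is our own rough illustration (closer in spirit to node-wise neighborhood selection than to SPLICE proper, which couples the p regressions through symmetry constraints and conditional-variance weights); all function names are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent lasso for 0.5*||y - X b||^2 + lam*||b||_1
    (a generic solver, not the paper's homotopy/LARS path algorithm)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
    return beta

def nodewise_precision(X, lam):
    """Sparse precision-matrix sketch: lasso-regress each coordinate on the
    rest, then map coefficients/residual variances to precision entries."""
    n, p = X.shape
    K = np.zeros((p, p))
    for j in range(p):
        rest = [k for k in range(p) if k != j]
        beta = lasso_cd(X[:, rest], X[:, j], lam)
        resid = X[:, j] - X[:, rest] @ beta
        d2 = resid @ resid / n          # conditional variance estimate
        K[j, j] = 1.0 / d2
        K[j, rest] = -beta / d2         # off-diagonal entries of the precision row
    return (K + K.T) / 2.0              # crude symmetrization; SPLICE instead
                                        # enforces symmetry as a constraint
```

Unlike this averaging trick, SPLICE builds the symmetry requirement into the optimization itself, which is part of why its estimates tend to stay positive-definite along the path.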

### Citations

3666 |
Convex Optimization
- Boyd, Vandenberghe
- 2004
Citation Context: ...st squares linear regression. The computational advantage of the ℓ1-penalization over penalization by the dimension of the parameter being fitted (ℓ0-norm) – such as in AIC – stems from its convexity (Boyd and Vandenberghe, 2004). Homotopy algorithms for tracing the entire LASSO regularization path have recently become available (Osborne et al., 2000; Efron et al., 2004). Given the high-dimensionality of modern days data set...

1832 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1994
Citation Context: ...ted for all subsets of the parameters in U. A more computationally tractable alternative for performing parameter selection consists in penalizing parameter estimates by their ℓ1-norm (Breiman, 1995; Tibshirani, 1996; Chen et al., 2001), popularly known as the LASSO in the context of least squares linear regression. The computational advantage of the ℓ1-penalization over penalization by the dimension of the parame...

1651 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 1999
Citation Context: ...s of the parameters in U. A more computationally tractable alternative for performing parameter selection consists in penalizing parameter estimates by their ℓ1-norm (Breiman, 1995; Tibshirani, 1996; Chen et al., 2001), popularly known as the LASSO in the context of least squares linear regression. The computational advantage of the ℓ1-penalization over penalization by the dimension of the parameter being fitted (ℓ...

1235 |
Information theory and an extension of the maximum likelihood principle
- Akaike
- 1973
Citation Context: ...position C = U′DU with U upper-triangular with unit diagonal and D a diagonal matrix. The parameters U and D are then estimated through p linear regressions and Akaike’s Information Criterion (AIC, Akaike, 1973) is used to promote sparsity in U. A similar covariance selection method is presented in Bilmes (2000). More recently, Bickel and Levina (2008) have obtained conditions ensuring consistency in the op...
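
The Cholesky route described in this context can be illustrated numerically. The sketch below is our own (with the opposite triangular convention, and plain OLS in place of the AIC- or ℓ1-sparsified regressions of the cited methods): regress each coordinate on its predecessors to build a unit-triangular coefficient matrix T and innovation variances d²j; then T′ diag(1/d²) T recovers the inverse of the (uncentered) sample covariance exactly.

```python
import numpy as np

def cholesky_regressions(X):
    """Precision-matrix estimate via the modified Cholesky route:
    regress each X_j on its predecessors (unpenalized OLS here; the
    cited methods sparsify these coefficients)."""
    n, p = X.shape
    T = np.eye(p)                 # unit lower-triangular coefficient matrix
    d2 = np.zeros(p)              # innovation (residual) variances
    for j in range(p):
        if j == 0:
            resid = X[:, 0]
        else:
            Z = X[:, :j]
            beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
            T[j, :j] = -beta
            resid = X[:, j] - Z @ beta
        d2[j] = resid @ resid / n
    # Precision estimate: Omega = T' diag(1/d2) T
    return T.T @ np.diag(1.0 / d2) @ T
```

Because in-sample OLS residuals are orthogonal to their predictors, the successive residuals are mutually uncorrelated in-sample, which is what makes the reconstruction exact in the unpenalized case.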

1138 |
Spatial interaction and the statistical analysis of lattice systems
- Besag
- 1974
Citation Context: ...res problem where the observations associated to each regression are weighted according to their conditional variances. The loss function thus formed can be interpreted as a pseudo neg-loglikelihood (Besag, 1974) in the Gaussian case. To this pseudo-neg-loglikelihood minimization, we add symmetry constraints and a weighted version of the ℓ1-penalty on off-diagonal terms to promote sparsity. The SPLICE estimat...
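
Under the Gaussian conditional model X_j | X_{−j} ~ N(X_{−j}βj, d²j), the pseudo neg-loglikelihood referred to in this context takes the form of a conditional-variance-weighted sum of p least-squares terms (up to constants; notation ours, a sketch rather than the paper's exact display):

```latex
-\log \mathrm{PL}(\beta, d)
  = \sum_{j=1}^{p}\left[\frac{n}{2}\log d_j^2
    + \frac{1}{2 d_j^2}\,\bigl\|x_j - X_{-j}\beta_j\bigr\|_2^2\right]
  + \mathrm{const},
```

where x_j collects the n observations of the j-th coordinate and X_{−j} the remaining columns; the 1/d²j factors are exactly the conditional-variance weights the context mentions.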

382 | High-dimensional graphs and variable selection with the lasso - Meinshausen, Bühlmann |

362 | Regularization and variable selection via the elastic net - Zou, Hastie - 2005 |

323 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context: ...zation. We compare the use of Akaike’s Information criterion (AIC, Akaike, 1973), a small-sample corrected version of the AIC (AICc, Hurvich et al., 1998) and the Bayesian Information Criterion (BIC, Schwarz, 1978) for selecting the proper amount of regularization. We use simulations to compare SPLICE estimates to the ℓ1-penalized maximum likelihood estimates (Banerjee et al., 2005, 2007; Yuan and Lin, 2007;...
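
The three criteria named in this context have standard closed forms. The helpers below are generic textbook definitions (not the paper's exact path-selection procedure), with `loglik` the maximized log-likelihood, `k` the number of fitted parameters, and `n` the sample size:

```python
import math

def aic(loglik, k):
    """Akaike's Information Criterion: 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    """Small-sample corrected AIC; the correction term blows up as k approaches n - 1."""
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian Information Criterion: k*log(n) - 2*loglik."""
    return math.log(n) * k - 2 * loglik
```

Along a regularization path, one would evaluate each candidate model's criterion (with k the number of nonzero parameters) and keep the minimizer; BIC's log(n) factor penalizes complexity more heavily than AIC once n > e².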

187 | Covariance selection - Dempster - 1972 |

167 | Pathwise coordinate optimization - Friedman, Hastie, et al. - 2007 |

155 | Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data - Banerjee, d’Aspremont - 2008 |

144 | On the LASSO and Its Dual - Osborne, Presnell, et al. - 2000 |

131 | Sparse principal component analysis
- Zou, Hastie, et al.
- 2006
Citation Context: ...l and Levina, 2004) and in Kalman filter ensembles (Furrer and Bengtsson, 2007). Regularization of the covariance matrix can also be achieved by regularizing its eigenvectors (Johnstone and Lu, 2004; Zou et al., 2006). Covariance selection methods for estimating covariance matrices consist of imposing sparsity on the precision matrix (i.e., the inverse of the covariance matrix). The Sparse Pseudo-Likelihood Inv...

130 | A well-conditioned estimator for largedimensional covariance matrices - Ledoit, Wolf |

108 | Model selection and estimation in the Gaussian graphical model. Biometrika 94
- Yuan, Lin
- 2007
Citation Context: ...BIC, Schwarz, 1978) for selecting the proper amount of regularization. We use simulations to compare SPLICE estimates to the ℓ1-penalized maximum likelihood estimates (Banerjee et al., 2005, 2007; Yuan and Lin, 2007; Friedman et al., 2008) and to the ℓ1-penalized Cholesky approach in Huang et al. (2006). We have simulated both small and large sample data sets. Our simulations include model structures commonly us...

103 |
Better subset regression using the nonnegative garrote
- Breiman
- 1995
Citation Context: ...imates be computed for all subsets of the parameters in U. A more computationally tractable alternative for performing parameter selection consists in penalizing parameter estimates by their ℓ1-norm (Breiman, 1995; Tibshirani, 1996; Chen et al., 2001), popularly known as the LASSO in the context of least squares linear regression. The computational advantage of the ℓ1-penalization over penalization by the dimen...

89 | Regularized estimation of large covariance matrics - Bickel, Levina - 2008 |

88 |
Smoothing Parameter Selection in Nonparametric Regression using an Improved Akaike Information Criterion
- Hurvich, Simonoff, et al.
- 1998
Citation Context: ..., we use information criteria to select a proper amount of regularization. We compare the use of Akaike’s Information criterion (AIC, Akaike, 1973), a small-sample corrected version of the AIC (AICc, Hurvich et al., 1998) and the Bayesian Information Criterion (BIC, Schwarz, 1978) for selecting the proper amount of regularization. We use simulations to compare SPLICE estimates to the ℓ1-penalized maximum likelihoo...

75 | Sparse permutation invariant covariance estimation - ROTHMAN, BICKEL, et al. - 2008 |

73 | Some theory for Fisher’s linear discriminant, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10 - Bickel, Levina - 2004 |

51 |
Covariance matrix selection and estimation via penalised normal likelihood
- Huang, Liu, et al.
- 2006
Citation Context: ...path can be not only computationally costly but also counterproductive to the quality of the estimates. We have compared SPLICE with ℓ1-penalized covariance estimates based on Cholesky decomposition (Huang et al., 2006) and the exact likelihood expression (Banerjee et al., 2005, 2007; Yuan and Lin, 2007; Friedman et al., 2008) for a variety of sparse precision matrix cases and in terms of four different metrics, na...

41 | Nonconjugate bayesian estimation of covariance matrices in hierarchical models - Daniels, Kass - 1999 |

40 |
High dimensional covariance matrix estimation using a factor model
- Fan, Y, et al.
- 2007
Citation Context: ...999, 2001) propose alternative strategies using shrinkage toward diagonal and more general matrices. Factorial models have also been used as a strategy to regularize estimates of covariance matrices (Fan et al., 2006). Tapering the covariance matrix is frequently used in time series and spatial models and has been used recently to improve the performance of covariance matrix estimates used by classifiers based o...

38 | Factored Sparse Inverse Covariance Matrices - Bilmes - 2000 |

38 | Sparse principal components analysis
- Johnstone, Lu
- 2004
Citation Context: ...riminant analysis (Bickel and Levina, 2004) and in Kalman filter ensembles (Furrer and Bengtsson, 2007). Regularization of the covariance matrix can also be achieved by regularizing its eigenvectors (Johnstone and Lu, 2004; Zou et al., 2006). Covariance selection methods for estimating covariance matrices consist of imposing sparsity on the precision matrix (i.e., the inverse of the covariance matrix). The Sparse Pseud...

36 | Nonparametric Estimation of Large Covariance Matrices of Longitudinal Data - Wu, Pourahmadi - 2003 |

34 |
Distribution of eigenvalues in certain sets of random matrices
- Marchenko, Pastur
- 1967
Citation Context: ... the sample size n. As one example, it is a well-known fact that the eigenvalues and eigenvectors of an estimated covariance matrix are inconsistent when the ratio p/n does not vanish asymptotically (Marchenko and Pastur, 1967; Paul et al., 2008). Data sets with a large number of observed variables p and small number of observations n are now a common occurrence in statistics. Modeling such data sets creates a need for reg...
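
The eigenvalue misbehavior this context refers to is easy to reproduce: for i.i.d. standard normal data with γ = p/n held fixed, the sample-covariance eigenvalues spread over roughly the Marchenko–Pastur support [(1 − √γ)², (1 + √γ)²] instead of concentrating near the true value 1. A minimal demonstration (our own, not from the paper):

```python
import numpy as np

# High-dimensional regime: gamma = p/n stays bounded away from zero.
rng = np.random.default_rng(0)
n, p = 400, 200
X = rng.standard_normal((n, p))   # true covariance is the identity: all eigenvalues 1
S = X.T @ X / n                   # sample covariance
evals = np.linalg.eigvalsh(S)

gamma = p / n
# Marchenko-Pastur support for the identity: [(1 - sqrt(g))^2, (1 + sqrt(g))^2]
lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(f"eigenvalues in [{evals.min():.3f}, {evals.max():.3f}], MP support [{lo:.3f}, {hi:.3f}]")
```

With γ = 0.5 the support is roughly [0.086, 2.914], so the largest sample eigenvalue is inflated by almost a factor of three even though every true eigenvalue equals one.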

33 | Shrinkage estimators for covariance matrices - Daniels, Kass - 2001 |

21 | Estimation of a covariance matrix - Stein - 1975 |

19 |
Methods and Applications of Linear Models: Regression and the Analysis of Variance
- Hocking
- 1996
Citation Context: ... partition Σ to get cov([Xj; XJ*]) = [σj,j, Σj,J*; ΣJ*,j, ΣJ*,J*], where J* corresponds to the indices in XJ*, so σj,j is a scalar, Σj,J* is a (p − 1)-dimensional row vector, and ΣJ*,J* is a (p − 1) × (p − 1) square matrix. Inverting this partitioned matrix (see, for instance, Hocking, 1996) yields...
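
The partitioned-inverse identity invoked in this context can be checked numerically: with βj = Σj,J* Σ⁻¹J*,J* (the conditional regression coefficients) and d²j = σj,j − βj ΣJ*,j (the conditional variance, a Schur complement), the precision matrix satisfies (Σ⁻¹)j,j = 1/d²j and (Σ⁻¹)j,J* = −βj/d²j. A quick check on a generic positive-definite matrix (our own construction):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)            # a generic positive-definite "covariance"

j, rest = 0, [1, 2, 3, 4]                  # split off coordinate j from the rest (J*)
beta = Sigma[j, rest] @ np.linalg.inv(Sigma[np.ix_(rest, rest)])
d2 = Sigma[j, j] - beta @ Sigma[rest, j]   # conditional variance (Schur complement)

P = np.linalg.inv(Sigma)                   # the precision matrix
# Partitioned-inverse identities: P[j, j] = 1/d2 and P[j, J*] = -beta/d2
print(P[j, j], 1.0 / d2)
```

This identity is what lets regression-based methods read off each row of the precision matrix from a conditional regression of Xj on the remaining coordinates.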

8 | “Preconditioning” for feature selection and regression in high-dimensional problems
- Paul, Bair, et al.
- 2008
Citation Context: ...xample, it is a well-known fact that the eigenvalues and eigenvectors of an estimated covariance matrix are inconsistent when the ratio p/n does not vanish asymptotically (Marchenko and Pastur, 1967; Paul et al., 2008). Data sets with a large number of observed variables p and small number of observations n are now a common occurrence in statistics. Modeling such data sets creates a need for regularization procedu...

6 | Sparse covariance selection via robust maximum likelihood estimation - Banerjee, d’Aspremont, et al. - 2006 |

6 | Asymptotics for lasso type estimators - Knight, Fu - 2000 |

2 |
Glasso: Graphical lasso for R
- Friedman, Hastie, et al.
- 2007
Citation Context: ... for selecting the proper amount of regularization. We use simulations to compare SPLICE estimates to the ℓ1-penalized maximum likelihood estimates (Banerjee et al., 2005, 2007; Yuan and Lin, 2007; Friedman et al., 2008) and to the ℓ1-penalized Cholesky approach in Huang et al. (2006). We have simulated both small and large sample data sets. Our simulations include model structures commonly used in the literature (r...

1 | Least angle regression. The Annals of Statistics 32 - Efron, Hastie, Johnstone, et al. - 2004 |

1 | Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 3, 432–441 - Friedman, Hastie, et al. - 2008 |

1 | Lectures on the theory of estimation of many parameters (reprint) - Stein - 1986 |

1 |
Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants
- Furrer, Bengtsson
- 2007
Citation Context: ...d have been used recently to improve the performance of covariance matrix estimates used by classifiers based on linear discriminant analysis (Bickel and Levina, 2004) and in Kalman filter ensembles (Furrer and Bengtsson, 2007). Regularization of the covariance matrix can also be achieved by regularizing its eigenvectors (Johnstone and Lu, 2004; Zou et al., 2006). Covariance selection methods for estimating covariance matr...