## Covariance Estimation: The GLM and Regularization Perspectives

Citations: 2 (0 self)

### BibTeX

```bibtex
@misc{Pourahmadi_covarianceestimation,
  author = {Mohsen Pourahmadi},
  title  = {Covariance Estimation: The GLM and Regularization Perspectives},
  year   = {}
}
```

### Abstract

Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from the perspectives of generalized linear models (GLM) or parsimony and use of covariates in low dimensions, regularization (shrinkage, sparsity) for high-dimensional data, and the role of various matrix factorizations. A viable and emerging regression-based setup which is suitable for both the GLM and the regularization approaches is to link a covariance matrix, its inverse or their factors to certain regression models and then solve the relevant (penalized) least squares problems. We point out several instances of this regression-based setup in the literature. A notable case is in the Gaussian graphical models where linear regressions with LASSO penalty are used to estimate the neighborhood of one node at a time (Meinshausen and Bühlmann, 2006). Some advantages and a limitation of the regression-based Cholesky decomposition (Pourahmadi, 1999) relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted.
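The neighborhood-selection idea of Meinshausen and Bühlmann (2006) mentioned in the abstract can be sketched as follows; this is a minimal illustration, with the penalty level `alpha` and the OR-rule for symmetrizing the edge set being illustrative choices rather than prescriptions from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, alpha=0.1):
    """Regress each variable on all the others with a LASSO penalty; the
    nonzero coefficients define that node's estimated neighborhood."""
    n, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=alpha).fit(X[:, others], X[:, j])
        support[j, others] = fit.coef_ != 0
    # Symmetrize with the OR rule: keep an edge if either regression selects it.
    return support | support.T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
edges = neighborhood_selection(X)
```

The union of the p estimated neighborhoods gives the estimated edge set of the Gaussian graphical model, i.e. the sparsity pattern of the precision matrix.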

### Citations

2108 | A new approach to linear filtering and prediction problems - Kalman - 1960 |

865 | The Elements of Statistical Learning - Hastie, Tibshirani, et al. - 2009 |

752 | Least angle regression - Efron, Hastie, et al. - 2004 |
Citation Context: ...Of particular interest are LASSO’s abilities to select models and estimate parameters simultaneously, and the recent improved computational algorithms for LASSO such as the homotopy/LARS–LASSO (Efron et al. 2004; Rocha et al. 2008); see Fan and Lv (2010, Sec. 3.5) for other improved algorithms. Some early attempts at inducing sparsity in the precision matrix are Bilmes (2000), Smith and Kohn (2002), Wu and ...

556 | Longitudinal data analysis using generalized linear models - Liang, Zeger - 1986 |
Citation Context: ...d in terms of the original variables. It allows one to estimate D and R separately, which is important in situations where one component might be more important than the other (Lin and Perlman, 1985; Liang and Zeger, 1986; Barnard et al. 2000). Note that while the logarithms of the diagonal entries of D are unconstrained, the correlation matrix R must be positive-definite with the additional constraint that all its dia...

495 | Ridge regression: Biased estimation for nonorthogonal problems - Hoerl, Kennard - 1970 |
Citation Context: ...he sample covariance matrix (Lin and Perlman, 1985; Daniels and Kass, 1999; Ledoit and Wolf, 2004). However, when p > n, since the sample covariance matrix is singular, suitable ridge regularization (Hoerl and Kennard, 1970; Ledoit and Wolf) will lead to covariance estimators which are more accurate and well-conditioned (Bickel and Li, 2006; Warton, 2008). This class of estimators based on an optimal linear combination ...
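A minimal sketch of this ridge/shrinkage idea, using scikit-learn's `LedoitWolf` estimator in a p > n setting (the dimensions below are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 50))   # n = 20 observations, p = 50 variables

S = np.cov(X, rowvar=False)         # sample covariance: singular since p > n
lw = LedoitWolf().fit(X)            # optimal linear shrinkage toward mu * I
Sigma = lw.covariance_              # (1 - s) * S + s * mu * I, shrinkage s in (0, 1]
```

Because the estimate mixes the singular S with a multiple of the identity, its smallest eigenvalue is bounded away from zero, so `Sigma` is invertible and better conditioned than S.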

382 | High-dimensional graphs and variable selection with the lasso - Meinshausen, Bühlmann - 2006 |

263 | Estimation with quadratic loss - James, Stein - 1956 |

226 | Sparse inverse covariance estimation with the graphical lasso - Friedman, Hastie, et al. |

218 | Variance Components - Searle, Casella, et al. - 1992 |
Citation Context: ...rtually all of classical multivariate statistics (Anderson, 2003), time series analysis (Box et al. 1994), spatial data analysis (Cressie, 1993), variance components and longitudinal data analysis (Searle et al. 1992; Diggle et al. 2002), and in the modern and rapidly growing area of statistical and machine learning dealing with massive and high-dimensional data (Hastie, Tibshirani and Friedman, 2009). More speci...

155 | Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data - Banerjee, d’Aspremont - 2008 |
Citation Context: ...neighborhood of one node at a time, several direct sparse estimators of Σ−1 have been proposed using a penalized likelihood approach with a LASSO penalty on its off-diagonal terms (Yuan and Lin, 2007; Banerjee et al. 2008; d’Aspremont et al. 2008; Friedman et al. 2008; Rothman et al. 2008; Rocha et al. 2008; Peng et al. 2009). Friedman et al.’s (2008) graphical LASSO, which is the fastest available algorithm to date, ...

130 | A well-conditioned estimator for large-dimensional covariance matrices - Ledoit, Wolf - 2004 |
Citation Context: ...many improved estimators have been proposed by shrinking only the eigenvalues of S towards a central value (Haff, 1980, 1991; Lin and Perlman, 1985; Dey and Srinivasan, 1985; Yang and Berger, 1994; Ledoit and Wolf, 2004). These have been derived from a decision-theoretic perspective or by specifying an appropriate prior for the covariance matrix. By now it is well-known that estimators like Stein’s (1956, 1975) that...

124 | Fitting time series models to nonstationary processes - Dahlhaus - 1997 |
Citation Context: ...the subdiagonals of T. The idea of smoothing along its subdiagonals is motivated by the similarity of the regressions in (17) to the varying-coefficient autoregressions (Kitagawa and Gersch, 1985; Dahlhaus, 1997): ∑_{j=0}^{m} f_{j,p}(t/p) y_{t−j} = σ_p(t/p) ε_t, t = 0, 1, 2, …, where f_{0,p}(·) = 1, the f_{j,p}(·), 1 ≤ j ≤ m, and σ_p(·) are continuous functions on [0, 1] and {ε_t} is a sequence of independent random variables eac...
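To make the varying-coefficient autoregression concrete, here is a small simulation sketch with m = 1; the particular coefficient function f_{1,p} and scale function σ_p below are illustrative choices, not taken from the cited works:

```python
import numpy as np

def simulate_tvar(p=200, seed=0):
    """Simulate a varying-coefficient AR(1): with f_{0,p}(u) = 1,
    y_t + f_{1,p}(t/p) y_{t-1} = sigma_p(t/p) eps_t."""
    rng = np.random.default_rng(seed)
    f1 = lambda u: -0.8 * np.cos(np.pi * u)   # smoothly varying AR coefficient
    sigma = lambda u: 0.5 + u                 # smoothly varying innovation scale
    y = np.zeros(p)
    for t in range(1, p):
        u = t / p
        # Rearranged: y_t = -f1(u) * y_{t-1} + sigma(u) * eps_t
        y[t] = -f1(u) * y[t - 1] + sigma(u) * rng.standard_normal()
    return y

series = simulate_tvar()
```

Since |f_{1,p}(u)| ≤ 0.8 on [0, 1], the recursion is locally stable; the smooth variation of the coefficient over t/p is what motivates smoothing the subdiagonals of T.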

109 | Transformation and Weighting in Regression - Carroll, Ruppert - 1988 |

108 | Model selection and estimation in the Gaussian graphical model - Yuan, Lin - 2007 |
Citation Context: ...multivariate linear regression (Warton, 2008; Witten and Tibshirani, 2009), portfolio selection (Ledoit et al. 2003) and Gaussian graphical models (Wong et al. 2003; Meinshausen and Bühlmann, 2006; Yuan and Lin, 2007). In these situations and others, the goal is to find alternative covariance estimators that are more accurate and well-conditioned than the sample covariance matrix. It was noted rather early by Ste...

75 | Sparse permutation invariant covariance estimation - Rothman, Bickel, et al. - 2008 |
Citation Context: ...of Σ−1 have been proposed using a penalized likelihood approach with a LASSO penalty on its off-diagonal terms (Yuan and Lin, 2007; Banerjee et al. 2008; d’Aspremont et al. 2008; Friedman et al. 2008; Rothman et al. 2008; Rocha et al. 2008; Peng et al. 2009). Friedman et al.’s (2008) graphical LASSO, which is the fastest available algorithm to date, relies on the equivalence of the Banerjee et al. (2008) blockwise in...

73 | Some theory for Fisher’s linear discriminant, ‘naive Bayes’, and some alternatives when there are many more variables than observations - Bickel, Levina - 2004 |
Citation Context: ...er among the variables in Y, and assume that variables far apart in the ordering are less correlated. For example, regularizing a covariance matrix by tapering (Furrer and Bengtsson, 2007), banding (Bickel and Levina, 2004; 2008a; Wu and Pourahmadi, 2009) and generally those based on the Cholesky decomposition of the covariance matrix or its inverse (Pourahmadi, 1999, 2000; Rothman et al. 2010) do impose an order among...

73 | On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers - Yule - 1927 |

68 | The method of path coefficients - Wright - 1934 |

62 | Covariance regularization by thresholding - Bickel, Levina - 2008 |
Citation Context: ...hnstone and Lu (2009). Perhaps the simplest, more direct and less computationally intensive ways of achieving sparsity are the techniques of banding and thresholding of the sample covariance matrix (Bickel and Levina, 2008a, b; Rothman, Levina and Zhu, 2009), which amount to elementwise operations on S, and hence completely avoid the computationally expensive eigenvalue problem (Golub and Van Loan, 1989). In many appl...
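Banding and thresholding really are elementwise operations on S, as the snippet above notes. A minimal numpy sketch (the bandwidth `k` and threshold `t` are user-chosen tuning parameters, typically picked by cross-validation):

```python
import numpy as np

def band(S, k):
    """Zero all entries more than k diagonals away from the main diagonal."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)

def hard_threshold(S, t):
    """Zero small off-diagonal entries; the diagonal is always kept."""
    T = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
S = np.cov(X, rowvar=False)
Sb = band(S, 1)              # keep only the tridiagonal part
St = hard_threshold(S, 0.1)  # keep entries with |s_ij| > 0.1 plus the diagonal
```

Note that banding presumes a meaningful variable ordering, while thresholding is permutation invariant; neither operation guarantees positive-definiteness by itself.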

61 | Common Principal Components and Related Multivariate Models - Flury - 1988 |

53 | Modeling covariance matrices in terms of standard deviations and correlations with application to shrinkage - Barnard, McCulloch, et al. - 2000 |

52 | First-order methods for sparse covariance selection - d’Aspremont, Banerjee, El Ghaoui |

51 | Covariance matrix selection and estimation via penalised normal likelihood - Huang, Liu, et al. - 2006 |
Citation Context: ...gression coefficients, (ii) regression-based derivation (interpretation) of the modified Cholesky decomposition of a covariance matrix and its inverse (Pourahmadi, 1999; 2001, Sec. 3.5; Bilmes, 2000; Huang et al. 2006; Rothman, Levina and Zhu, 2010), (iii) the regression approach of Meinshausen and Bühlmann (2006), Rocha, Zhang and Yu (2008) and Peng, Wang, Zhou and Zhu (2009) merging all p regressions into a sing...

49 | Joint Mean-Covariance Models with Applications to Longitudinal Data - Pourahmadi - 1999 |
Citation Context: ...sions with LASSO penalty are used to estimate the neighborhood of one node at a time (Meinshausen and Bühlmann, 2006). Some advantages and a limitation of the regression-based Cholesky decomposition (Pourahmadi, 1999) relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. It provides an unconstrained and statistically interpretable reparameterization, and guarante...
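The regression-based modified Cholesky decomposition can be sketched directly: regressing each variable on its predecessors yields a unit lower-triangular T and a diagonal D with T Σ Tᵀ = D, so the regression coefficients and log innovation variances are unconstrained and positive-definiteness is automatic. This is a sketch under the assumption that the variables already carry a meaningful order:

```python
import numpy as np

def modified_cholesky(X):
    """Fit y_j on y_1, ..., y_{j-1} by least squares for each j. Rows of T
    hold the negated regression coefficients; D holds the innovation
    (prediction-error) variances, so T @ S @ T.T == D for the sample
    covariance S of the centered data."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = np.mean(Xc[:, 0] ** 2)
    for j in range(1, p):
        phi, *_ = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)
        T[j, :j] = -phi
        d[j] = np.mean((Xc[:, j] - Xc[:, :j] @ phi) ** 2)
    return T, np.diag(d)

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 4))
T, D = modified_cholesky(X)
```

Because each in-sample residual is orthogonal to the preceding variables, the identity T S Tᵀ = D holds exactly for the empirical covariance, which is what makes the reparameterization unconstrained.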

47 | Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants - Furrer, Bengtsson - 2007 |
Citation Context: ...l data which have a natural (time) order among the variables in Y, and assume that variables far apart in the ordering are less correlated. For example, regularizing a covariance matrix by tapering (Furrer and Bengtsson, 2007), banding (Bickel and Levina, 2004; 2008a; Wu and Pourahmadi, 2009) and generally those based on the Cholesky decomposition of the covariance matrix or its inverse (Pourahmadi, 1999, 2000; Rothman et...

41 | Nonconjugate Bayesian estimation of covariance matrices in hierarchical models - Daniels, Kass - 1999 |
Citation Context: ...hat estimators like Stein’s (1956, 1975) that focus on shrinking the eigenvalues have smaller estimation risks and are usually more accurate than the sample covariance matrix (Lin and Perlman, 1985; Daniels and Kass, 1999; Ledoit and Wolf, 2004). However, when p > n, since the sample covariance matrix is singular, suitable ridge regularization (Hoerl and Kennard, 1970; Ledoit and Wolf) will lead to covariance estimato...

40 | High dimensional covariance matrix estimation using a factor model - Fan, Y, et al. - 2007 |

40 | Sparsistency and rates of convergence in large covariance matrix estimation - Lam, Fan - 2009 |

39 | Asymptotically efficient estimation of covariance matrices with linear structure - Anderson - 1973 |
Citation Context: ...mulation, estimation and diagnostics, just like those for modeling the mean vector (McCullagh and Nelder, 1989). Attempts to develop such methods, going beyond the traditional linear covariance models (Anderson, 1973), have been made in recent years by Chiu et al. (1996) and Pourahmadi (1999, 2000); Pan and MacKenzie (2003); Ye and Pan (2006); Wang and Lin (2008); Leng, Zhang and Pan (2010); Lin (2010) using the ...

39 | Flexible Multivariate GARCH Modeling with an Application to International Stock Markets - Ledoit, Santa-Clara, et al. - 2003 |
Citation Context: ...hen its inverse is needed as, for example, in the classification procedures (Anderson, 2003, Chap. 6), multivariate linear regression (Warton, 2008; Witten and Tibshirani, 2009), portfolio selection (Ledoit et al. 2003) and Gaussian graphical models (Wong et al. 2003; Meinshausen and Bühlmann, 2006; Yuan and Lin, 2007). In these situations and others, the goal is to find alternative covariance estimators that are ...

39 | Efficient Estimation of Covariance Selection Models - Wong, Carter, et al. - 2003 |
Citation Context: ...lassification procedures (Anderson, 2003, Chap. 6), multivariate linear regression (Warton, 2008; Witten and Tibshirani, 2009), portfolio selection (Ledoit et al. 2003) and Gaussian graphical models (Wong et al. 2003; Meinshausen and Bühlmann, 2006; Yuan and Lin, 2007). In these situations and others, the goal is to find alternative covariance estimators that are more accurate and well-conditioned than the sampl...

38 | Factored Sparse Inverse Covariance Matrices - Bilmes - 2000 |
Citation Context: ...aint on the regression coefficients, (ii) regression-based derivation (interpretation) of the modified Cholesky decomposition of a covariance matrix and its inverse (Pourahmadi, 1999; 2001, Sec. 3.5; Bilmes, 2000; Huang et al. 2006; Rothman, Levina and Zhu, 2010), (iii) the regression approach of Meinshausen and Bühlmann (2006), Rocha, Zhang and Yu (2008) and Peng, Wang, Zhou and Zhu (2009) merging all p regr...

36 | Partial correlation estimation by joint sparse regression models - Peng, Wang, et al. - 2009 |

36 | Foundations of Time Series Analysis and Prediction Theory - Pourahmadi - 2001 |
Citation Context: ...sent yet another unconstrained and statistically interpretable reparameterization of Σ, but now using the notion of partial autocorrelation function (PACF) from time series analysis (Box et al. 1994; Pourahmadi, 2001, Chap. 7). As expected, this approach, just like the Cholesky decomposition, requires an a priori order among the random variables in Y. It is motivated by and tries to mimic the phenomenal success of...

36 | Nonparametric Estimation of Large Covariance Matrices of Longitudinal Data - Wu, Pourahmadi - 2003 |
Citation Context: ...one could estimate the covariance matrix or its inverse using regression regularization tools like covariance selection priors, AIC and LASSO penalties on the Cholesky factor (Smith and Kohn, 2002; Wu and Pourahmadi, 2003; Huang, Lin, Pourahmadi and Lin, 2006; Huang, Lin and Lin, 2007) and the nested LASSO (Levina, Rothman and Zhu, 2007). The recent surge of interest in regression-based approaches to sparse estimation of ...

35 | On consistency and sparsity for principal components analysis in high dimensions (with discussion) - Johnstone, Lu - 2009 |
Citation Context: ...They obtain an explicit rate of convergence which depends on how fast k → ∞; see also Cai et al. (2010). The consistency in operator norm guarantees the consistency of principal component analysis (Johnstone and Lu, 2009) and other related methods in multivariate statistics when n is small and p is large. Cai et al. (2010) propose a tapering procedure for the covariance matrix estimation and derive the optimal rate o...

33 | Shrinkage estimators for covariance matrices - Daniels, Kass - 2001 |

31 | Operator norm consistent estimation of large-dimensional sparse covariance matrices - Karoui - 2007 |

31 | Sparse Estimation of Large Covariance Matrices via a Nested Lasso Penalty - Levina, Rothman, et al. - 2008 |

28 | Maximum likelihood fitting of ARMA models to time series with missing observations - Jones - 1980 |
Citation Context: ...principle, covariates can be used to write a GLM for correlation matrices. Some preliminary results are given in Daniels and Pourahmadi (2009). The PACF also works for stationary covariance matrices (Jones, 1980), so long as an order is imposed on the variables. The problem is open for unordered variables. • Ordering The Variables: Variables in many application areas do not have a natural order as in time se...

28 | Linear recursive equations, covariance selection and path analysis - Wermuth - 1980 |
Citation Context: ...assume the existence of an a priori order among the p variables of interest. Some of the more explicit uses are in Kalman (1960) for filtering of state-space models and the Gaussian graphical models (Wermuth, 1980). For other uses of the Cholesky decomposition in multivariate quality control and related areas see Pourahmadi (2007b). 2.3 GLM for Covariance Matrices: Research on estimation of covariance matrices h...

27 | Spectrum estimation for large dimensional covariance matrices using random matrix theory - Karoui |

26 | Optimal rates of convergence for covariance matrix estimation - CAI, ZHANG, et al. - 2010 |

26 | The matrix-logarithm covariance model - Chiu, Leonard, et al. - 1996 |

26 | Estimation of a covariance matrix using the reference prior - Yang, Berger - 1994 |
Citation Context: ...d downward. Since then many improved estimators have been proposed by shrinking only the eigenvalues of S towards a central value (Haff, 1980, 1991; Lin and Perlman, 1985; Dey and Srinivasan, 1985; Yang and Berger, 1994; Ledoit and Wolf, 2004). These have been derived from a decision-theoretic perspective or by specifying an appropriate prior for the covariance matrix. By now it is well-known that estimators like St...

25 | Estimation of a covariance matrix with zeros - Chaudhuri, Drton, et al. |

25 | Estimation of a covariance matrix under Stein’s loss - Dey, Srinivasan - 1985 |
Citation Context: ...e eigenvalue will be biased downward. Since then many improved estimators have been proposed by shrinking only the eigenvalues of S towards a central value (Haff, 1980, 1991; Lin and Perlman, 1985; Dey and Srinivasan, 1985; Yang and Berger, 1994; Ledoit and Wolf, 2004). These have been derived from a decision-theoretic perspective or by specifying an appropriate prior for the covariance matrix. By now it is well-known ...

24 | A smoothness priors time-varying AR coefficient modeling of nonstationary covariance time series - Kitagawa, Gersch - 1985 |
Citation Context: ...nomial estimators to smooth the subdiagonals of T. The idea of smoothing along its subdiagonals is motivated by the similarity of the regressions in (17) to the varying-coefficient autoregressions (Kitagawa and Gersch, 1985; Dahlhaus, 1997): ∑_{j=0}^{m} f_{j,p}(t/p) y_{t−j} = σ_p(t/p) ε_t, t = 0, 1, 2, …, where f_{0,p}(·) = 1, the f_{j,p}(·), 1 ≤ j ≤ m, and σ_p(·) are continuous functions on [0, 1] and {ε_t} is a sequence of independent rand...

24 | Parsimonious Covariance Matrix Estimation for Longitudinal Data - Smith, Kohn - 2002 |
Citation Context: ...variant. Nevertheless, one could estimate the covariance matrix or its inverse using regression regularization tools like covariance selection priors, AIC and LASSO penalties on the Cholesky factor (Smith and Kohn, 2002; Wu and Pourahmadi, 2003; Huang, Lin, Pourahmadi and Lin, 2006; Huang, Lin and Lin, 2007) and the nested LASSO (Levina, Rothman and Zhu, 2007). The recent surge of interest in regression-based approaches...

23 | A selective overview of variable selection in high dimensional feature space (invited review article) - Fan, Lv |