## Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values (2001)

Citations: 54 (3 self)

### BibTeX

@MISC{Schneider01analysisof,

author = {Tapio Schneider},

title = {Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values},

year = {2001}

}

### Abstract

Estimating the mean and the covariance matrix of an incomplete dataset and filling in missing values with imputed values is generally a nonlinear problem, which must be solved iteratively. The expectation maximization (EM) algorithm for Gaussian data, an iterative method both for the estimation of mean values and covariance matrices from incomplete datasets and for the imputation of missing values, is taken as the point of departure for the development of a regularized EM algorithm. In contrast to the conventional EM algorithm, the regularized EM algorithm is applicable to sets of climate data, in which the number of variables typically exceeds the sample size. The regularized EM algorithm is based on iterated analyses of linear regressions of variables with missing values on variables with available values, with regression coefficients estimated by ridge regression, a regularized regression method in which a continuous regularization parameter controls the filtering of the noise in the data. The regularization parameter is determined by generalized cross-validation, such as to minimize, approximately, the expected mean squared error of the imputed values. The regularized EM algorithm can estimate, and exploit for the imputation of missing values, both synchronic and diachronic covariance matrices, which may contain information on spatial covariability, stationary temporal covariability, or cyclostationary temporal covariability. A test of the regularized EM algorithm with simulated surface temperature data demonstrates that the algorithm is applicable to typical sets of climate data and that it leads to more accurate estimates of the missing values than a conventional non-iterative imputation technique.
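The iteration described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function name `regularized_em` and its parameters are hypothetical, the regularization parameter `h` is held fixed instead of being chosen by generalized cross-validation, the ridge penalty is applied as `h**2` times the diagonal of the available-variable covariance block (equivalent to ridge regression on standardized variables), and the covariance is normalized by `n - 1`. The sketch also assumes at least two variables and at least one observed value per variable.

```python
import numpy as np

def regularized_em(X, h=0.1, max_iter=100, tol=1e-6):
    """Impute NaNs in X (n samples x p variables) by iterated ridge regressions.

    Illustrative sketch only: h is fixed here, whereas the abstract describes
    choosing it by generalized cross-validation.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Initial guess: fill gaps with the mean of the available values.
    mean = np.nanmean(X, axis=0)
    Xf = np.where(missing, mean, X)
    C = np.cov(Xf, rowvar=False)

    for _ in range(max_iter):
        X_new = Xf.copy()
        C_res = np.zeros((p, p))  # accumulated imputation-error covariance
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            a = ~m
            if not a.any():
                # Nothing observed in this record: fall back to the mean.
                X_new[i] = mean
                C_res += C
                continue
            Caa = C[np.ix_(a, a)]
            Cam = C[np.ix_(a, m)]
            Cmm = C[np.ix_(m, m)]
            # Ridge regression of missing on available variables.
            B = np.linalg.solve(Caa + h**2 * np.diag(np.diag(Caa)), Cam)
            X_new[i, m] = mean[m] + (Xf[i, a] - mean[a]) @ B
            # Covariance of the residual x_m - B^T x_a under the current C.
            C_res[np.ix_(m, m)] += Cmm - B.T @ Cam - Cam.T @ B + B.T @ Caa @ B

        change = np.linalg.norm(X_new[missing] - Xf[missing])
        scale = np.linalg.norm(X_new[missing])

        # Re-estimate mean and covariance from the completed data,
        # adding the residual covariances of the imputed values.
        Xf = X_new
        mean = Xf.mean(axis=0)
        Xc = Xf - mean
        C = (Xc.T @ Xc + C_res) / (n - 1)

        if change <= tol * scale:
            break
    return Xf, mean, C
```

Calling `X_imputed, mean, cov = regularized_em(X_with_nans, h=0.1)` returns the completed data together with the mean and covariance estimates. Larger values of `h` filter more strongly, which is what makes the regression usable when the covariance block of the available variables is ill-conditioned or singular.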

### Citations

4660 | Matrix analysis - Horn, Johnson - 1986 |

1120 | Statistical analysis with missing data - Little, Rubin - 1987 |

Citation Context: ...eviations of the missing values from the imputed values, or the expected variances and covariances of the imputation error, must be taken into account in estimating the covariance matrix of the data (Little and Rubin 1987, chapter 3.4). Since estimates of the mean and of the covariance matrix of an incomplete dataset depend on the unknown missing values, and since, conversely, estimates of the missing values depend on...

859 | Multivariate Analysis - Mardia, Kent, et al. - 1979 |

Citation Context: ...e missing. Given the partitioned estimate of the covariance matrix $\hat{\Sigma}^{(t)}$, the conditional maximum likelihood estimate of the regression coefficients can be written as $\hat{B} = \hat{\Sigma}_{aa}^{-1} \hat{\Sigma}_{am}$ (2) (cf. Mardia et al. 1979, chapter 6.2). From the structure of the regression model (1) follows that, given an estimate $\hat{B}$ of the regression coefficients and the partitioned estimate of the covariance matrix $\hat{\Sigma}^{(t)}$, an estim...

822 | Solution of Ill-posed Problems - Tikhonov, Arsenin - 1977 |

Citation Context: ...basis, but regularized regression parameters are computed with a method known to statisticians as ridge regression and to applied mathematicians as Tikhonov regularization (Hoerl and Kennard 1970a,b; Tikhonov and Arsenin 1977). In ridge regression, a continuous regularization parameter controls the degree of regularization imposed on the regression coefficients. High-frequency or small-scale components in the regression c...

494 | Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 12 - Hoerl, Kennard - 1970 |

384 | Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm - Dempster, Laird, et al. - 1977 |

Citation Context: ...at follows, an iterative method both for the estimation of mean values and covariance matrices and for the imputation of missing values will be presented. The expectation maximization (EM) algorithm (Dempster et al. 1977) is taken as the point of departure for the development of a regularized EM algorithm that is applicable to incomplete sets of climate data, in which the number of variables typically exceeds the num...

273 | Generalized cross-validation as a method for choosing a good ridge parameter - Golub, Heath, et al. - 1979 |

259 | Inference and missing data - Rubin - 1976 |

Citation Context: ...s in the dataset, rests on the assumption that the missing values in the dataset are missing at random, in the sense that the probability that a value is missing does not depend on the missing value (Rubin 1976). For example, in a dataset with monthly mean surface temperatures on a spatial grid, the missing values are missing at random if correlations between anthropogenic temperature changes and the availa...

231 | The total least squares problem: Computational aspects and analysis - Van Huffel, Vandewalle - 1991 |

192 | Regularization tools: a Matlab package for analysis and solution of discrete ill-posed problems - Hansen - 1994 |

Citation Context: ...tion (18) and the effective number of degrees of freedom T(h) from the filter factors (19) requires O(r) operations, where r is the number of nonzero eigenvalues of the correlation matrix $\hat{\Sigma}'_{aa}$ (cf. Hansen 1994, 1997, chapter 4.6). That is, if the ridge regression is computed via an eigendecomposition of the correlation matrix $\hat{\Sigma}'_{aa}$, only a small additional effort is required to find, with one of the commo...

173 | Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation - Tarantola - 1987 |

157 | Global-scale temperature patterns and climate forcing over the past six centuries: Nature, v - Mann, Bradley, et al. - 1998 |

Citation Context: ...ch method. Ill-posed problems in climate research are often regularized by performing multivariate analyses in a truncated principal component basis (see, e.g., Smith et al. 1996; Kaplan et al. 1997; Mann et al. 1998). If a problem is regularized by truncating a principal component analysis, high-frequency or small-scale components of the solution, represented by higher-order principal components, are filtered ou...

155 | Transient responses of a coupled ocean-atmosphere model to gradual changes of atmospheric CO2. 1. Annual mean response - Manabe, RJ, et al. - 1991 |

111 | Analyses of global sea surface temperature 1856-1991 - Kaplan, Cane, et al. - 1998 |

103 | Hemispheric surface air temperature variations: a reanalysis and an update to - Jones - 1994 |

62 | Reconstruction of historical sea surface temperatures using empirical orthogonal functions - Smith, Reynolds, et al. - 1996 |

Citation Context: ...t data must be regularized with some such method. Ill-posed problems in climate research are often regularized by performing multivariate analyses in a truncated principal component basis (see, e.g., Smith et al. 1996; Kaplan et al. 1997; Mann et al. 1998). If a problem is regularized by truncating a principal component analysis, high-frequency or small-scale components of the solution, represented by higher-order...

60 | Practical approximate solutions to linear operator equations when the data are noisy - Wahba - 1977 |

Citation Context: ...Fourier coefficients, in analogy to inverse problems in which the counterpart of the matrix $\tilde{X}_a$ is a convolution operator whose singular value decomposition is equivalent to a Fourier expansion (cf. Wahba 1977; Anderssen and Prenter 1981). The representation (16) of the regression coefficients shows that, in the standard form, the columns of the regression coefficient matrix $\hat{B}'_h$ are linear combinations o...

56 | Tikhonov regularization and total least squares - Golub, Hansen, et al. - 1999 |

39 | Regularization by truncated total least squares - Fierro, Golub, et al. - 1997 |

35 | Marine surface temperature: Observed variations and data requirements, Climatic Change 31 - Parker, Folland, et al. - 1995 |

33 | Interdecadal changes of surface temperature since the late nineteenth century - Parker, Jones, et al. - 1994 |

31 | Reduced space optimal analysis for historical data sets: 135 years of Atlantic sea surface temperatures - Kaplan, Kushnir, et al. - 1997 |

Citation Context: ...larized with some such method. Ill-posed problems in climate research are often regularized by performing multivariate analyses in a truncated principal component basis (see, e.g., Smith et al. 1996; Kaplan et al. 1997; Mann et al. 1998). If a problem is regularized by truncating a principal component analysis, high-frequency or small-scale components of the solution, represented by higher-order principal component...

24 | A Method of Estimation of Missing Values in Multivariate Data Suitable for Use with an Electronic Computer - Buck - 1960 |

Citation Context: ...e estimate of the covariance matrix of a dataset are available, the missing values in the dataset can be filled in with their conditional expectation values given the available values in the dataset (Buck 1960). Conversely, the mean and the covariance matrix can be estimated from a completed dataset with imputed values filled in for missing values, provided that an estimate of the covariance matrix of the ...

22 | Transient response of a coupled ocean–atmosphere model to gradual changes of atmospheric CO2. Part I: Annual mean response - Spelman, Bryan - 1991 |

19 | Missing values in multivariate analysis - Beale, Little - 1975 |

Citation Context: ...e context. [Page header: MARCH 2001, SCHNEIDER, 857] ...$\hat{\Sigma}^{(t)}$ (cf. Mardia et al. 1979, chapter 6.2). As a Schur complement of a positive definite matrix $\hat{\Sigma}^{(t)}$, the residual covariance matrix $\hat{C}$ is assured to be positive definite (Horn and Johnson 1985, p. 472). [...] ...rection in the case of a complete dataset (cf. Beale and Little 1975). Thus, the new estimate (6) of the covariance matrix is computed in the same way as the sample covariance mat...

12 | Global mean surface air temperature and North Atlantic overturning in a suite of coupled GCM climate change experiments. Geophys - Dixon, Lanzante - 1999 |

8 | Behavior near zero of the distribution of GCV smoothing parameter estimates for splines - Wahba, Wang - 1993 |

Citation Context: ...has a minimum near zero, generalized cross-validation occasionally leads to a regularization parameter near zero when, in fact, a greater regularization parameter would be more appropriate (Wahba and Wang 1995). Choosing too small a regularization parameter in such cases can be avoided by constructing a lower bound for the regularization parameter from a priori guesses of the magnitude of the imputation er...

6 | Matrix Computations. 2d ed - Golub, Van Loan - 1989 |

2 | A formal comparison of methods proposed for the numerical solution of first kind integral equations - Anderssen, Prenter - 1981 |

Citation Context: ...ficients, in analogy to inverse problems in which the counterpart of the matrix $\tilde{X}_a$ is a convolution operator whose singular value decomposition is equivalent to a Fourier expansion (cf. Wahba 1977; Anderssen and Prenter 1981). The representation (16) of the regression coefficients shows that, in the standard form, the columns of the regression coefficient matrix $\hat{B}'_h$ are linear combinations of the eigenvectors $V_{:j}$ of t...

2 | Uncertainty in the solution of linear operator equations - Linz - 1984 |

Citation Context: ...nce that complicates the use of an otherwise optimal method: there are no general, problem-independent criteria according to which the optimality of a method for ill-posed problems can be established (Linz 1984). Hence, any claim that the regularized EM algorithm or any other technique for the imputation of missing values in climate data is "optimal" in some general sense would be unjustified. The perform...

2 | Latent root regression analysis, Technometrics 16 - Webster, Gunst, et al. - 1974 |

1 | cited 1999: Analysis of incomplete climate data: Matlab code. [Available online at http://www.aos.princeton.edu/WWWPUBLIC/tapio/imputation - Schneider - 1999 |

Citation Context: ...complete set of simulated surface temperature data from which values had been deleted to obtain the incomplete set of test data. [Footnote 7: Details of the implementation of the regularized EM algorithm can be taken from the program code, which is available online (Schneider 1999).] The normalization constant M is the total number of c missing values in a dataset, and $\sigma_j$ is the standard deviation of the jth variable of t...