Results 1–10 of 44
Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values, 2001
Abstract

Cited by 105 (4 self)
Estimating the mean and the covariance matrix of an incomplete dataset and filling in missing values with imputed values is generally a nonlinear problem, which must be solved iteratively. The expectation maximization (EM) algorithm for Gaussian data, an iterative method both for the estimation of mean values and covariance matrices from incomplete datasets and for the imputation of missing values, is taken as the point of departure for the development of a regularized EM algorithm. In contrast to the conventional EM algorithm, the regularized EM algorithm is applicable to sets of climate data, in which the number of variables typically exceeds the sample size. The regularized EM algorithm is based on iterated analyses of linear regressions of variables with missing values on variables with available values, with regression coefficients estimated by ridge regression, a regularized regression method in which a continuous regularization parameter controls the filtering of the noise in the data. The regularization parameter is determined by generalized cross-validation, such as to minimize, approximately, the expected mean squared error of the imputed values. The regularized EM algorithm can estimate, and exploit for the imputation of missing values, both synchronic and diachronic covariance matrices, which may contain information on spatial covariability, stationary temporal covariability, or cyclostationary temporal covariability. A test of the regularized EM algorithm with simulated surface temperature data demonstrates that the algorithm is applicable to typical sets of climate data and that it leads to more accurate estimates of the missing values than a conventional non-iterative imputation technique.
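The iteration this abstract describes can be sketched in a few lines: missing entries are initialised with column means, and each pass re-estimates the mean and covariance of the completed data, then re-imputes each incomplete row by a ridge regression of its missing variables on its observed ones. This is an illustrative simplification, not the paper's implementation: the function name, the fixed ridge parameter, and the fixed iteration count are assumptions (the paper chooses the regularization parameter adaptively by generalized cross-validation).

```python
import numpy as np

def regularized_em_impute(X, ridge=1e-2, n_iter=20):
    """Sketch of a regularized EM imputation loop (ridge parameter fixed,
    unlike the paper's cross-validated choice)."""
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    # initialise missing entries with column means
    col_mean = np.nanmean(X, axis=0)
    X[miss] = np.take(col_mean, np.where(miss)[1])
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        C = np.cov(X, rowvar=False)
        for i in range(X.shape[0]):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            # ridge-regularised coefficients: B = (C_oo + h*I)^-1 C_om
            B = np.linalg.solve(C[np.ix_(o, o)] + ridge * np.eye(o.sum()),
                                C[np.ix_(o, m)])
            # regress the row's missing variables on its observed ones
            X[i, m] = mu[m] + (X[i, o] - mu[o]) @ B
    return X
```

Because the imputation borrows strength from correlated variables, a missing value in a highly correlated pair is recovered far more accurately than by mean substitution.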
Tutorial in Biostatistics: Multivariable prognostic models. Statistics in Medicine, 1996
Abstract

Cited by 28 (0 self)
Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross-validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
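The bootstrap validation of predictive discrimination mentioned above can be sketched as follows. The concordance (c) index and the optimism-correction scheme are standard; the least-squares linear score standing in for a Cox or logistic model, and all function names, are assumptions made so the example is self-contained.

```python
import numpy as np

def c_index(score, y):
    """Concordance: probability a randomly chosen event subject is
    scored higher than a randomly chosen non-event subject."""
    cases, controls = score[y == 1], score[y == 0]
    diff = cases[:, None] - controls[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def fit_score(X, y):
    """Least-squares linear score (a stand-in for a regression model)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def optimism_corrected_c(X, y, n_boot=100, seed=0):
    """Apparent c-index minus the mean bootstrap optimism:
    optimism = (performance on the resample) - (performance on the
    original data) for a model refit on each resample."""
    rng = np.random.default_rng(seed)
    apparent = c_index(fit_score(X, y)(X), y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        m = fit_score(X[idx], y[idx])
        optimism += c_index(m(X[idx]), y[idx]) - c_index(m(X), y)
    return apparent - optimism / n_boot
```

The corrected index estimates how the model would discriminate on new subjects, which is what the tutorial argues should be reported instead of the apparent (resubstitution) accuracy.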
Missing data in multiple item scales: A Monte Carlo analysis of missing data techniques, Organizational Research Methods, 1999
Abstract

Cited by 25 (0 self)
Researchers in many fields use multiple item scales to measure important variables such as attitudes and personality traits, but find that some respondents failed to complete certain items. Past missing data research focuses on missing entire instruments, and is of limited help because there are few variables to help impute missing scores and the variables are often not highly related to each other. Multiple item scales offer the unique opportunity to impute missing values from other correlated items designed to measure the same construct. A Monte Carlo analysis was conducted to compare several missing data techniques. The techniques included listwise deletion, regression imputation, hot-deck imputation, and two forms of mean substitution. Results suggest that regression imputation and substituting the mean response of a person to other items on a scale are very promising approaches. Furthermore, the imputation techniques often outperformed listwise deletion. Surveys or questionnaires are used to measure many important constructs in psychology, management, and marketing. In Human Resource Management (HRM) and
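The person-mean substitution that this study finds promising is simple to state: replace a respondent's missing items with the mean of that same respondent's answered items on the scale. A minimal sketch (the function name is an assumption, not from the paper):

```python
import numpy as np

def person_mean_substitute(items):
    """Fill each respondent's missing items (NaN) with the mean of the
    items that respondent did answer on the same scale."""
    items = np.asarray(items, dtype=float).copy()
    row_mean = np.nanmean(items, axis=1)   # per-respondent mean of answered items
    r, c = np.where(np.isnan(items))
    items[r, c] = row_mean[r]
    return items
```

Unlike column-mean substitution, this exploits the within-scale correlation between items measuring the same construct, which is exactly the opportunity the abstract highlights.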
Analyzing incomplete longitudinal clinical trial data, Biostatistics, 2004
Abstract

Cited by 20 (3 self)
Using standard missing data taxonomy, due to Rubin and coworkers, and simple algebraic derivations, it is argued that some simple but commonly used methods to handle incomplete longitudinal clinical trial data, such as complete case analyses and methods based on last observation carried forward, require restrictive assumptions and stand on a weaker theoretical foundation than likelihood-based methods developed under the missing at random (MAR) framework. Given the availability of flexible software for analyzing longitudinal sequences of unequal length, implementation of likelihood-based MAR analyses is not limited by computational considerations. While such analyses are valid under the comparatively weak assumption of MAR, the possibility of data missing not at random (MNAR) is difficult to rule out. It is argued, however, that MNAR analyses are, themselves, surrounded with problems and therefore, rather than ignoring MNAR analyses altogether or blindly shifting to them, their optimal place is within sensitivity analysis. The concepts developed here are illustrated using data from three clinical trials, where it is shown that the analysis method may have an impact on the conclusions of the study.
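Last observation carried forward (LOCF), one of the simple methods this paper criticises, is easy to state precisely: within each subject's sequence, a missing visit takes the value of the most recent observed visit. A sketch (function name assumed; leading missing visits, which have no prior observation, are left missing):

```python
import numpy as np

def locf(series):
    """Last observation carried forward along each subject's row.
    NaN marks a missed visit; leading NaNs stay missing."""
    out = np.asarray(series, dtype=float).copy()
    for row in out:                 # one row per subject
        last = np.nan
        for j, v in enumerate(row):
            if np.isnan(v):
                row[j] = last       # carry the previous observation forward
            else:
                last = v
    return out
```

The paper's point is that this single-imputation rule implicitly assumes a subject's profile stays flat after dropout, an assumption rarely defensible in a trial.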
Automatically detecting criminal identity deception: An adaptive detection algorithm, IEEE Transactions on Systems, Man and Cybernetics (Part A), 2005
Abstract

Cited by 9 (2 self)
Identity deception, specifically identity concealment, is a serious problem encountered in the law enforcement and intelligence communities. In this paper, the authors discuss techniques that can automatically detect identity deception. Most of the existing techniques are experimental and cannot be easily applied to real applications because of problems such as missing values and large data size. The authors propose an adaptive detection algorithm that adapts well to incomplete identities with missing values and to large datasets containing millions of records. The authors describe three experiments to show that the algorithm is significantly more efficient than the existing record comparison algorithm with little loss in accuracy. It can identify deception in incomplete identities with high precision. In addition, it demonstrates excellent efficiency and scalability for large databases. A case study conducted in another law enforcement agency shows that the authors' algorithm is useful in detecting both intentional deception and unintentional data errors. Index Terms—Efficiency, identity deception, missing value, scalability.
Prediction of missing values in microarray and use of mixed models to evaluate the predictors, Stat. Appl. Genet. Mol. Biol., 2005
Adjustments for rater effects in performance assessment, Applied Psychological Measurement, 1991
Abstract

Cited by 5 (0 self)
Alternative methods to correct for rater leniency/stringency effects (i.e., rater bias) in performance ratings were investigated. Rater bias effects are of concern when candidates are evaluated by different raters. The three correction methods evaluated were ordinary least squares (OLS), weighted least squares (WLS), and imputation of the missing data (IMPUTE). In addition, the usual procedure of averaging the observed ratings was investigated. Data were simulated from an essentially τ-equivalent measurement model, with true scores and error scores normally distributed. The variables manipulated in the simulations were method of correction (OLS, WLS, IMPUTE, averaging the observed ratings), amount
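The leniency/stringency correction studied here can be illustrated with a cruder one-step stand-in for the OLS method: estimate each rater's additive effect as that rater's mean deviation from the grand mean over the candidates they scored, then subtract it from their ratings. This simplified adjustment, and the function name, are assumptions for illustration only, not the paper's OLS procedure.

```python
import numpy as np

def adjust_rater_bias(R):
    """One-step leniency adjustment. R is a candidates x raters matrix
    of ratings, with NaN where a rater did not score a candidate."""
    R = np.asarray(R, dtype=float)
    grand = np.nanmean(R)
    # each rater's effect: mean deviation from the grand mean
    rater_effect = np.nanmean(R, axis=0) - grand
    # subtract the effect from that rater's column; NaNs stay NaN
    return R - rater_effect
```

With a fully crossed design (every rater scores every candidate) this removes a purely additive bias exactly; with incomplete designs the full OLS fit the paper evaluates is needed.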
A procedure for combining sample standardized mean differences and vote counts to estimate the population standardized mean difference in fixed-effects models, 1996
Abstract

Cited by 4 (1 self)
Missing effect-size estimates pose a difficult problem in meta-analysis. Conventional procedures for dealing with this problem include discarding studies with missing estimates and imputing single values for missing estimates (e.g., 0, mean). An alternative procedure, which combines effect-size estimates and vote counts, is proposed for handling missing estimates. The combined estimator has several desirable features: (a) It uses all the information available from studies in a research synthesis, (b) it is consistent, (c) it is more efficient than other estimators, (d) it has known variance, and (e) it gives weight to all studies proportional to the Fisher information they provide. The combined procedure is the method of choice in a research synthesis when some studies do not provide enough information to compute effect-size estimates but do provide information about the direction or statistical significance of results. Missing data is perhaps the largest problem facing the practicing meta-analyst. Frequently, meta
A novel framework for imputation of missing values in databases, IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 2007
Imputing Missing Values: The Effect on the Accuracy of Classification
Abstract

Cited by 3 (0 self)
Data from patient records were used to classify cardiac patients as to whether they are likely or unlikely to experience a subsequent morbid event after admission to a hospital. Both a linear discriminant function and a logistic regression equation were developed using a set of nine predictor variables which were chosen on the basis of their correlations with the likelihood of a subsequent morbid event. Once the models were obtained, artificially generated missing values were replaced with imputed values using mean substitution, regression imputation and hot-deck imputation techniques. The effect on the accuracy of the predictions using models with imputed values was determined by comparing the reclassifications using imputed data with the actual occurrence or nonoccurrence of a subsequent morbid event. Mean substitution and hot-deck imputation performed slightly better than regression imputation in this application regardless of whether or not the predictor variable whose values were being imputed was categorical or numerical. Statistical modeling techniques have been widely used for many years to predict a particular outcome using information from a group of variables which are related to the outcome of interest. That outcome could be a continuous variable such as
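Hot-deck imputation, one of the techniques compared above, replaces each missing value with a value drawn from the observed "donor" values of the same variable, which preserves the variable's marginal distribution in a way mean substitution does not. A minimal random-donor sketch (the function name and the simple within-column donor pool are assumptions; applied hot-deck schemes usually match donors on auxiliary variables):

```python
import numpy as np

def hot_deck_impute(X, seed=0):
    """Replace each NaN with a value drawn at random from the observed
    values of the same column (random within-column donor)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        miss = np.isnan(col)
        donors = col[~miss]
        if miss.any() and donors.size:
            col[miss] = rng.choice(donors, size=miss.sum())
    return X
```

Because every imputed value is an actually observed value, the method works unchanged for categorical and numerical predictors alike, consistent with the comparison reported in the abstract.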