Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation
- American Political Science Review
, 2000
Cited by 88 (35 self)
We propose a remedy for the discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. Methodologists and statisticians agree that "multiple imputation" is a superior approach to the problem of missing data scattered through one's explanatory and dependent variables than the methods currently used in applied data analysis. The reason for this discrepancy lies with the fact that the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise. In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is considerably faster and easier to use than the leading method recommended in the statistics literature. We also quantify the risks of current missing data practices, ...
Not Asked Or Not Answered: Multiple Imputation for Multiple Surveys
- Journal of the American Statistical Association
, 1998
Cited by 16 (7 self)
We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked, or different sampling methods used, in different strata or clusters. Our method involves multiply-imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights were based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed to nonimputed data. We illustrate with the example that motivated this project --- a ...
What to do about missing values in time series cross-section data
, 2009
Cited by 8 (4 self)
Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in this subset of political science have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation with three related developments. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we enable analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, because these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also make it possible to implement the methods introduced here in freely available open source software that is considerably more reliable than existing algorithms. We develop an approach to analyzing data with
Maximum Consistency of Incomplete Data Via Non-invasive Imputation
, 2002
Cited by 7 (1 self)
We present an algorithm to impute missing values from given data alone, and analyse its performance. The proposed procedure is based on non-numeric, rule-based data analysis, and aims to maximise consistency of imputation from known values. In contrast to the prevailing statistical imputation algorithms, it does not make representational assumptions or presuppose other model constraints. Therefore, it is suitable for a wide variety of data sets, and can be used as a pre-processing step before resorting to harder numerical methods.
Multiple imputation for model checking: Completed-data plots with missing and latent data
- Biometrics
, 2005
Cited by 7 (3 self)
Summary. In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset (the observed data together with the imputed unobserved data) using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of the use of completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as “predictive inference” in a non-Bayesian context). We consider the graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require modeling the missingness or inclusion mechanism; the latter is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.
Listwise deletion is evil: What to do about missing data in political science
- Paper Presented at the Annual Meeting of the American Political Science Association
, 1998
Cited by 7 (2 self)
We propose a remedy to the substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of "multiple imputation," but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. As it turns out, this discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 65 and
The Multiple Adaptations of Multiple Imputation
Cited by 5 (3 self)
Multiple imputation was first conceived as a tool that statistical agencies could use to handle nonresponse in large sample, public use surveys. In the last two decades, the multiple imputation framework has been adapted for other statistical contexts. As examples, individual researchers use multiple imputation to handle missing data in small samples; statistical agencies disseminate multiply-imputed datasets for purposes of protecting data confidentiality; and survey methodologists and epidemiologists use multiple imputation to correct for measurement errors. In some of these settings, Rubin’s original rules for combining the point and variance estimates from the multiply-imputed datasets are not appropriate, because what is known (and therefore in the conditional expectations and variances used to derive inferential methods) differs from the missing data context. These applications require new combining rules and methods of inference. In fact, more than ten combining rules exist in the
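For orientation, the standard form of Rubin's original combining rules mentioned in this abstract can be sketched as follows. This is only an illustrative sketch of the textbook rules for the missing-data setting, not code from the paper, and it does not cover the adapted rules the abstract refers to. The function name `rubin_combine` and its arguments are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine m completed-data results with Rubin's standard rules.

    estimates: point estimates, one per imputed dataset (hypothetical input)
    variances: their estimated variances, in the same order
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total
```

For example, `rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools three estimates into 2.0, with total variance 0.5 + (4/3) * 1.0.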
Multiple Imputation for Multivariate Data with Missing and Below-Threshold Measurements: Time-Series Concentrations of Pollutants in the Arctic
- Biometrics
, 2001
Cited by 5 (1 self)
Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, N.W.T., Canada between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply-imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multiv...
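As a rough illustration of what imputing a below-detection-limit value can involve, the sketch below draws several imputations from a normal distribution truncated above at the detection limit, using inverse-CDF sampling. It is not one of the three models developed in the paper. The normal working scale (for instance, log concentration), the function name `impute_below_limit`, and all parameter values are assumptions made for the example.

```python
import random
from statistics import NormalDist


def impute_below_limit(limit, mu, sigma, m, seed=0):
    """Draw m values from Normal(mu, sigma) restricted to values below `limit`.

    Hypothetical helper: mu and sigma are assumed to be already estimated
    on a scale where a normal model is plausible.
    """
    rng = random.Random(seed)
    dist = NormalDist(mu, sigma)
    p_limit = dist.cdf(limit)  # probability mass below the detection limit
    draws = []
    for _ in range(m):
        # Uniform draw on (0, p_limit), mapped back through the inverse CDF.
        u = rng.random() * p_limit
        u = max(u, 1e-12)  # inv_cdf is undefined at exactly 0
        draws.append(dist.inv_cdf(u))
    return draws
```

Each censored cell would receive one draw per imputed dataset, so that the spread of the draws carries the uncertainty about the unobserved value into later analyses. The sketch assumes the mass below the limit is not vanishingly small. If `p_limit` were below the 1e-12 guard, the draws could land above the limit.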
Releasing Multiply Imputed, Synthetic Public-Use Microdata: An Illustration and Empirical Study
- Journal of the Royal Statistical Society, A
, 2004
Cited by 4 (0 self)
Summary. The paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on fully synthetic data for a variety of descriptive and analytic estimands, to assess the degree of protection of confidentiality that is afforded by fully synthetic data and to illustrate the specification of synthetic data imputation models. Benefits and limitations of releasing fully synthetic data sets are discussed.
Using Conditional Distributions for Missing-Data Imputation
- Statistical Science
, 2001
Cited by 3 (1 self)
Introduction. The authors discuss conditionally-specified models in probability theory and for modeling joint distributions in various applications. This theoretical structure is useful, considering that conditional models are becoming standard in many spatial applications, following Besag (1974). (Rather than attempting an exhaustive or even representative list, we shall just refer to Besag and Higdon (1999) as a recent example with discussion.) In addition, there has been occasional discussion in the literature as to the relative merits of conditionally or jointly-specified models (for example, Besag, 1974, Haslett, 1985, and Ripley, 1988). Here, however, we would like to address a different topic: the use of conditional distributions, not to model an underlying joint distribution, but for the purpose of imputing missing data. At first this might seem like an unimportant distinction; after all, imputation requires modeling (if only implicitly). However, when the fraction of

