Results 1–10 of 13
Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation
American Political Science Review, 2000
Abstract

Cited by 141 (40 self)
We propose a remedy for the discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. Methodologists and statisticians agree that "multiple imputation" is superior to the methods currently used in applied data analysis for the problem of missing data scattered through one's explanatory and dependent variables. The reason for this discrepancy lies in the fact that the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise. In this paper, we adapt an existing algorithm and use it to implement a general-purpose multiple imputation model for missing data. This algorithm is considerably faster and easier to use than the leading method recommended in the statistics literature. We also quantify the risks of current missing data practices, ...
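The abstract above contrasts multiple imputation with current practice. A minimal sketch of the general idea, with invented data and a simple regression-based imputation model (the paper's actual algorithm is not shown): draw each missing value from a predictive distribution several times, analyze each completed data set, then pool with Rubin's rules.

```python
# Hedged sketch of multiple imputation, not the paper's algorithm.
# Data, imputation model, and analysis model are all invented here.
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 5                      # sample size, number of imputations

x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # true slope = 2
miss = rng.random(n) < 0.3        # 30% of x missing at random
x_obs = np.where(miss, np.nan, x)

# Imputation model: regress x on y using complete cases, then draw
# each missing x from its predictive distribution; the added noise is
# what makes the m completed data sets differ.
cc = ~np.isnan(x_obs)
b1, b0 = np.polyfit(y[cc], x_obs[cc], 1)
resid_sd = np.std(x_obs[cc] - (b0 + b1 * y[cc]))

slopes, variances = [], []
for _ in range(m):
    x_imp = x_obs.copy()
    x_imp[miss] = b0 + b1 * y[miss] + rng.normal(scale=resid_sd, size=miss.sum())
    # Analysis model: OLS of y on the completed x.
    X = np.column_stack([np.ones(n), x_imp])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = res[0] / (n - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    slopes.append(beta[1])
    variances.append(cov[1, 1])

# Rubin's rules: pooled point estimate and total variance.
qbar = np.mean(slopes)             # pooled slope
W = np.mean(variances)             # within-imputation variance
B = np.var(slopes, ddof=1)         # between-imputation variance
T = W + (1 + 1 / m) * B            # total variance
print(f"pooled slope {qbar:.2f}, se {np.sqrt(T):.3f}")
```

With listwise deletion the analyst would simply discard the 30% of incomplete rows; here every row contributes, and the between-imputation variance B propagates the uncertainty about the filled-in values.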
What to do about missing values in time-series cross-section data
, 2009
Abstract

Cited by 14 (4 self)
Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in this subset of political science have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best-guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation with three related developments. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we enable analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, because these tasks could not be accomplished within existing imputation algorithms, which cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also make it possible to implement the methods introduced here in freely available open source software that is considerably more reliable than existing algorithms.
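One of the three developments above is letting imputations follow smooth time trends. A hedged sketch of that single idea, assuming a polynomial-in-time basis for one cross-sectional unit (the construction is illustrative, not the paper's actual implementation):

```python
# Hedged sketch: encode a smooth time trend as polynomial basis columns
# so an imputation model can track it within one cross-sectional unit.
# Data and trend are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
T = 40
t = np.arange(T, dtype=float)
true = 0.05 * t + 0.002 * t**2          # smooth trend for one unit
y = true + rng.normal(scale=0.1, size=T)
miss = rng.random(T) < 0.25
y_obs = np.where(miss, np.nan, y)

# Polynomial time basis up to cubic: the kind of covariate one would
# append per unit before running a general-purpose imputer.
basis = np.column_stack([t**k for k in range(4)])

obs = ~np.isnan(y_obs)
coef, *_ = np.linalg.lstsq(basis[obs], y_obs[obs], rcond=None)
y_imp = y_obs.copy()
y_imp[miss] = basis[miss] @ coef        # fill gaps from the fitted trend

err = np.abs(y_imp[miss] - true[miss]).mean()
print(f"mean absolute imputation error: {err:.3f}")
```

An imputer that ignores time would fill these gaps from the unit's overall mean, missing the trend entirely; the basis columns are what let the model interpolate along it.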
Semiparametric efficiency in GMM models with auxiliary data
Ann. Statist., 2008
Abstract

Cited by 12 (2 self)
We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the missing variables conditional on proxy variables that are observed in both the primary and the auxiliary database, when such distribution is common to the two data sets. The auxiliary sample can be independent of the primary sample, or can be a subset of it. For both cases, we derive bounds when the probability of missing data given the proxy variables is unknown, known, or belongs to a correctly specified parametric family. We find that the conditional probability is not ancillary when the two samples are independent. For all cases, we discuss efficient semiparametric estimators. An estimator based on a conditional expectation projection is shown to require milder regularity conditions than one based on inverse probability weighting.
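The abstract's closing comparison involves inverse probability weighting. A hedged sketch of the IPW side, assuming missingness depends on an always-observed proxy z (the setup is invented; the paper's estimators and moment conditions are not reproduced):

```python
# Hedged sketch of inverse probability weighting for a mean with data
# missing at random given a proxy z. Invented data-generating process.
import numpy as np

rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)                  # proxy, always observed
y = 1.0 + z + rng.normal(size=n)        # outcome, true mean = 1
p_obs = 1 / (1 + np.exp(-z))            # P(observed | z), logistic in z
obs = rng.random(n) < p_obs

naive = y[obs].mean()                   # biased: observed cases have high z

# Estimate the propensity crudely by quantile-binning on z, then
# reweight observed cases by the inverse estimated probability.
bins = np.quantile(z, np.linspace(0, 1, 21))
idx = np.digitize(z, bins[1:-1])        # bin index 0..19 per observation
p_hat = np.array([obs[idx == b].mean() for b in range(20)])[idx]
ipw = np.sum(y[obs] / p_hat[obs]) / np.sum(1 / p_hat[obs])

print(f"naive {naive:.2f}, IPW {ipw:.2f}  (truth 1.00)")
```

The naive complete-case mean overshoots because high-z cases are over-represented; reweighting restores them to their population share. The regularity issue the abstract flags shows up here as the need for p_hat to stay away from zero.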
The Multiple Adaptations of Multiple Imputation
Abstract

Cited by 7 (3 self)
Multiple imputation was first conceived as a tool that statistical agencies could use to handle nonresponse in large-sample, public-use surveys. In the last two decades, the multiple imputation framework has been adapted for other statistical contexts. As examples, individual researchers use multiple imputation to handle missing data in small samples; statistical agencies disseminate multiply-imputed datasets for purposes of protecting data confidentiality; and survey methodologists and epidemiologists use multiple imputation to correct for measurement errors. In some of these settings, Rubin's original rules for combining the point and variance estimates from the multiply-imputed datasets are not appropriate, because what is known, and therefore what enters the conditional expectations and variances used to derive inferential methods, differs from the missing data context. These applications require new combining rules and methods of inference. In fact, more than ten combining rules exist in the ...
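For reference, the baseline being adapted, Rubin's original combining rules, can be written in a few lines. A hedged sketch with invented inputs (the alternative rules the abstract alludes to are not shown):

```python
# Hedged sketch of Rubin's (1987) combining rules for m completed-data
# analyses: pooled estimate, total variance, and degrees of freedom.
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m point estimates and their within-imputation variances."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()                     # pooled point estimate
    W = u.mean()                        # within-imputation variance
    B = q.var(ddof=1)                   # between-imputation variance
    T = W + (1 + 1 / m) * B             # total variance
    df = (m - 1) * (1 + W / ((1 + 1 / m) * B)) ** 2
    return qbar, T, df

# Invented example: five completed-data slope estimates.
qbar, T, df = rubin_combine([1.9, 2.1, 2.0, 2.2, 1.8], [0.04] * 5)
print(f"pooled {qbar:.2f}, total var {T:.3f}, df {df:.1f}")
```

The settings the abstract describes (confidentiality protection, measurement error correction) change what is conditioned on, so the roles of W and B in the total variance T change with them.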
Multiple Imputation for Incomplete Data With Semicontinuous Variables
Journal of the American Statistical Association, 2003
Abstract

Cited by 2 (1 self)
We consider the application of multiple imputation to data containing not only partially missing categorical and continuous variables, but also partially missing ‘semicontinuous’ variables (variables that take on a single discrete value with positive probability but are otherwise continuously distributed). As an imputation model for data sets of this type, we introduce an extension of the standard general location model proposed by Olkin and Tate; our extension, the blocked general location model, provides a robust and general strategy for handling partially observed semicontinuous variables. In particular, we incorporate a two-level model for the semicontinuous variables into the general location model. The first level models the probability that the semicontinuous variable takes on its point-mass value, and the second level models the distribution of the variable given that it is not at its point mass. In addition, we introduce EM and data augmentation algorithms for the blocked general location model with missing data; these can be used to generate imputations under the proposed model and have been implemented in publicly available software. We illustrate our model and computational methods via a simulation study and an analysis of a survey of Massachusetts Megabucks Lottery winners.
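The two-level idea can be sketched directly. A hedged illustration, assuming a point mass at zero and a lognormal continuous part fit by simple moments (not the paper's blocked general location model or its EM/data augmentation algorithms):

```python
# Hedged sketch of two-level imputation for a semicontinuous variable:
# level 1 draws the point-mass indicator, level 2 draws the continuous
# part. Distributions and parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
at_zero = rng.random(n) < 0.4                  # point mass at 0 w.p. 0.4
pos = np.exp(rng.normal(0.0, 0.5, size=n))     # lognormal otherwise
x = np.where(at_zero, 0.0, pos)
miss = rng.random(n) < 0.2
x_obs = np.where(miss, np.nan, x)

# Level 1: estimate P(point mass) from observed cases.
obs = ~np.isnan(x_obs)
p_zero = (x_obs[obs] == 0).mean()
# Level 2: fit a lognormal to the observed nonzero values.
logs = np.log(x_obs[obs][x_obs[obs] > 0])
mu, sd = logs.mean(), logs.std()

# Impute: draw the indicator first, then the continuous part.
k = miss.sum()
zero_draw = rng.random(k) < p_zero
x_imp = x_obs.copy()
x_imp[miss] = np.where(zero_draw, 0.0, np.exp(rng.normal(mu, sd, size=k)))

imp_share = (x_imp[miss] == 0).mean()
print(f"observed zero share {p_zero:.2f}, imputed zero share {imp_share:.2f}")
```

A single Gaussian imputation model would smear draws around the point mass and produce impossible in-between values; splitting the draw into two levels preserves the spike-plus-continuous shape.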
Model Checking for Incomplete High Dimensional Categorical Data
, 1999
Abstract
OF THE DISSERTATION Model Checking for Incomplete High Dimensional Categorical Data by Ming-Yi Hu, Doctor of Philosophy in Statistics, University of California, Los Angeles, 1999. Professor Thomas R. Belin, Co-chair; Professor Robert I. Jennrich, Co-chair. Categorical data are often arranged in a contingency table and summarized by a log-linear model. A standard approach for comparing two competing models is to calculate twice the discrepancy between maximized log-likelihoods, which follows a χ² distribution asymptotically. But when data are sparse, the χ² approximation may be questionable. As an alternative to a large-sample approximation to the reference distribution, we implement the framework introduced by Rubin (1984) for finding the posterior predictive check (PPC) distribution. The PPC distribution represents the conditional probability of a future value of a test statistic based on the information given by observed data along with model specifications, which can se...
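The PPC idea can be sketched on a sparse table. A hedged illustration, assuming a simple independence model on an invented 2×3 table; posterior uncertainty in the fitted probabilities is ignored, so this is only a parametric-bootstrap approximation to the PPC distribution, not the dissertation's method:

```python
# Hedged sketch in the spirit of Rubin (1984): compare the observed fit
# statistic to its distribution over data sets simulated from the fitted
# model, rather than to the asymptotic chi-squared reference.
import numpy as np

rng = np.random.default_rng(4)
obs_tab = np.array([[3, 1, 0], [2, 4, 1]])      # sparse contingency table
n = obs_tab.sum()
p = np.outer(obs_tab.sum(1), obs_tab.sum(0)) / n**2   # independence fit

def x2(table, expected):
    """Pearson chi-squared discrepancy between a table and its fit."""
    return ((table - expected) ** 2 / expected).sum()

expected = n * p
t_obs = x2(obs_tab, expected)

# Reference distribution: simulate replicate tables from the fitted
# model and recompute the statistic for each.
reps = 2000
t_rep = np.empty(reps)
for r in range(reps):
    sim = rng.multinomial(n, p.ravel()).reshape(obs_tab.shape)
    t_rep[r] = x2(sim, expected)
pval = (t_rep >= t_obs).mean()
print(f"PPC-style p-value: {pval:.2f}")
```

With n = 11 and expected counts below 1 in some cells, the asymptotic χ² reference is exactly the approximation the abstract warns about; the simulated reference distribution sidesteps it.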
Translated by Kimon Friar: Contents
Abstract
Discipline is the highest of all virtues. Only so may strength and desire be counterbalanced and the endeavors of man bear fruit. N. KAZANTZAKIS,
Iterative adjustment of responses for the reduction of bias in binary regression models, 2009