Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation
- American Political Science Review
, 2000
Cited by 88 (35 self)
We propose a remedy for the discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. Methodologists and statisticians agree that "multiple imputation" is a superior approach to the problem of missing data scattered through one's explanatory and dependent variables than the methods currently used in applied data analysis. The reason for this discrepancy lies with the fact that the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise. In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is considerably faster and easier to use than the leading method recommended in the statistics literature. We also quantify the risks of current missing data practices, ...
Logistic Regression in Rare Events Data
, 1999
Cited by 33 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory
Clarify: Software for Interpreting and Presenting Statistical Results
- Journal of Statistical Software
, 2001
Cited by 30 (0 self)
and distribute this program provided that no charge is made and the copy is identical to the original. To request an exception, please contact Michael Tomz.
The dangers of extreme counterfactuals
- Political Analysis
, 2006
Cited by 11 (7 self)
We address the problem that occurs when inferences about counterfactuals (predictions, “what-if” questions, and causal effects) are attempted far from the available data. The danger of these extreme counterfactuals is that substantive conclusions drawn from statistical models that fit the data well turn out to be based largely on speculation hidden in convenient modeling assumptions that few would be willing to defend. Yet existing statistical strategies provide few reliable means of identifying extreme counterfactuals. We offer a proof that inferences farther from the data allow more model dependence and then develop easy-to-apply methods to evaluate how model dependent our answers would be to specified counterfactuals. These methods require neither sensitivity testing over specified classes of models nor evaluating any specific modeling assumptions. If an analysis fails the simple tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence. Free software that accompanies this article implements all the methods developed.
Labor-market competition and individual preferences over immigration policy
- NBER Working Paper No. 6946
, 1999
Cited by 10 (1 self)
This paper uses three years of individual-level data to analyze the determinants of individual preferences over immigration policy in the United States. We have two main empirical results. First, less-skilled workers are significantly more likely to prefer limiting immigrant inflows into the United States. Our finding suggests that, over the time horizons that are relevant to individuals when evaluating immigration policy, individuals think that the U.S. economy absorbs immigrant inflows at least partly by changing wages. Second, we find no evidence that the relationship between skills and immigration opinions is stronger in high-immigration communities.
Toward a Common Framework for Statistical Analysis and Development
- Journal of Computational and Graphical Statistics
, 2008
Cited by 10 (4 self)
We develop a general ontology of statistical methods and use it to propose a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. This framework offers a simple unified structure and syntax that can encompass a large fraction of existing statistical procedures. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, without requiring changes in existing approaches, and regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.
When can history be our guide? The pitfalls of counterfactual inference
- International Studies Quarterly
, 2007
Cited by 8 (4 self)
Inferences about counterfactuals are essential for prediction, answering “what if” questions, and estimating causal effects. However, when the counterfactuals posed are too far from the data at hand, conclusions drawn from well-specified statistical analyses become based on speculation and convenient but indefensible model assumptions rather than empirical evidence. Unfortunately, standard statistical approaches assume the veracity of the model rather than revealing the degree of model-dependence, so this problem can be hard to detect. We develop easy-to-apply methods to evaluate counterfactuals that do not require sensitivity testing over specified classes of models. If an analysis fails the tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence. We use these methods to evaluate the extensive scholarly literatures on the effects of changes in the degree of democracy in a country (on any dependent variable) and separate analyses of the effects of UN peacebuilding efforts. We find evidence that many scholars are inadvertently drawing conclusions based more on modeling hypotheses than on evidence in the data. For some research questions, history contains insufficient information to be our guide. Free software that accompanies this paper implements all our suggestions. Social science is about making inferences: using facts we know to learn about facts we do not know. Some inferential targets (the facts we do not know) are factual, which means that they exist even if we do not know them. In early 2003, Saddam Hussein was obviously either alive or dead, but the world did not know which it was
What to do about missing values in time series cross-section data
, 2009
Cited by 8 (4 self)
Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in this subset of political science have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation with three related developments. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we enable analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, because these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also make it possible to implement the methods introduced here in freely available open source software that is considerably more reliable than existing algorithms. We develop an approach to analyzing data with
Listwise deletion is evil: What to do about missing data in political science
- Paper Presented at the Annual Meeting of the American Political Science Association
, 1998
Cited by 7 (2 self)
We propose a remedy to the substantial discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of “multiple imputation,” but most researchers in our field and other social sciences still use far inferior methods. Indeed, we demonstrate that the threats to validity from current missing data practices rival the biases from the much better known omitted variable problem. As it turns out, this discrepancy is not entirely our fault, as the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise on the part of the user (even experts disagree on how to use them). In this paper, we adapt an existing algorithm, and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is between 65 and
Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies
, 2001
Cited by 4 (2 self)
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just risk and rate ratios, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.
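As a rough illustration of why the population incidence fraction matters in the abstract above, the sketch below applies Bayes' rule to a cumulative case-control 2x2 table. It covers only the simple case where the incidence fraction is assumed known exactly; it is not the authors' method, which addresses the harder case of partial or no knowledge. The function name `risks_from_case_control`, the argument names, and the counts are all hypothetical.

```python
def risks_from_case_control(a, b, c, d, tau):
    """Recover risks from a cumulative case-control table, assuming tau is known.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    tau: assumed population incidence fraction, P(Y = 1)
    """
    p_exp_case = a / (a + b)  # P(X = 1 | Y = 1), estimable from the cases
    p_exp_ctrl = c / (c + d)  # P(X = 1 | Y = 0), estimable from the controls

    def risk(px_given_case, px_given_ctrl):
        # Bayes' rule: P(Y = 1 | X) from P(X | Y) and the incidence tau
        num = px_given_case * tau
        return num / (num + px_given_ctrl * (1 - tau))

    r1 = risk(p_exp_case, p_exp_ctrl)          # risk among the exposed
    r0 = risk(1 - p_exp_case, 1 - p_exp_ctrl)  # risk among the unexposed
    return {
        "odds_ratio": (a * d) / (b * c),  # needs no knowledge of tau
        "risk_exposed": r1,
        "risk_unexposed": r0,
        "risk_ratio": r1 / r0,
        "risk_difference": r1 - r0,
    }


# Hypothetical counts and a hypothetical 1% population incidence
result = risks_from_case_control(a=60, b=40, c=30, d=70, tau=0.01)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With a small `tau`, the risk ratio sits close to the odds ratio, which is the rare events assumption at work. The risks and the risk difference, by contrast, move with whatever value of `tau` is supplied.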

