## Toward a Common Framework for Statistical Analysis and Development (2008)

Venue: | Journal of Computational and Graphical Statistics |

Citations: | 19 - 7 self |

### BibTeX

@ARTICLE{Imai08towarda,

author = {Kosuke Imai and Gary King and Olivia Lau},

title = {Toward a Common Framework for Statistical Analysis and Development},

journal = {Journal of Computational and Graphical Statistics},

year = {2008},

pages = {1--22}

}

### Abstract

We develop a general ontology of statistical methods and use it to propose a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. This framework offers a simple unified structure and syntax that can encompass a large fraction of existing statistical procedures. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, without requiring changes in existing approaches, and regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.

### Citations

4897 |
Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...le imputation for missing data (Rubin 1987; King, Honaker, Joseph, and Scheve 2001; Honaker and King 2007), or outlier removal and feature detection to improve data quality or statistical robustness (Bishop 1995, chap. 8)—and then three commands are always performed: First, some statistical method, such as a likelihood or Bayesian model, is specified and fit. Second, we identif... |

912 |
R: a language for data analysis and graphics
- Ihaka, Gentleman
- 1996
Citation Context ...lt than it should be for pursuits that have so much underlying structure in common. Among the efforts to reduce the costs of spanning these diverse subfields, the R Project for Statistical Computing (Ihaka and Gentleman, 1996; R Development Core Team, 2007), and the S language on which it was based (Becker, Chambers and Wilks, 1988), stand as monumental developments. These projects solved so many problems for developers t... |

686 |
Multiple Imputation for Non-response in Surveys
- Rubin
- 1987
Citation Context ... basic idea is that raw data goes in — perhaps after being preprocessed via matching methods for causal inference (Rubin, 1973; Ho, Imai, King and Stuart, 2007), multiple imputation for missing data (Rubin, 1987; King et al., 2001; Honaker and King, 2007), or outlier removal and feature detection to improve data quality or statistical robustness (Bishop, 1 This paper summarizes only the structural aspects of... |

374 | R: A language and environment for statistical computing - R Development Core Team - 2009 |

245 |
Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review
- Imbens
- 2004
Citation Context ...en 0, while holding the other explanatory variables constant (whether at their means, modes, or medians). Alternatively, compute the in-sample average treatment effect (Imbens 2004) by setting the explanatory variables at their observed values, and imputing only the unobserved counterfactual for each individual. Finally, use the model output from zelig() and the values for the ... |

181 | Making the Most of Statistical Analyses: Improving Interpretation and Presentation - King, Tomz, et al. - 2000 |

153 |
Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. American Political Science Review 95(1):49–69
- King, Honaker, et al.
- 2001
Citation Context ...s that raw data goes in — perhaps after being preprocessed via matching methods for causal inference (Rubin, 1973; Ho, Imai, King and Stuart, 2007), multiple imputation for missing data (Rubin, 1987; King et al., 2001; Honaker and King, 2007), or outlier removal and feature detection to improve data quality or statistical robustness (Bishop, 1 This paper summarizes only the structural aspects of Zelig, rather than... |

100 | Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Forthcoming in Political Analysis. Manuscript available at http://gking.harvard.edu/matchp.pdf - Ho, Imai, et al. - 2007 |

83 | The New S Language - Becker, Chambers, et al. - 1988 |

58 |
Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor
- King
- 1989
Citation Context ...i and fixed parameters, $\beta$, such that $\mu_i = g(X_i, \beta)$. The model is completed with some independence assumption, most typically that $Y_i$ and $Y_j$ are independent conditional on $\mu_i$ and $\theta$ for all $i \neq j$ (see King, 1989). For example, the simple linear-normal regression model has a scalar dependent variable $Y_i$ distributed normally with mean $\mu_i$ and variance $\sigma^2$, and with the mean varying as a linear function of a ve... |

43 | Enhancing the validity and cross-cultural comparability of measurement in survey research - King, Murray, et al. - 2004 |

29 |
The Dangers of Extreme Counterfactuals. Political Analysis 14(2): 131–59. http://gking.harvard.edu/files/abs/counterft-abs.shtml
- King, Zeng
- 2006
Citation Context ... of interest farther from the data are more model dependent, we can use the output of setx() to evaluate, via the R package whatif, how far the counterfactual question of interest is from the data (see King and Zeng 2006, King and Zeng 2007, Stoll, King and Zeng 2005). One may also use diagnostic tools such as cross-validation to validate the fitted model for any model supported by the Zelig framework. Finally, estim... |

25 | Making the Most of Statistical Analyses - King, Tomz, et al. - 2000 |

22 |
Enhancing the Validity and
- King, Murray, et al.
- 2004
Citation Context ...es due to threshold shifts resulting from differential item functioning (i.e., survey respondents having different standards for what constitutes different levels of the dependent variable; see King et al., 2004). The main portion of the model is a multivariate ordered probit, for independent normal latent variables $Y^*_{is}$ for observation $i$ ($i = 1, \dots, n$) and self-assessment variable $s$ ($s = 1, \dots, S$)... |

14 |
What to do about missing values in time series cross-section data. http://gking.harvard.edu/files/abs/pr-abs.shtml (accessed Sept 6
- Honaker, King
- 2008
Citation Context ...s in — perhaps after being preprocessed via matching methods for causal inference (Rubin, 1973; Ho, Imai, King and Stuart, 2007), multiple imputation for missing data (Rubin, 1987; King et al., 2001; Honaker and King, 2007), or outlier removal and feature detection to improve data quality or statistical robustness (Bishop, 1 This paper summarizes only the structural aspects of Zelig, rather than all of its options. See... |

11 | An introduction to the dataverse network as an infrastructure for data sharing
- King
- 2007
Citation Context ...ps after being preprocessed via matching methods for causal inference (Rubin, 1973; Ho, Imai, King and Stuart, 2007), multiple imputation for missing data (Rubin, 1987; King et al., 2001; Honaker and King, 2007), or outlier removal and feature detection to improve data quality or statistical robustness (Bishop, 1 This paper summarizes only the structural aspects of Zelig, rather than all of its options. See... |

11 |
When Can History Be Our Guide? The
- King, Zeng
- 2006
Citation Context ... from the data are more model dependent, we can use the output of setx() to evaluate, via the R package whatif, how far the counterfactual question of interest is from the data (see King and Zeng 2006, King and Zeng 2007, Stoll, King and Zeng 2005). One may also use diagnostic tools such as cross-validation to validate the fitted model for any model supported by the Zelig framework. Finally, estimates of the quantiti... |

10 | Amelia II: A program for missing data - Honaker, King, et al. - 2009 |

9 | Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation - King, Honaker, et al. - 2001 |

6 | MatchIt: Matching Software for Causal Inference. Version 0.8. Used with permission - Ho, Imai, et al. - 2004 |

2 | Zelig: everyone's statistical software. Available at: http://GKing.harvard.edu/zelig - Imai, King, et al. - 2007 |

2 | The Dangers of Extreme Counterfactuals - King, Zeng - 2006 |

1 | Amelia II: A Program for Missing Data. Available online at http://gking.harvard.edu/amelia - Honaker, King, et al. - 2006 |