## Matching Methods for Causal Inference: A Review and a Look Forward


### BibTeX

    @MISC{Stuart_matchingmethods,
      author = {Elizabeth A. Stuart},
      title = {Matching Methods for Causal Inference: A Review and a Look Forward},
      year = {}
    }


### Abstract

When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how best to choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods, or in developing methods related to matching, do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.

Key words and phrases: Observational study, propensity scores, subclassification, weighting.

### Citations

Statistical Power Analysis for the Behavioral Sciences. Erlbaum Ass - Cohen - 1988

The central role of the propensity score in observational studies for causal effects - Rosenbaum, Rubin - 1983

Citation Context: ...assignment mechanism, which determines which individuals receive treatment and which receive control. A key assumption in non-experimental studies is that of strongly ignorable treatment assignment (Rosenbaum and Rubin, 1983b), which implies that (1) treatment assignment ($T$) is independent of the potential outcomes ($Y(0), Y(1)$) given the covariates ($X$): $T \perp (Y(0), Y(1)) \mid X$, and (2) there is a positive probability of re...

Multiple Imputation for Nonresponse in Surveys - Rubin - 1987

Citation Context: ...mpute the missing covariates, run the matching and effect estimation separately within each “complete” data set, and then use the multiple imputation combining rules to obtain final effect estimates (Rubin, 1987; Song et al., 2001). Qu and Lipkovich (2009) illustrate this method and show good results for an adaptation that also includes indicators of missing data patterns in the propensity score model. In ad...
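The combining step mentioned in this excerpt can be illustrated with Rubin's (1987) rules. This is a minimal sketch, assuming each completed data set has already produced an effect estimate and its squared standard error; `combine_mi` is a hypothetical helper, not part of any package.

```python
from statistics import mean, variance


def combine_mi(estimates, variances):
    """Hypothetical helper: pool m per-imputation results with Rubin's rules.

    estimates: treatment effect estimate from each completed data set
    variances: squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance of the pooled estimate
    return q_bar, t


est, var = combine_mi([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(est, var)
```

The (1 + 1/m) factor inflates the between-imputation component to reflect the finite number of imputations.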

Estimating causal effects of treatments in randomized and nonrandomized studies - Rubin - 1974

Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme - Heckman, Ichimura, et al. - 1997

Citation Context: ...ividuals, regardless of the overlap of the distributions. When using those methods it may be beneficial to explicitly restrict the analysis to those individuals in the region of common support (as in Heckman et al., 1997; Dehejia and Wahba, 1999). Most analyses define common support using the propensity score, discarding individuals with propensity score values outside the range of the other group. A second method in...

Characterizing Selection Bias Using Experimental Data - Heckman, Ichimura, et al. - 1998

Citation Context: ...umption of ignorable treatment assignment, it is important to include in the matching procedure all variables known to be related to both treatment assignment and the outcome (Rubin and Thomas, 1996; Heckman et al., 1998b; Glazerman et al., 2003; Hill et al., 2004). Generally poor performance is found of methods that use a relatively small set of “predictors of convenience,” such as demographics only (Shadish et al.,...

Statistics and causal inference - Holland - 1986

Causal Effects in Non-experimental Studies: Reevaluating the Evaluation of Training Programmes - Dehejia, Wahba - 1999

Citation Context: ...ation. Selection models and regression models have been shown to perform poorly in situations where there is insufficient overlap, but their standard diagnostics do not involve checking this overlap (Dehejia and Wahba, 1999, 2002; Glazerman et al., 2003). Matching methods in part serve to make researchers aware of the quality of resulting inferences. Third, matching methods have straightforward diagnostics by which thei...

Propensity Score Matching Methods for Non-Experimental Causal Studies. Review of Economics and Statistics - Dehejia, Wahba - 2002

Nonparametric Estimation of Average Treatment Effects under Exogeneity: A... - Imbens - 2004

Citation Context: ...esearch and resources have been scattered across disciplines such as statistics (Rosenbaum, 2002; Rubin, 2006), epidemiology (Brookhart et al., 2006), sociology (Morgan and Harding, 2006), economics (Imbens, 2004), and political science (Ho et al., 2007). This paper coalesces the diverse literature on matching methods, bringing together the original work on matching methods, of which many current researchers a...

Statistical Methods - Snedecor, Cochran - 1967

Citation Context: ...er increases when the groups are more similar because of the reduced extrapolation and higher precision that is obtained when comparing groups that are similar versus groups that are quite different (Snedecor and Cochran, 1980). This is also what yields the increased power of using matched pairs in randomized experiments (Wacholder and Weinberg, 1982). Smith (1997) provides an illustration where estimates from 1:1 matching...

A generalization of sampling without replacement from a finite universe - Horvitz, Thompson - 1952

Citation Context: ...ity score for individual k. This weighting serves to weight both the treated and control groups up to the full sample, in the same way that survey sampling weights weight a sample up to a population (Horvitz and Thompson, 1952). An alternative weighting technique, weighting by the odds, can be used to estimate the ATT (Hirano et al., 2003). Formally, $w_i = T_i + (1 - T_i)\frac{\hat{e}_i}{1 - \hat{e}_i}$. With this weight, treated individuals receive a w...
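The excerpt above is cut off before the formula for the full-sample weights. Assuming the usual inverse-probability-of-treatment weights, w_i = T_i / ê_i + (1 - T_i) / (1 - ê_i), a minimal sketch of the resulting weighted difference in means (an ATE estimate) might look like this. `iptw_ate` is a hypothetical name, and normalizing the weights within each group is an assumed design choice.

```python
def iptw_ate(y, treated, ps):
    """Hypothetical helper: inverse-probability-weighted difference in means (ATE).

    Treated units get weight 1/e and controls 1/(1 - e), so each group is
    weighted up to the full sample. Each weighted mean is normalized by the
    sum of the weights in that group.
    """
    num_t = den_t = num_c = den_c = 0.0
    for yi, ti, ei in zip(y, treated, ps):
        if not 0 < ei < 1:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if ti == 1:
            w = 1 / ei
            num_t += w * yi
            den_t += w
        else:
            w = 1 / (1 - ei)
            num_c += w * yi
            den_c += w
    return num_t / den_t - num_c / den_c


print(iptw_ate([4, 2, 3, 1], [1, 1, 0, 0], [0.5, 0.25, 0.5, 0.75]))  # ≈ 1.0, up to rounding
```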

Constructing a Control Group Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score - Rosenbaum, Rubin - 1985

Citation Context: ...gh dimensional. Requiring exact matches often leads to many individuals not being matched, which can result in larger bias than if the matches are inexact but more individuals remain in the analysis (Rosenbaum and Rubin, 1985b). A recent advance, coarsened exact matching (CEM), can be used to do exact matching on broader ranges of the variables; for example, using income categories rather than a continuous measure (Iac...
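The idea behind coarsened exact matching can be sketched as: coarsen each continuous covariate into bins, form strata from the coarsened values, and keep only strata that contain both treated and control units. This is an illustration of the concept under user-chosen cut points, not the implementation by Iacus et al.; `coarsen` and `cem_strata` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict


def coarsen(value, cutpoints):
    """Hypothetical helper: index of the bin a value falls in, given sorted cut points."""
    return bisect_right(cutpoints, value)


def cem_strata(rows, treated, cutpoints):
    """Hypothetical helper: exact matching on coarsened covariates.

    rows: list of covariate tuples, one per unit
    cutpoints: list of sorted cut-point lists, one per covariate
    Returns {coarsened signature: [unit indices]} for strata holding both groups.
    """
    strata = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(coarsen(v, cuts) for v, cuts in zip(row, cutpoints))
        strata[key].append(i)
    # Discard strata that lack either treated or control units.
    return {k: idx for k, idx in strata.items()
            if {treated[i] for i in idx} == {0, 1}}


# (income, age) per unit; income coarsened into three categories, age into two.
rows = [(25000, 30), (27000, 34), (80000, 50), (31000, 33), (90000, 52), (50000, 41)]
treated = [1, 0, 1, 0, 0, 1]
cuts = [[30000, 60000], [40]]
print(cem_strata(rows, treated, cuts))  # → {(0, 0): [0, 1], (2, 1): [2, 4]}
```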

Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71 - Hirano, Imbens, et al. - 2003

Citation Context: ...the same way that survey sampling weights weight a sample up to a population (Horvitz and Thompson, 1952). An alternative weighting technique, weighting by the odds, can be used to estimate the ATT (Hirano et al., 2003). Formally, $w_i = T_i + (1 - T_i)\frac{\hat{e}_i}{1 - \hat{e}_i}$. With this weight, treated individuals receive a weight of 1. Control individuals are weighted up to the full sample using the $\frac{1}{1 - \hat{e}_i}$ ... this way both groups a...
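Weighting by the odds, as formalized in this excerpt, can be sketched as follows. The helper names `att_odds_weights` and `weighted_diff_in_means` are hypothetical, and normalizing each group's weights inside the weighted means is an assumed choice.

```python
def att_odds_weights(treated, ps):
    """Hypothetical helper: w_i = T_i + (1 - T_i) * e_i / (1 - e_i).

    Treated units receive weight 1; each control receives the odds e / (1 - e),
    which reweights the controls to resemble the treated group (ATT).
    """
    return [1.0 if t == 1 else e / (1 - e) for t, e in zip(treated, ps)]


def weighted_diff_in_means(y, treated, w):
    """Hypothetical helper: weighted treated mean minus weighted control mean."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # group -> [weighted total, total weight]
    for yi, ti, wi in zip(y, treated, w):
        sums[ti][0] += wi * yi
        sums[ti][1] += wi
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


treated = [1, 1, 0, 0]
w = att_odds_weights(treated, [0.6, 0.6, 0.5, 0.75])
print(w)                                              # → [1.0, 1.0, 1.0, 3.0]
print(weighted_diff_in_means([5, 3, 2, 4], treated, w))  # → 0.5
```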

Observational Studies - Rosenbaum - 2002

Reducing bias in observational studies using subclassification on the propensity score - Rosenbaum, Rubin - 1984

Citation Context: ...is found on particular variables or functions of variables, those terms can be included in a re-estimated propensity score model, which should improve their balance in the subsequent matched samples (Rosenbaum and Rubin, 1984; Dehejia and Wahba, 2002). Research indicates that mis-estimation of the propensity score (e.g., excluding a squared term that is in the true model) is not a large problem, and that treatment effect ...

Matching as an econometric evaluation estimator - Heckman, Ichimura, et al. - 1998

The role of the propensity score in estimating dose–response functions - Imbens - 2000

Citation Context: ...ighting technique, used primarily in economics, is kernel weighting, which averages over multiple individuals in the control group for each treated individual, with weights defined by their distance (Imbens, 2000). Heckman et al. (1997, 1998a,b) describe a local linear matching estimator that requires specifying a bandwidth parameter. Generally, larger bandwidths increase bias but reduce variance by putting w...
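Kernel weighting as described here can be sketched with a Gaussian kernel on the propensity score distance and a user-chosen bandwidth. The kernel choice and the name `kernel_att` are illustrative assumptions; the local linear refinement of Heckman et al. is not shown.

```python
import math


def kernel_att(y, treated, ps, bandwidth):
    """Hypothetical helper: ATT by kernel weighting on the propensity score.

    Each treated unit's counterfactual is a weighted average of all control
    outcomes; the (assumed) Gaussian kernel weights shrink as the propensity
    score distance grows.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    controls = [(e, yi) for yi, t, e in zip(y, treated, ps) if t == 0]
    effects = []
    for yi, t, e in zip(y, treated, ps):
        if t != 1:
            continue
        weights = [math.exp(-0.5 * ((ec - e) / bandwidth) ** 2) for ec, _ in controls]
        total = sum(weights)
        if total == 0:
            raise ValueError("bandwidth too small: every kernel weight underflowed to 0")
        counterfactual = sum(w * yc for w, (_, yc) in zip(weights, controls)) / total
        effects.append(yi - counterfactual)
    return sum(effects) / len(effects)


y, treated, ps = [10, 1, 5], [1, 0, 0], [0.5, 0.5, 0.9]
print(kernel_att(y, treated, ps, bandwidth=0.01))  # weight concentrates on the closest control
print(kernel_att(y, treated, ps, bandwidth=100))   # near an unweighted control mean
```

The two calls mirror the trade-off in the excerpt: a wide bandwidth averages over distant controls (more bias, less variance), while a narrow one relies on the closest controls only.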

Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica 74(1) - Abadie, Imbens - 2006

Citation Context: ...two methods have been shown to work best in combination (Rubin, 1973b; Carpenter, 1977; Rubin, 1979; Robins and Rotnitzky, 1995; Heckman et al., 1997; Rubin and Thomas, 2000; Glazerman et al., 2003; Abadie and Imbens, 2006). This ...

[Figure 2: Plot of standardized difference of means of 10 covariates before and after ma... (y-axis: Absolute Standardized Diff in Means; groups: All Data, Matched Data)]
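The balance measure plotted in Figure 2 can be sketched per covariate as follows. This assumes the treated-group standard deviation as the denominator, a common choice when the ATT is of interest (other analyses pool the two groups' variances); `std_diff_in_means` and the matched index list are hypothetical.

```python
from statistics import mean, stdev


def std_diff_in_means(x, treated):
    """Hypothetical helper: standardized difference in means for one covariate.

    Assumption: divides by the treated-group standard deviation so that values
    before and after matching are on a comparable scale.
    """
    x_t = [xi for xi, t in zip(x, treated) if t == 1]
    x_c = [xi for xi, t in zip(x, treated) if t == 0]
    return (mean(x_t) - mean(x_c)) / stdev(x_t)


# Full sample versus a hypothetical matched subset given as an index list.
x = [3, 5, 7, 2, 4, 6]
treated = [1, 1, 1, 0, 0, 0]
matched = [0, 1, 2, 4, 5]
before = abs(std_diff_in_means(x, treated))
after = abs(std_diff_in_means([x[i] for i in matched], [treated[i] for i in matched]))
print(before, after)  # → 0.5 0.0
```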

Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis - Ho, Imai, et al. - 2007

Citation Context: ...red across disciplines such as statistics (Rosenbaum, 2002; Rubin, 2006), epidemiology (Brookhart et al., 2006), sociology (Morgan and Harding, 2006), economics (Imbens, 2004), and political science (Ho et al., 2007). This paper coalesces the diverse literature on matching methods, bringing together the original work on matching methods, of which many current researchers are not aware, and tying together ideas acr...

Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcomes - Rosenbaum, Rubin - 1983

Semiparametric Efficiency in Multivariate Regression Models with Missing Data - Robins, Rotnitzky - 1995

Citation Context: ...are not designed to “compete” with modeling adjustments such as linear regression, and in fact the two methods have been shown to work best in combination (Rubin, 1973b; Carpenter, 1977; Rubin, 1979; Robins and Rotnitzky, 1995; Heckman et al., 1997; Rubin and Thomas, 2000; Glazerman et al., 2003; Abadie and Imbens, 2006). This ...

[Figure: Absolute Standardized Diff in Means, All Data vs. Matched Data]

The Consequences of Adjustment for a Concomitant Variable That Has Been Affected by the Treatment - Rosenbaum - 1984

Citation Context: ...vious research and scientific understanding (Rubin, 2001). One type of variable that should not be included in the matching process are those that may have been affected by the treatment of interest (Rosenbaum, 1984; Frangakis and Rubin, 2002; Greenland, 2003). This is especially important when the covariates, treatment indicator, and outcomes are all collected at the same point in time. If it is deemed to be cr...

Matching using estimated propensity scores: relating theory to practice - Rubin, Thomas - 1996

Using multivariate matched sampling and regression adjustment to control bias in observational studies - Rubin - 1979

Citation Context: ...es; for example, using income categories rather than a continuous measure (Iacus et al., 2009). The Mahalanobis distance can work quite well when there are relatively few covariates (fewer than 8; Rubin, 1979; Zhao, 2004), but it does not perform as well when the covariates are not normally distributed or there are many covariates (Gu and Rosenbaum, 1993). This is likely because Mahalanobis metric matchin...

Principal stratification in causal inference - Frangakis, Rubin - 2002

Citation Context: ...d scientific understanding (Rubin, 2001). One type of variable that should not be included in the matching process are those that may have been affected by the treatment of interest (Rosenbaum, 1984; Frangakis and Rubin, 2002; Greenland, 2003). This is especially important when the covariates, treatment indicator, and outcomes are all collected at the same point in time. If it is deemed to be critical to control for a var...

Comparison of multivariate matching methods: Structure, distances, and algorithms - Gu, Rosenbaum - 1993

Citation Context: ...e well when there are relatively few covariates (fewer than 8; Rubin, 1979; Zhao, 2004), but it does not perform as well when the covariates are not normally distributed or there are many covariates (Gu and Rosenbaum, 1993). This is likely because Mahalanobis metric matching essentially regards all interactions among the elements of X as equally important; with more covariates Mahalanobis matching thus tries to match m...

Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models - Scharfstein, Rotnitzky, et al. - 1999

Citation Context: ...stimation procedure than to the true underlying probabilities. Weight trimming, which sets weights above some maximum to that maximum, has been proposed as one solution to this problem (Potter, 1993; Scharfstein et al., 1999). However, there is relatively little guidance regarding the trimming level. Because of this sensitivity to the size of the weights and potential model misspecification, more attention should be paid...
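Weight trimming as defined in this excerpt amounts to capping each weight at a chosen maximum. A minimal sketch follows; the cutoff of 10 and the propensity scores in the example are arbitrary illustrations, not recommendations, and `trim_weights` is a hypothetical name.

```python
def trim_weights(weights, max_weight):
    """Hypothetical helper: set any weight above max_weight to max_weight.

    The cutoff is left to the analyst; there is little guidance on choosing it.
    """
    if max_weight <= 0:
        raise ValueError("max_weight must be positive")
    return [min(w, max_weight) for w in weights]


# Inverse-probability weights 1/e for treated units with illustrative scores.
ps = [0.5, 0.25, 0.02]
weights = [1 / e for e in ps]        # [2.0, 4.0, 50.0]
print(trim_weights(weights, 10.0))   # → [2.0, 4.0, 10.0]
```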

The effectiveness of adjustment by subclassification in removing bias in observational studies - Cochran - 1968

Marginal structural models and causal inference in epidemiology - Robins, Hernán, et al. - 2000

Citation Context: ...ongitudinal settings where individuals may go in and out of the treatment group, or where the treatment assignment date is undefined for the control group. Methods such as marginal structural models (Robins et al., 2000) or balanced risk set matching (Li et al., 2001) are useful in those settings. [Section 1.2: Notation and Background: Estimating Causal Effects] As first formalized in Rubin (1974), the estimation of causal eff...

Stratification and weighting via the propensity score in estimation of causal treatment effects. Statistics in Medicine - Lunceford, Davidian - 2004

Citation Context: ...t into the propensity score. Based on those results, the current convention is to use 5-10 subclasses. However, with larger sample sizes more subclasses (e.g., 10-20) may be feasible and appropriate (Lunceford and Davidian, 2004). More work needs to be done to help determine the optimal number of subclasses: enough to get adequate bias reduction but not so many that the within-subclass effect estimates become unstable. 3...
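Subclassification on the propensity score can be sketched as follows, assuming subclasses of (nearly) equal size formed from the sorted scores as an approximation to quantile cut points, and subclass weights proportional to subclass size (an ATE-style weighting; weighting by treated counts would target the ATT). `subclass_estimate` is a hypothetical name.

```python
def subclass_estimate(y, treated, ps, k=5):
    """Hypothetical helper: subclassification estimator on the propensity score.

    Units are sorted by score and cut into k subclasses of (nearly) equal size.
    Within each subclass the treated-minus-control difference in mean outcomes
    is computed; differences are averaged with weights proportional to size.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: ps[i])
    weighted_sum = 0.0
    for s in range(k):
        members = order[s * n // k:(s + 1) * n // k]
        y_t = [y[i] for i in members if treated[i] == 1]
        y_c = [y[i] for i in members if treated[i] == 0]
        if not y_t or not y_c:
            # Too many subclasses for the data: some lack one of the groups.
            raise ValueError(f"subclass {s} lacks treated or control units")
        diff = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
        weighted_sum += diff * len(members)
    return weighted_sum / n


ps = [0.1, 0.15, 0.3, 0.35, 0.5, 0.55, 0.7, 0.75, 0.9, 0.95]
treated = [1, 0] * 5
y = [3, 1, 5, 3, 7, 5, 9, 7, 11, 9]
print(subclass_estimate(y, treated, ps, k=5))  # → 2.0
```

The error branch reflects the tension noted in the excerpt: with too many subclasses, some contain too few units for a stable (or even defined) within-subclass estimate.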

Using propensity scores to help design observational studies: application to the tobacco litigation - Rubin - 2001

Citation Context: ...liberal in terms of including variables that may be associated with treatment assignment and/or the outcomes. Some examples of matching have 50 or even 100 covariates included in the procedure (e.g., Rubin, 2001). However, in small samples it may not be possible to include a very large set of variables. In that case priority should be given to variables believed to be related to the outcome, as there is a ...

Confounding and collapsibility in causal inference - Greenland, Pearl, et al. - 1999

Combining propensity score matching with additional adjustments for prognostic covariates - Rubin, Thomas - 2000

Citation Context: ...the matched treated and control groups) may be a good alternative (Rosenbaum et al., 2007). If the key covariates of interest are continuous, Mahalanobis matching within propensity score calipers (Rubin and Thomas, 2000) defines the distance between individuals i and j as:

$$
D_{ij} =
\begin{cases}
(Z_i - Z_j)' \Sigma^{-1} (Z_i - Z_j) & \text{if } |\operatorname{logit}(e_i) - \operatorname{logit}(e_j)| \le c \\
\infty & \text{if } |\operatorname{logit}(e_i) - \operatorname{logit}(e_j)| > c
\end{cases}
$$

where c is the caliper, Z is the set of...
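The distance D_ij defined in this excerpt can be sketched directly. The excerpt is cut off before Σ is fully defined, so this sketch assumes the caller supplies the inverse covariance matrix of Z; `logit` and `caliper_mahalanobis` are hypothetical helper names.

```python
import math


def logit(p):
    """Log-odds; p must lie strictly between 0 and 1."""
    return math.log(p / (1 - p))


def caliper_mahalanobis(z_i, z_j, e_i, e_j, sigma_inv, caliper):
    """Hypothetical helper: Mahalanobis distance within a propensity score caliper.

    z_i, z_j: key continuous covariates of units i and j
    e_i, e_j: their estimated propensity scores
    sigma_inv: inverse covariance matrix of Z (assumed supplied by the caller)
    caliper: c, the maximum allowed |logit(e_i) - logit(e_j)|
    """
    if abs(logit(e_i) - logit(e_j)) > caliper:
        return math.inf  # outside the caliper: the pair can never be matched
    d = [a - b for a, b in zip(z_i, z_j)]
    # Quadratic form d' Sigma^{-1} d
    return sum(d[r] * sigma_inv[r][c] * d[c]
               for r in range(len(d)) for c in range(len(d)))


identity = [[1.0, 0.0], [0.0, 1.0]]
print(caliper_mahalanobis((1, 2), (4, 6), 0.5, 0.5, identity, 0.2))  # → 25.0
print(caliper_mahalanobis((1, 2), (4, 6), 0.5, 0.9, identity, 0.2))  # → inf
```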

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score - Imai, van Dyk - 2004

Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies - McCaffrey, Ridgeway, et al. - 2004

Citation Context: ...ed. The most common for propensity score estimation is logistic regression, although non-parametric methods such as boosted CART and generalized boosted models (gbm) often show very good performance (McCaffrey et al., 2004; Setoguchi et al., 2008; Lee et al., 2009). The model diagnostics when estimating propensity scores are not the standard model diagnostics for logistic regression or CART. With propensity score es...

Model-based direct adjustment - Rosenbaum - 1987

Citation Context: ...able that is known to be unrelated to the treatment, such as a pre-treatment measure of the outcome variable (as in Imbens, 2004), or the difference in outcomes between multiple control groups (as in Rosenbaum, 1987b). If the test indicates that the effect is not equal to zero, then the assumption of ignorable treatment assignment is deemed to be less plausible. A second approach is to perform analyses of sensit...

Misunderstandings among experimentalists and observationalists: Balance test fallacies in causal inference. http://gking.harvard.edu/files/abs/matchse-abs.shtml (accessed September 1... - Imai, King, et al. - 2006

Citation Context: ...ticularly effective in terms of reducing bias. Below we use “propensity score” to refer to either the propensity score itself or the linear version. Although exact matching is in many ways the ideal (Imai et al., 2008), the primary difficulty with the exact and Mahalanobis distance measures is that neither works very well when X is high dimensional. Requiring exact matches often leads to many individuals not being...

Some practical issues in the evaluation of heterogeneous labour market programmes by matching methods - Lechner - 2002

Citation Context: ...e systematic differences, a situation where overfitting is good. When researchers want to account for the uncertainty in the matching, a bootstrap procedure has been found to outperform other methods (Lechner, 2002; Hill and Reiter, 2006). There are also some empirical formulas for variance estimation for particular matching scenarios (e.g., Abadie and Imbens, 2006, 2009b; Schafer and Kang, 2008), but this is a...
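A nonparametric bootstrap standard error can be sketched as below, assuming the effect estimate is a plain difference in means and that resampling is done within the treated and control groups. Accounting for uncertainty in the matching itself would also require re-running the matching in each replicate, which this sketch does not do; `bootstrap_se` is a hypothetical name.

```python
import random
from statistics import mean, stdev


def bootstrap_se(y_t, y_c, reps=1000, seed=0):
    """Hypothetical helper: bootstrap SE of a treated-minus-control difference in means.

    Resamples outcomes with replacement within each group and recomputes the
    estimate; it does not re-run any matching step.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    estimates = []
    for _ in range(reps):
        boot_t = rng.choices(y_t, k=len(y_t))
        boot_c = rng.choices(y_c, k=len(y_c))
        estimates.append(mean(boot_t) - mean(boot_c))
    return stdev(estimates)


print(bootstrap_se([7.1, 6.4, 8.0, 5.9, 7.7], [5.2, 6.1, 4.8, 5.5, 6.0], reps=500))
```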

The use of matched sampling and regression adjustment to remove bias in observational studies - Rubin - 1973

Citation Context: ...original work in matching methods, particularly the theoretical developments, which compared the benefits of selecting matched versus random samples of the control group (Althauser and Rubin, 1970; Rubin, 1973a,b). The second setting is one in which all of the outcome data is already available, and the goal of the matching is to reduce bias in the estimation of the treatment effect. A common feature of mat...

Doubly robust estimation in missing data and causal inference models - Bang, Robins - 2005

Citation Context: ...be paid to the accuracy of propensity score estimates when the propensity scores will be used for weighting vs. matching (Kang and Schafer, 2007). Another effective strategy is the use of doubly robust methods (Bang and Robins, 2005), which yield accurate effect estimates if either the propensity score model or the outcome model is correctly specified, as discussed further in Section 5. [Section 3.3: Assessing Common Support] One issue th...
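The excerpt does not give a formula for the doubly robust estimator. The augmented inverse-probability-weighted (AIPW) form below is one widely used doubly robust estimator, shown here as an assumption rather than as the specific estimator of Bang and Robins (2005); `aipw_ate` is a hypothetical name, and the outcome-model predictions are taken as given inputs.

```python
def aipw_ate(y, treated, ps, mu1, mu0):
    """Hypothetical helper: augmented IPW (doubly robust) estimate of the ATE.

    y: observed outcomes; treated: 0/1 indicators; ps: estimated propensity
    scores; mu1, mu0: outcome-model predictions of Y(1) and Y(0) for each unit.
    The weighted residual terms correct the outcome-model contrast m1 - m0.
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, treated, ps, mu1, mu0):
        total += (m1 - m0
                  + ti * (yi - m1) / ei
                  - (1 - ti) * (yi - m0) / (1 - ei))
    return total / len(y)


y, treated = [5, 3, 2, 4], [1, 1, 0, 0]
# Outcome model with zero residuals: the estimate is mean(mu1 - mu0) whatever the scores.
print(aipw_ate(y, treated, [0.3, 0.7, 0.4, 0.6], [5, 3, 4, 6], [3, 1, 2, 4]))
```

When the outcome predictions are all zero, the estimator reduces to an unnormalized inverse-probability-weighted contrast, which is why a correct propensity model alone also suffices.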

Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data - Kang, Schafer - 2007

Citation Context: ...ze of the weights and potential model misspecification, more attention should be paid to the accuracy of propensity score estimates when the propensity scores will be used for weighting vs. matching (Kang and Schafer, 2007). Another effective strategy is the use of doubly robust methods (Bang and Robins, 2005), which yield accurate effect estimates if either the propensity score model or the outcome model is correctly specified,...

Quantifying biases in causal models: Classical confounding versus collider-stratification bias. Epidemiology - Greenland - 2003

Matching with Doses in an Observational Study of a Media Campaign against Drug Abuse - Lu, Zanutto, et al. - 2001

Citation Context: ...osenbaum (2002) summarizes two methods for dealing with doses of treatment. In the first method, the propensity score is still a scalar function of the covariates (e.g., Joffe and Rosenbaum, 1999; Lu et al., 2001). In the second method, each of the levels of treatment has its own propensity score (e.g., Rosenbaum, 1987a; Imbens, 2000) and each propensity score is used one at a time to estimate the distributio...

Smoking and lung cancer: Recent evidence and a discussion of some questions - Cornfield, Haenszel, et al. - 1959

Observational Studies, 2nd edition - Rosenbaum - 2002

Citation Context: ...available, nor a summary of advice for applied researchers interested in implementing these methods. In contrast, the research and resources have been scattered across disciplines such as statistics (Rosenbaum, 2002; Rubin, 2006), epidemiology (Brookhart et al., 2006), sociology (Morgan and Harding, 2006), economics (Imbens, 2004), and political science (Ho et al., 2007). This paper coalesces the diverse literat...

Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies - Diamond, Sekhon - 2005

Effects of Misspecification of the Propensity Score on Estimators of Treatment Effect - Drake - 1993

Citation Context: ...rm that is in the true model) is not a large problem, and that treatment effect estimates are more biased when the outcome model is misspecified than when the propensity score model is misspecified (Drake, 1993; Dehejia and Wahba, 1999, 2002; Zhao, 2004). This may in part be because the propensity score is used only as a tool to get covariate balance; the accuracy of the model is less important as long as ba...

Matching with Multiple Controls to Estimate Treatment Effects - Smith - 1997

Citation Context: ...ng the number of matches: Ratio Matching. When there are large numbers of control individuals, it is sometimes possible to get multiple good matches for each treated individual, called ratio matching (Smith, 1997; Rubin and Thomas, 2000). Selecting the number of matches involves a bias-variance trade-off. Selecting multiple controls for each treated individual will generally increase bias since the 2nd, 3rd, ...
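Ratio matching can be sketched as k:1 nearest-neighbor matching on the propensity score. This assumes matching with replacement (a control may serve several treated units) and ties broken by control order; the ATT is the mean of each treated outcome minus the average of its k matched controls. `ratio_match_att` is a hypothetical name.

```python
def ratio_match_att(y, treated, ps, k=2):
    """Hypothetical helper: ATT from k:1 nearest-neighbor matching on the propensity score.

    Matching is with replacement: each treated unit takes the k controls with
    the closest scores, even if they were already used for another treated unit.
    """
    controls = [i for i, t in enumerate(treated) if t == 0]
    if len(controls) < k:
        raise ValueError("fewer controls than the requested ratio k")
    effects = []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        # sorted() is stable, so ties keep the original control order.
        nearest = sorted(controls, key=lambda j: abs(ps[j] - ps[i]))[:k]
        matched_mean = sum(y[j] for j in nearest) / k
        effects.append(y[i] - matched_mean)
    return sum(effects) / len(effects)


y = [10, 8, 5, 6, 1, 2]
treated = [1, 1, 0, 0, 0, 0]
ps = [0.8, 0.3, 0.75, 0.7, 0.3, 0.35]
print(ratio_match_att(y, treated, ps, k=1))  # → 6.0
print(ratio_match_att(y, treated, ps, k=2))  # → 5.5
```

Raising k averages over more controls, lowering variance, but the 2nd, 3rd, ... matches are further away in propensity score, which is the source of the added bias described in the excerpt.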

Full Matching in an Observational Study of Coaching for the SAT - Hansen - 2004a

Citation Context: ...e within-subclass effect estimates become unstable. [Section 3.2.2: Full Matching] A more sophisticated form of subclassification, full matching, selects the number of subclasses automatically (Rosenbaum, 1991; Hansen, 2004; Stuart and Green, 2008). Full matching creates a series of matched sets, where each matched set contains at least one treated individual and at least one control individual (and each matched set may...