
## Bringing Context and Variability Back in to Causal Analysis (2012)

Citations: 3 (0 self-citations)

### BibTeX

@MISC{Morgan12bringingcontext,

author = {Stephen L. Morgan and others},

title = {Bringing Context and Variability Back in to Causal Analysis},

year = {2012}

}


### Abstract

The methodology of causal analysis in the social sciences is often divided into two ideal type research scenarios: experimental social science and observational social science. For experimental social science, the researcher can manipulate the cause of interest. The most common research design is one where the analyst assigns values of the cause according to a randomization scheme and then calculates post-treatment differences in outcomes across levels of the assigned cause. Typically, the researcher gives little or no attention to individual-specific differences in the inferred causal effects or to the context in which the experiment is conducted. For observational social science, the analyst cannot manipulate the cause through intervention because some process outside of the analyst's control determines the pattern of causal exposure. To develop causal assertions, the analyst must adopt a model of causal exposure based on assumptions about how the cause is distributed in the population. Most commonly, a model is adopted that warrants causal inference from differences in outcomes calculated within sets of observed individuals who are exposed to alternative values of the cause but who are deemed otherwise comparable by the maintained model of causal exposure. Individual-level variation in causal effects is then presumed to exist within and across comparison sets, often arising from interactions between individuals' characteristics and the contexts within which they are exposed to the cause. In this chapter, we will discuss methods for modeling causal effects in observational social science, giving particular attention to the capacity of new graphical methods to represent and then motivate models that can effectively deliver estimates of underlying heterogeneity of causal effects.
We have several related goals that we will pursue in the following order: (1) explain why quantitatively oriented social science that adopted path-modeling methodology became a target of critiques that it had ignored variability and context, (2) demonstrate how such effects can be expressed within the more recent methodology of causal graphs, (3) consider feasible empirical strategies to identify these effects, and (4) explain why causal graphs pose a risk of obscuring patterns of heterogeneity that deserve full scrutiny.

To set the stage for our explanations, consider some classic examples from sociology that have sought to model explicitly the effects of individual-level heterogeneity of causal effects as they interact with consequential social contexts. At least since the 1980s, sociologists have investigated the effects of neighborhoods on educational outcomes, deviance, and the transition to adulthood (for insightful reviews, see Jencks and Mayer 1990 and Harding, Gennetian, Winship, Sanbonmatsu, and Kling 2011). Because neighborhoods have many characteristics, and individuals living within them can be influenced to varying degrees by circumstances only partly under their own control, the effects of neighborhoods have proven persistently difficult to estimate. These debates have not been settled by first-rate observational data analysis or by large-scale experimentation (see …).

Alongside this work on neighborhoods, sociologists of education have studied the variable effects of schooling on the academic achievement of students. These studies include attempts to estimate the differential effects of public schooling on learning for students from different socioeconomic strata (for example, …). Sociologists have also considered the differential consequences of labor market conditions and training opportunities for young adults.
For example, Mare and Winship (1984) studied the extent to which changes in the unemployment rate for black youths can be considered differential responses to labor market conditions across youths who have differential propensities to enter the military or postsecondary schooling. More recently, Brand and Xie (2010) have studied the differential payoff of college across different types of students, challenging the position implicitly maintained by many economists that college provides the greatest benefits to those most likely to enter college.

Our focal example in this chapter will be the contentious research on charter schooling in the United States, which has been the subject of substantial recent public debate. In an excellent book on these debates, Henig (2008:2) introduces and defines charter schools in the following way:

> Just a little more than fifteen years since the first charter school opened in Minnesota, there are now nearly 4,000 nationwide, serving an estimated 1.1 million students. ... The laws governing charter schools differ (sometimes substantially) from state to state, of course, but some general characteristics have emerged. Charter schools receive public funding on a per-student basis, are often responsible for achieving educational outcomes defined by their government chartering entity, and are subject to at least nominal public oversight. They typically are barred from charging tuition on top of the public per-pupil allocation, but are free to pursue other forms of supplementary support from donors, foundations, or corporate sponsors. Although they must observe certain baseline regulations, such as prohibitions on discrimination and the provision of safe environments, they are exempt from many of the rules and regulations that bind regular public schools to specific standards and procedures. This hybrid status ... has made charter schools a special focus of attention and helped draw them into ideological whirlpools that raise the stakes surrounding the research into their actual form and consequences.

At their core, the central research questions in the debate are simple: Do students who attend charter schools perform better on standardized tests than they would have performed if they had instead attended regular public schools? Would students who attend regular public schools perform better on standardized tests if they had instead attended charter schools?

The contentious research that has addressed these questions is distinctive in many respects. Not only are some of its combatants leading researchers at the nation's top universities, but many of these researchers are also unusually ideological (as Henig shows brilliantly in his book). This scholarly energy is amplified by the attention that the national press has paid to charter schools, which is related to the support that charter schools have received from celebrity donors and from presidential aspirants. At the same time, the research that informs the debate is cutting edge in the best sense. Careful attention is paid to details of measurement, and the research designs that have been adopted are a healthy mixture of basic comparisons of achievement levels and daring attempts to leverage quasi-experimental variation from the ways in which charter school programs are administered.

What makes pursuing these questions complex is the underlying heterogeneity of the real world. The process by which some students become enrolled in charter schools is only partly observed. Some students in charter schools are likely to benefit from them much more than others, and it is even more difficult to assess how students who never contemplated entering charter schools might fare if given the opportunity to attend them.
At the same time, charter schools differ greatly from each other, so the effect of charter schooling must surely vary with differences in quality, as well as with the match between each student and the unique features of each charter school. In the next section, we provide necessary background for our subsequent presentation … the charter school effect from a path-modeling perspective. We also use this material to explain how quantitatively oriented sociology opened itself up to the critique that variability and context were too frequently ignored in attempts to estimate causal effects.

#### An Emergent Vulnerability to Critique

In the 1980s and early 1990s, a robust critique of dominant forms of quantitative research arose in sociology (see …). We will not review these critiques in this chapter, but for this section we will use the models at the heart of these critiques (simple path diagrams and their underlying linear regression equations) as a point of departure. In the remainder of the chapter, we will explain why we feel that this robust critique of quantitatively oriented causal analysis has now been weakened by improved practice that draws on a virtuous combination of causal graphs with nonparametric foundations and causal effects with potential outcome definitions.

[Footnote 3: Before carrying on, we should note that our title, "Bringing Context and Variability Back in to Causal Analysis," is slightly misleading, since we will explain how context and variability have been brought back into causal analysis in the past fifteen years in ways that provide a solid foundation for future research. Thus, the tone of our chapter is optimistic and forward looking, not an indictment of current practice (which is often how the phrase "Bringing ____ Back In" has been used in the long series of critical papers in sociology that followed from Homans' classic manifesto on methodological individualism, delivered as "Bringing Men Back In" in his 1964 Presidential Address to the American Sociological Association).]

To understand the graphical appeal of traditional path models, consider the path diagram presented in Figure 1. The structure of the path diagram in Figure 1 … where $e_Y$ is a regression representation of all omitted factors that determine Y. The final $e_Y$ term in Equation 1 is suppressed in Figure 1 …

In the literature on path models that swept through the social sciences in …

> Can the effect of C on Y vary across P? That seems reasonable, since it would seem that the effect of a charter school would depend on family background. Parents with college degrees probably help their kids get more out of school. Actually, now that I think about it, since N captures neighborhood characteristics, don't we think that there are better schools in some neighborhoods? In fact, charter schools are more likely to be established in areas with troubled neighborhood-based schools. And neighborhoods with weaker schools also tend to have stronger deviant subcultures with gangs and such. So the effect of charter schooling probably also depends on the neighborhood in which one lives. How do we represent such variation in effects in the path model?

In response, an instructor would typically explain that one can think of such effects as supplemental arrows from a variable to an arrow in the path diagram, such that the variable itself modifies the arrow. Yet, since these sorts of arrows are not formally justified in traditional path diagrams, the instructor would almost surely have then recommended a shift toward a more complex regression specification, such as … This might have been an acceptable form of pragmatism if the approximation spirit had carried over to model interpretation.
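One common form of the "more complex regression specification" described above, assumed here for illustration rather than taken from the chapter, adds products of C with P and with N: $Y = a + b_C C + b_P P + b_N N + b_{CP}\,CP + b_{CN}\,CN + e_Y$. The sketch below fits that form by ordinary least squares to simulated data. The data-generating process, its coefficients, and the codings of P (1 = college-educated parents) and N (a standardized neighborhood score) are all hypothetical, and the small normal-equations solver stands in for a proper regression library.

```python
import random


def ols(rows, ys):
    """Least squares via the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: swap up the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical data-generating process (an assumption, not the chapter's model).
rng = random.Random(7)
rows, ys = [], []
for _ in range(20_000):
    p = rng.randint(0, 1)                   # assumed coding: 1 = college-educated parents
    nbhd = rng.gauss(0.0, 1.0)              # assumed standardized neighborhood score N
    c = 1 if rng.random() < 0.3 + 0.2 * p else 0
    effect = 2.0 + 3.0 * p + 1.5 * nbhd     # true charter effect varies with P and N
    y = 50.0 + 4.0 * p - 2.0 * nbhd + c * effect + rng.gauss(0.0, 5.0)
    rows.append([1.0, c, p, nbhd, c * p, c * nbhd])  # intercept, C, P, N, C*P, C*N
    ys.append(y)

a, b_c, b_p, b_n, b_cp, b_cn = ols(rows, ys)
print(f"b_C={b_c:.2f}  b_CP={b_cp:.2f}  b_CN={b_cn:.2f}")
```

Under this specification, the implied effect of C for a student with P = p and N = n is $b_C + b_{CP}\,p + b_{CN}\,n$, so the product terms play the role of the "supplemental arrows" that point at the C to Y arrow. The form of that modification is still imposed to be linear and additive, which is the kind of reductionism criticized in what follows.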
Too frequently, the approximation spirit did not carry over to interpretation, and many causal assertions based on overly reductionist linear additive models can be found in the literature. This incautious literature then exposed quantitative research to critics' claims that too many practitioners had fallen prey to the belief that linear regression modeling reveals strong causal laws in which variability and context play minor roles. The most cogent presentation of this criticism is Abbott's oft-cited "Transcending General Linear Reality" (see …).

#### Moving Beyond Path Diagrams and Simplistic Linear Regression Models

Consider the general multiple regression model of the form

$$Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e, \qquad (3)$$

where Y is an interval-scaled outcome variable and $X_1$ through $X_k$ are predictor variables. Estimation of the slope parameters $b_1$ through $b_k$ can be motivated as a descriptive data reduction exercise in which the goal is to obtain a best-fitting linear approximation to the population-level relationship between Y and $X_1$ through $X_k$. Alternatively, and more ambitiously, the model can be estimated as a full causal model, where the interest is in identifying the expected shifts in Y that would result from what-if interventions on all possible values of the variables $X_1$ through $X_k$. The path model tradition embraced the second and more ambitious of these two approaches.

The more recent literature that we will consider in this chapter has examined an intermediate case. For this model, the variable $X_1$ in Equation 3 is an indicator variable C, as in … Each individual i then has two potential outcomes, $y_i^1$ and $y_i^0$, and their difference,

$$y_i^1 - y_i^0,$$

is the causal effect of charter schooling instead of regular schooling for individual i. The variables $Y^1$ and $Y^0$ are then population-level potential outcome random variables, and the average treatment effect (ATE) in the population is

$$E[Y^1 - Y^0],$$

where $E[\cdot]$ is the expectation operator from probability theory. The observed outcome variable Y is defined as

$$Y = C\,Y^1 + (1 - C)\,Y^0.$$

Although quite simple, this notational shift changed perspectives and allowed for the development of new techniques, as we will discuss later.
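A toy example makes these definitions concrete. The six students and their scores below are invented, and, unlike in any real data set, both potential outcomes are recorded for every student. Students with larger gains are assumed to be the ones who enroll in charter schools. The sketch computes the ATE, the ATT and ATC that the chapter defines next, the observed outcome from $Y = C\,Y^1 + (1 - C)\,Y^0$, and the naive difference in observed means between charter and regular public school students.

```python
from statistics import mean

# Hypothetical toy population of (y0, y1, c) triples:
#   y0 = test score under regular public schooling
#   y1 = test score under charter schooling
#   c  = 1 if the student actually attends a charter school
population = [
    (60, 70, 1), (56, 64, 1), (52, 58, 1),  # attendees: gains of 10, 8, 6
    (62, 64, 0), (58, 58, 0), (45, 49, 0),  # non-attendees: gains of 2, 0, 4
]


def avg_effect(units):
    """Mean of the individual-level effects y1 - y0 over the given units."""
    return mean(y1 - y0 for y0, y1, _ in units)


ate = avg_effect(population)
att = avg_effect([u for u in population if u[2] == 1])
atc = avg_effect([u for u in population if u[2] == 0])

# Only one potential outcome is observed per student: Y = C*Y1 + (1 - C)*Y0.
observed = [(c * y1 + (1 - c) * y0, c) for y0, y1, c in population]
naive = (mean(y for y, c in observed if c == 1)
         - mean(y for y, c in observed if c == 0))

print(ate, att, atc, naive)
```

By hand, the ATE is 5, the ATT is 8, the ATC is 2, and the naive difference is 9. The naive difference exceeds the ATT because attendees would also have scored slightly higher without charter schooling (a mean $Y^0$ of 56 versus 55), and both exceed the ATE and ATC because the students with the largest gains are the ones who enrolled.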
The potential outcome notation also made clear (to social scientists who may have forgotten) that causal effects exist independent of regression models and can be expressed without relying on regression-based language. With this notation, causal effects could be defined over any subset of the population. Two particular average causal effects became common targets of investigation. The average treatment effect for the treated (ATT) is

$$E[Y^1 - Y^0 \mid C = 1],$$

while the average treatment effect for the controls (ATC) is

$$E[Y^1 - Y^0 \mid C = 0].$$

For the charter school example, these are, respectively, the average effect of charter schooling for those who attend charter schools and for those who attend regular public schools. …

The third major advancement that has allowed scholarship to move beyond simple regression models is the elaboration of a new form of graph-based causal modeling. Here, the contribution lies in enabling new methodological insight and in providing new levels of clarity to researchers. Since this perspective is less familiar to social scientists, and is often both misunderstood and underappreciated, we present it in considerable detail in the next section.

#### A New Methodology of Causal Graphs

Since the 1990s, the rationale for graphical depictions of causal relationships has been strengthened by scholars working at the margins of the social sciences. Judea Pearl … that Pearl and his colleagues are credited with developing, and we will present here only the essential points necessary to demonstrate how variability and context can be incorporated into current graphical methods for causal analysis.
[Footnote 6: We will not discuss the new literature in epidemiology on the distinction between an interaction and an effect modification (see …).]

The causal graph in … When seen as a causal graph as developed in the more recent literature, the structural equations for … Reading from left to right in the causal graph in … Instead, if one must use a data analytic machine to conceptualize how to perform an appropriate empirical analysis of the puzzle under consideration, one should default to simple tabular stratification. In this case, one should think of a sufficiently large sample, such that one could, for example, estimate with great precision the mean of Y for every conceivable combination of values for P = p, C = c, and N = n. Average causal effects can then be calculated by appropriately weighting differences calculated within such a stratification of the data.

#### Representing Variability and Contextual Effects in Causal Graphs

Although tremendous advantages accrue from the general nonparametric structure of causal graphs, it can still be hard to encode heterogeneity in causal graphs in ways that are transparent to social scientists. Moreover, many scholars who work with causal graphs but who are not social scientists (including those who have developed the case for their general applicability to all causal analysis) do not fully understand how social scientists think about heterogeneity, especially when it is produced by an interaction with an unobserved variable. To promote understanding by making the key conceptual linkages, we start with a model that is even simpler than the one in …

#### Two Separate Causal Graphs for Two Latent Classes

Consider the two causal graphs in Figure 2.

[INSERT FIGURE 2 ABOUT HERE]

Although this is surely a gross oversimplification, suppose nonetheless that the population is composed of sixth graders who have been raised in two types of families.
Families with G = 1 choose schools predominantly for lifestyle reasons, such as proximity to their extended families and tastes for particular school cultures, assuming that all schools are similar in instructional impact because achievement is largely a function of individual effort. Families with G = 2 choose elementary schools for their children by selecting the school, subject to constraints, that they feel will maximize the achievement of their children, assuming that schools differ in quality and that their children may learn more in some schools than in others. Accordingly, they are attentive to the national press on educational policy, in which both the Bush and Obama administrations argued for increasing the number of charter schools in the country because some researchers had argued that charter schools are more effective. As a consequence, the second group of families is more likely to send their children to charter schools, such that the mean of C is higher for families with G = 2 than for families with G = 1.

Finally, suppose that parents with college degrees are more likely to value distinctive forms of education and, as a result, are more likely to send their children to charter schools (independent of whether highly educated parents are more likely to be found in the latent class for which G = 2, which we will discuss later). They are also more likely to be able to support children in completing homework and otherwise making the most of the educational opportunities that are offered to their children. Accordingly, suppose that the causal effects $P \rightarrow C$ and $P \rightarrow Y$ are positive and substantial in both groups (i.e., in both graphs of Figure 2).

[Footnote 8: The lower case values x, d, and y for the two causal graphs are meant to connote that these are realized values of X, D, and Y that may differ in their distributions across the two latent classes.]

The question for investigation is whether the effect of C on Y is positive for both groups and, if so, whether it is the same size for both groups. If we are willing to assume, as some of the literature suggests, that the second group of families is correct in the sense that school quality does matter for student learning, and further that charter schools are of higher quality (as authors of this chapter, we neither agree nor disagree with this position; see Henig 2008), then we should expect the effect of C on Y to be more likely positive than not in both groups. And, if we believe that parents with G = 2 have some sense that this is correct, then not only will more of them send their children to charter schools, but they will also sort their children more effectively into charter and noncharter schools. In other words, they will also be more likely to continue to enroll their children in regular public schools if they feel that their children will not benefit from the distinctive characteristics of available charter schools (e.g., if the charter schools that have openings have instructional themes that their children find distasteful). Because both of these self-selection effects are reinforcing, it is likely that the effect of C on Y is larger for G = 2 than for G = 1.

If this plausible scenario is true in reality, what would happen if a researcher ignored the latent classes (either by mistake or, more realistically, because the membership variable G is unobserved) and simply assumed that a single directed acyclic graph (DAG) prevailed? In this case, a researcher might estimate the effect of C on Y for each value of P and then average these effects over the distribution of P, yielding a single population-level estimate. At best, this estimate would be uninformative about the underlying pattern of heterogeneity, in which the effect of C on Y is larger for G = 2 than for G = 1.

[Footnote 9: These target parameters, the effects of C on Y for G = 2 and for G = 1, are defined implicitly as the average effect of charter schooling for all students from families with G = 2 and G = 1, respectively.]
At worst, this estimate would be completely wrong as an estimate of the average causal effect of C on Y. For example, if P predicts latent class membership G, and G predicts the size of the effect of C on Y, then P-stratum-specific effects mix together individual-level causal effects that vary with the conditional distribution of G within the strata of P. Combining P-stratum-specific effects by calculating an average effect across only the distribution of P does not properly weight the G-stratum-specific effects that are embedded in differential patterns within the strata of P. To consider these possibilities, we need a model of selection into C that is informed by a model of the traits of individuals that would cause them to be found in underlying latent classes. It is most natural to pursue such a model in a single causal graph that explicitly represents the latent classes by including the variable G as a node within it.

#### A Single Causal Graph for Two Latent Classes

Consider Figure 3. The variable G is given a hollow node to indicate that it is unobserved. The arrow from G to C is present because there are alternative groups of families, coded by the alternative values of the unobserved variable G, that approach differently the decision of whether to send their children to charter schools. As a result, G predicts charter school attendance, C. The corresponding structural equations for the causal graph in Figure 3 are …

[Footnote 11: Although we will continue to write as if G only takes on two values that identify two latent classes, this restriction is no longer necessary. G may take on as many values as there are alternative groups of families who approach differently the decision of whether to send their children to a charter school.]
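The warning above about averaging P-stratum-specific effects can be checked with a simulation that uses the tabular stratification estimator described earlier in the chapter: differences in mean Y within strata, weighted by each stratum's share of the population. Everything about the data-generating process is a hypothetical assumption chosen for illustration. P is a binary family-background indicator, the latent class G (1 or 2) is more common when P = 1, families with G = 2 are more likely to choose charter schools (C = 1), and the charter effect is 2 test-score points for G = 1 and 8 for G = 2. Because G changes the size of the effect of C on Y, this setup corresponds to arrows from G to both C and Y (as in Figure 4) rather than from G to C alone. G is observed only inside the simulation, which lets the sketch compare stratifying on P alone (what a researcher who cannot see G would do) with stratifying on both P and G.

```python
import random
from statistics import mean
from typing import NamedTuple


class Unit(NamedTuple):
    p: int      # family background (assumed: 1 = college-educated parents), observed
    g: int      # latent class, 1 or 2; unobserved in real data
    c: int      # 1 = charter school, 0 = regular public school
    y0: float   # potential outcome under regular public schooling
    y1: float   # potential outcome under charter schooling
    y: float    # observed outcome, Y = C*Y1 + (1 - C)*Y0


def simulate(n=200_000, seed=1):
    """Draw units from a hypothetical two-latent-class data-generating process."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        p = 1 if rng.random() < 0.5 else 0
        g = 2 if rng.random() < (0.7 if p == 1 else 0.3) else 1  # P raises Pr(G = 2)
        effect = 8.0 if g == 2 else 2.0                         # G modifies the C -> Y effect
        y0 = 50.0 + 5.0 * p + rng.gauss(0.0, 10.0)
        y1 = y0 + effect
        pr_charter = 0.2 + 0.1 * p + (0.4 if g == 2 else 0.0)   # P and G raise Pr(C = 1)
        c = 1 if rng.random() < pr_charter else 0
        units.append(Unit(p, g, c, y0, y1, c * y1 + (1 - c) * y0))
    return units


def stratified_estimate(units, key):
    """Weight within-stratum differences in mean observed Y by each stratum's share."""
    strata = {}
    for u in units:
        strata.setdefault(key(u), []).append(u)
    estimate = 0.0
    for members in strata.values():
        treated = mean(u.y for u in members if u.c == 1)
        control = mean(u.y for u in members if u.c == 0)
        estimate += len(members) / len(units) * (treated - control)
    return estimate


units = simulate()
true_ate = mean(u.y1 - u.y0 for u in units)
by_p = stratified_estimate(units, key=lambda u: u.p)                # G ignored
by_p_and_g = stratified_estimate(units, key=lambda u: (u.p, u.g))   # G "observed"
print(f"true ATE:               {true_ate:.2f}")
print(f"stratified on P only:   {by_p:.2f}")
print(f"stratified on P and G:  {by_p_and_g:.2f}")
```

With these assumed parameters, each latent class makes up about half of the population, so the true ATE is about 5 by construction, and stratifying on both P and G recovers it up to sampling error. Stratifying on P alone does not: a hand calculation for this process gives roughly 6.2, because within each stratum of P the G = 2 families, who gain more, are over-represented among charter attendees. The comparison with the P-and-G estimate is available only because the simulation records G.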
In the structural equations for Figure 3, the latent class membership variable G enters in only two places: on its own in Equation 17 and as an input to $f_C(\cdot)$ in Equation …

[INSERT FIGURE 3 ABOUT HERE]

To accept … For the charter school effect, there is no literature to support such a dismissal of the power of self-selection. Accordingly, for Figures 4(a) and 4(b), we add an arrow from G to Y to the graph presented earlier in Figure 3.

[INSERT FIGURE 4 ABOUT HERE]

For … so that family background is an explicit cause of latent class membership. Parents with high socioeconomic status are probably more likely to select on the possible causal effect of charter schooling, which is how the latent classes were discussed for …

#### Self-Selection into the Latent Classes

Suppose that latent class membership G is determined by a variable that measures a family's subjective expectation of their child's likely benefit from attending a charter school instead of a regular public school. Although we could enter this variable into a causal graph with a single letter, such as S or E, for … Note that $\text{Exp}(C \rightarrow Y)$ is determined solely by $e_{\text{Exp}}$ in Equation 26. Thus, …

[INSERT FIGURE 5 ABOUT HERE]

Given what we have written in the last section about the likelihood that families with different patterns of P will end up in different latent classes represented by G, it seems clear that … The structural equations are now augmented as … In sociology, the causal effect of P on $\text{Exp}(C \rightarrow Y)$ via I follows from the position that privileged positions in social structure are occupied by advantaged families. From these positions, individuals acquire information I that allows them to recognize benefits that are available to them. By elaborating the causal graph progressively from …

#### Self-Selection into the Treatment and a Complementary Context

How hard is the task of allowing for contextual effects? With causal graphs, it is considerably easier than one might expect.
Consider Figure 6. With the addition of N, the function for Y is now $f_Y(P, C, G, N, e_Y)$. Recall, again, that N is not restricted by any functional form assumption for $f_Y(\cdot)$. As a result, the causal effect of N can modify or interact with the partial effects of G, C, and P on Y.

[INSERT FIGURE 6 ABOUT HERE]

Figure 6 also allows for even more powerful effects of self-selection. Suppose that self-selection into the latent classes in G is associated with self-selection into N as well. We see two separate and countervailing tendencies. Parents attuned to the potential benefits of charter schooling are also more likely to choose neighborhood contexts that best allow them to encourage their children to study hard in school. At the same time, after obtaining an attendance offer from a charter school, a family may also decide to move to an alternative neighborhood in the catchment area of a suboptimal regular public school, since attendance at such a school may no longer be a consideration in the family's residential decision. If either effect is present, then the function for N is $f_N(G, e_N)$, and we then have seven structural equations as

[Footnote 13: If the latter are only diffuse cultural understandings that only weakly shape local norms about the appropriateness of enacting the role of achievement-oriented student, then such variables may be difficult to observe. In this case, N might be coded as a series of neighborhood dummy identifier variables. Analysis of these effects would then only be possible if there were sufficient numbers of students to analyze from within each neighborhood studied. Without such variation, the potential effects of N could not be separated from individual characteristics of students and their families. And, if modeled in this way, only the total effects of N would be identified, since the dummy variables for N would not contain any information on the underlying explanatory factors that structure the neighborhood effects that they identify.]
[Footnote 14: See VanderWeele (2009) for an incisive analysis of the difference between an interaction and an effect modification. Our interest, conceptually at least, is in instances of genuine causal interaction, although much of what we write would hold under simpler structures of only effect modification.]

$$P = f_P(e_P), \qquad (36)$$