
## Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics (2014)

Citations: | 3 - 0 self |

### Citations

2477 | Experimental and quasi-experimental designs for research. - Campbell, Stanley - 1963 |

1264 | Experimental and Quasi-experimental designs for Generalized Causal Inference. Wadsworth Cengage learning - Shadish, Cook, et al. - 2002 |

1107 |
Estimating Causal Effects of Treatments in Randomized and Non-randomized Studies
- Rubin
- 1974
Citation Context ...1996; Seggie, Cavusgil and Phelan, 2007; Leeflang et al., 2009; Stewart, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior ... |

920 |
Mostly Harmless Econometrics: An Empiricist’s Companion.
- Angrist, Pischke
- 2009
Citation Context ... the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to the intervention and serve as synthetic controls. Inverting the state-space model described in the main text yields a prediction of what would have happened in Y had the intervention not taken place (posterior predictive expectation of the counterfac... |
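
As a rough illustration of the difference-in-differences estimate described in this excerpt, the sketch below subtracts the control group's pre-post change from the treated group's pre-post change. The function name `did_estimate` and the toy numbers are hypothetical and are not taken from the cited works.

```python
import numpy as np

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: (i) the pre-post change in the treated
    group minus (ii) the pre-post change in the control group."""
    treat_change = np.mean(treat_post) - np.mean(treat_pre)
    ctrl_change = np.mean(ctrl_post) - np.mean(ctrl_pre)
    return treat_change - ctrl_change

# Hypothetical toy observations (not data from the paper).
effect = did_estimate(
    treat_pre=[10.0, 12.0], treat_post=[15.0, 17.0],
    ctrl_pre=[8.0, 10.0], ctrl_post=[9.0, 11.0],
)
print(effect)
```

The estimate is only as good as the assumption stated in the excerpt: that the control group's level is an adequate proxy for the untreated level of the treatment group.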

822 | How Much Should We Trust Differences-in-Differences Estimates? Quarterly - Bertrand, Duflo, et al. - 2004 |

598 |
Variable selection via Gibbs sampling
- George, McCulloch
- 1993
Citation Context ...re $1/\sigma^2 \sim \mathcal{G}(10^{-2}, 10^{-2} s_y^2)$, where $s_y^2 = \sum_t (y_t - \bar{y})^2/(n - 1)$ is the sample variance of the target series. Scaling by the sample variance is a minor violation of the Bayesian paradigm, but it is an effective means of choosing a reasonable scale for the prior. It is similar to the popular technique of scaling the data prior to analysis, but we prefer to do the scaling in the prior so we can model the data on its original scale. When faced with many potential controls, we prefer letting the model choose an appropriate set. This can be achieved by placing a spike-and-slab prior over coefficients (George and McCulloch, 1993, 1997; Polson and Scott, 2011; Scott and Varian, 2013). A spike-and-slab prior combines point mass at zero (the ‘spike’), for an unknown subset of zero coefficients, with a weakly informative distribution on the complementary set of non-zero coefficients (the ‘slab’). Contrary to what its name might suggest, the ‘slab’ is usually not completely flat, but rather a Gaussian with a large variance. Let $\varrho = (\varrho_1, \ldots, \varrho_J)$, where $\varrho_j = 1$ if $\beta_j \neq 0$ and $\varrho_j = 0$ otherwise. Let $\beta_\varrho$ denote the non-zero elements of the vector $\beta$ and let $\Sigma_\varrho^{-1}$ denote the rows and columns of $\Sigma^{-1}$ corresponding to non-zero ent... |

545 |
On Gibbs sampling for state space models
- Carter, Kohn
- 1994
Citation Context ...t for each $t$. We use the same samples to obtain the posterior distribution of cumulative impact. Posterior simulation. We use a Gibbs sampler to simulate a sequence $(\theta, z)^{(1)}, (\theta, z)^{(2)}, \ldots$ from a Markov chain whose stationary distribution is $p(\theta, z \mid y)$. The sampler alternates between: a data-augmentation step that simulates from $p(z \mid y, \theta)$; and a parameter-simulation step that simulates from $p(\theta \mid y, z)$. The data-augmentation step uses the posterior simulation algorithm from Durbin and Koopman (2002), providing an improvement over the earlier forward-filtering, backward-sampling algorithms by Carter and Kohn (1994), Frühwirth-Schnatter (1994), and de Jong and Shephard (1995). In brief, because $p(y, z \mid \theta)$ is jointly multivariate normal, the variance of $p(z \mid y, \theta)$ does not depend on $y$. We can therefore simulate $(y^*, z^*) \sim p(y, z \mid \theta)$ and subtract $E(z^* \mid y^*, \theta)$ to obtain zero-mean noise with the correct variance. Adding $E(z \mid y, \theta)$ restores the correct mean, which completes the draw. The required expectations can be computed using the Kalman filter and a fast mean smoother described in detail by Durbin and Koopman (2002). The result is a direct simulation from $p(z \mid y, \theta)$ in an algorithm that is linear in the number... |
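
The mean-correction idea in this excerpt can be sketched for a generic jointly Gaussian pair. This is a minimal illustration, not the authors' implementation: it assumes a small dense covariance matrix and computes the conditional expectation by direct matrix algebra, where the paper uses the Kalman filter and a fast mean smoother. The name `conditional_draw` and its arguments are hypothetical.

```python
import numpy as np

def conditional_draw(mu, cov, n_y, y_obs, rng):
    """Draw z ~ p(z | y) for jointly Gaussian (y, z) by mean correction.

    The first n_y coordinates of the joint vector are y, the rest are z.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mu_y, mu_z = mu[:n_y], mu[n_y:]
    gain = cov[n_y:, :n_y] @ np.linalg.inv(cov[:n_y, :n_y])

    def cond_mean(y):
        # E(z | y); stands in for the Kalman filter / mean smoother.
        return mu_z + gain @ (np.asarray(y, dtype=float) - mu_y)

    # 1. Simulate (y*, z*) from the joint distribution.
    joint = rng.multivariate_normal(mu, cov)
    y_star, z_star = joint[:n_y], joint[n_y:]
    # 2. Subtract E(z* | y*): zero-mean noise with the conditional variance.
    noise = z_star - cond_mean(y_star)
    # 3. Add E(z | y) to restore the correct mean.
    return cond_mean(y_obs) + noise

rng = np.random.default_rng(0)
draw = conditional_draw([0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]], 1, [1.0], rng)
print(draw)
```

The trick works because the conditional variance of a Gaussian does not depend on the conditioning value, so noise generated around a simulated `y_star` is equally valid around the observed `y_obs`.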

497 | Minimum Wages and Employment: A Case Study of the Fast-Food Industry in
- Card, Krueger
- 1994
Citation Context ... such settings is based on a linear model of the observed outcomes in the treatment and control group before and after the intervention. Such a model estimates the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to the intervention and serve as synthetic controls. Inverting the state-space model descri... |

462 |
Bayesian Forecasting and Dynamic Models
- West, Harrison
- 1997
Citation Context ...egrate out our posterior uncertainty about which covariates to include and how strongly they should influence our predictions. All covariates are assumed to be contemporaneous; the present model does not infer on a potential lag between treated and untreated time series. A known lag, however, can be easily incorporated by shifting the corresponding regressor in time. Contemporaneous covariates with dynamic coefficients. A smaller number of potential controls can be included with dynamic regression coefficients to account for time-varying relationships (e.g., Banerjee, Kauffman and Wang, 2007; West and Harrison, 1997). Given covariates $j = 1, \ldots, J$, this introduces the dynamic regression component $x_t^{\mathsf{T}} \beta_t = \sum_{j=1}^{J} x_{j,t} \beta_{j,t}$, $\beta_{j,t+1} = \beta_{j,t} + \eta_{\beta,j,t}$ (2.6), where $\eta_{\beta,j,t} \sim \mathcal{N}(0, \sigma_{\beta_j}^2)$ is Gaussian white noise. Here, $\beta_{j,t}$ is the coefficient for the $j$th control series and $\sigma_{\beta_j}$ is the standard deviation of its associated ... |
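
The dynamic regression component in equation (2.6) can be simulated forward in a few lines. This is a sketch under the excerpt's stated model only (random-walk coefficients with Gaussian increments). The function name `simulate_dynamic_regression` and the array layout are assumptions made for illustration.

```python
import numpy as np

def simulate_dynamic_regression(x, beta0, sigma_beta, rng):
    """Simulate random-walk coefficients and the regression component.

    x: (T, J) covariates; beta0: (J,) initial coefficients;
    sigma_beta: (J,) standard deviations of the random-walk increments.
    Returns (beta, component) with shapes (T, J) and (T,).
    """
    x = np.asarray(x, dtype=float)
    n_steps, n_cov = x.shape
    beta = np.empty((n_steps, n_cov))
    beta[0] = beta0
    for t in range(n_steps - 1):
        # beta_{j,t+1} = beta_{j,t} + eta_{beta,j,t}
        beta[t + 1] = beta[t] + rng.normal(0.0, sigma_beta, size=n_cov)
    # x_t' beta_t = sum_j x_{j,t} * beta_{j,t}
    component = np.sum(x * beta, axis=1)
    return beta, component

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 2))  # hypothetical control series
beta, component = simulate_dynamic_regression(x, [1.0, -0.5], [0.05, 0.01], rng)
print(beta.shape, component.shape)
```

Setting every entry of `sigma_beta` to zero recovers a static regression, which is a convenient sanity check.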

441 |
Natural and quasi-experiments in economics
- Meyer
- 1995
Citation Context ...ontrol group before and after the intervention. Such a model estimates the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to the intervention and serve as synthetic controls. Inverting the state-space model described in the main text yields a prediction of what would have happened in Y had ... |

346 |
The Practice of Econometrics: Classic and Contemporary,
- Berndt
- 1991
Citation Context ...tate-space approach discussed in this paper include autoregressive (AR) and moving-average (MA) models. These models define autocorrelation among observations rather than latent states, thus precluding the ability to distinguish between state noise and observation noise (Ataman, Mela and Van Heerde, 2008; Leeflang et al., 2009). In the scenarios we consider, advertising is a planned perturbation of the market. This generally makes it easier to obtain plausible causal inferences than in genuinely observational studies in which the experimenter had no control about treatment (see discussions in Berndt, 1991; Brady, 2002; Hitchcock, 2004; Robinson, McNulty and Krasno, 2009; Winship and Morgan, 1999; Camillo and d’Attoma, 2010; Antonakis et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Kleinberg and Hripcsak, 2011; Vaver and Koehler, 2011). The principal problem in observational studies is endogeneity: the possibility that the observed outcome might not be the result of the treatment but of other omitted, endogenous variables. In principle, propensity scores can be used to correct for the selection bias that arises when the treatment effect is correlated with the likelihood of be... |

300 | Using the longitudinal structure of earnings to estimate the effect of training programs
- Ashenfelter, Card
- 1985
Citation Context ...roach to causal inference in such settings is based on a linear model of the observed outcomes in the treatment and control group before and after the intervention. Such a model estimates the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to the intervention and serve as synthetic controls. Inverting the ... |

233 | Approaches for Bayesian variable selection
- George, McCulloch
- 1997
Citation Context ...rce of information for inferring the counterfactual is the available prior knowledge about the model parameters, as elicited, for example, by previous studies. We combine the three preceding sources of information using a state-space time-series model, where one component of state is a linear regression on the contemporaneous predictors. The framework of our model allows us to choose from among a large set of potential controls by placing a spike-and-slab prior on the set of regression coefficients, and by allowing the model to average over the set of controls (George and McCulloch, 1997). We then compute the posterior distribution of the counterfactual time series given the value of the target series in the pre-intervention period, along with the values of the controls in the post-intervention period. Subtracting the predicted from the observed response during the post-intervention period gives a semiparametric Bayesian posterior distribution for the causal effect (Figure 1). Related work. As with other domains, causal inference in marketing requires subtlety. Marketing data are often observational and rarely follow the ideal of a randomised design. They typically exhibit a l... |

214 | The simulation smoother for time series models - DeJong, Shephard - 1995 |

191 | Data augmentation and dynamic linear models - Fruhwirth-Schnatter - 1994 |

174 |
Counterfactuals and Causal Inference: Methods and Principles for Social Research. New York:
- Morgan, Winship
- 2007
Citation Context ...lan, 2007; Leeflang et al., 2009; Stewart, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the beha... |

169 |
On assessing prior distributions and Bayesian regression analysis with g-prior distributions.
- Zellner
- 1986
Citation Context ....10) encodes our prior expectation about the value of each element of $\beta$. In practice, we usually set $b = 0$. The prior parameters in equation (2.11) can be elicited by asking about the expected $R^2 \in [0, 1]$ as well as the number of observations worth of weight $\nu$ the prior estimate should be given. Then $s = \nu(1 - R^2) s_y^2$. The final prior parameter in (2.10) is $\Sigma^{-1}$ which, up to a scaling factor, is the prior precision over $\beta$ in the full model, with all variables included. The total information in the data is $X^{\mathsf{T}} X$, and so $\frac{1}{n} X^{\mathsf{T}} X$ is the average information in a single observation. Zellner’s g-prior (Zellner, 1986; Chipman et al., 2001; Liang et al., 2008) sets $\Sigma^{-1} = \frac{g}{n} X^{\mathsf{T}} X$, so that $g$ can be interpreted as $g$ observations worth of information. Zellner’s prior becomes improper when $X^{\mathsf{T}} X$ is not positive definite; we therefore ensure propriety by averaging $X^{\mathsf{T}} X$ with its diagonal, $\Sigma^{-1} = \frac{g}{n} \{ w X^{\mathsf{T}} X + (1 - w)\, \mathrm{diag}(X^{\mathsf{T}} X) \}$ (2.12), with default values of $g = 1$ and $w = 1/2$. Overall, this prior specification provides a broadly useful default while providing considerable flexibility in those cases where more specific prior information is available. 2.3. Inference. Posterior inference in our m... |
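
Equation (2.12) and the elicitation rule for $s$ translate directly into code. The sketch below is illustrative only: the function names `slab_prior_precision` and `prior_sum_of_squares` are hypothetical, and the example design matrix is deliberately built with two identical columns so that $X^{\mathsf{T}} X$ is singular.

```python
import numpy as np

def slab_prior_precision(X, g=1.0, w=0.5):
    """Sigma^{-1} = (g/n) * { w X'X + (1 - w) diag(X'X) }, as in (2.12)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xtx = X.T @ X
    return (g / n) * (w * xtx + (1.0 - w) * np.diag(np.diag(xtx)))

def prior_sum_of_squares(nu, r2, y):
    """s = nu * (1 - R^2) * s_y^2, with s_y^2 the sample variance of y."""
    return nu * (1.0 - r2) * np.var(y, ddof=1)

# Hypothetical design with perfectly collinear columns: X'X is singular.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
precision = slab_prior_precision(X)
print(np.linalg.eigvalsh(precision))  # all eigenvalues are positive
```

Averaging with the diagonal keeps the precision matrix positive definite as long as no column of `X` is identically zero and `w < 1`, which is the propriety argument made in the excerpt.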

167 | Semiparametric Difference-in-Differences Estimators
- Abadie
- 2006
Citation Context ...reatment and control group before and after the intervention. Such a model estimates the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to the intervention and serve as synthetic controls. Inverting the state-space model described in the main text yields a prediction of what would have happe... |

156 | Econometric Evaluation of Social Programs
- Abbring, Heckman
- 2007
Citation Context ...us on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of other time series that were predictive of the target se... |

150 |
Inference with difference-in-differences and other panel data.
- Donald, Lang
- 2007
Citation Context ... Such a model estimates the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to the intervention and serve as synthetic controls. Inverting the state-space model described in the main text yields a prediction of what would have happened in Y had the intervention not taken place (posterior predictive ex... |

133 | A simple and efficient simulation smoother for state space time series analysis
- Durbin, Koopman
- 2002
Citation Context ...y y. Third, we use the posterior predictive samples to compute the posterior distribution of the pointwise impact $y_t - \tilde{y}_t$ for each $t$. We use the same samples to obtain the posterior distribution of cumulative impact. Posterior simulation. We use a Gibbs sampler to simulate a sequence $(\theta, z)^{(1)}, (\theta, z)^{(2)}, \ldots$ from a Markov chain whose stationary distribution is $p(\theta, z \mid y)$. The sampler alternates between: a data-augmentation step that simulates from $p(z \mid y, \theta)$; and a parameter-simulation step that simulates from $p(\theta \mid y, z)$. The data-augmentation step uses the posterior simulation algorithm from Durbin and Koopman (2002), providing an improvement over the earlier forward-filtering, backward-sampling algorithms by Carter and Kohn (1994), Frühwirth-Schnatter (1994), and de Jong and Shephard (1995). In brief, because $p(y, z \mid \theta)$ is jointly multivariate normal, the variance of $p(z \mid y, \theta)$ does not depend on $y$. We can therefore simulate $(y^*, z^*) \sim p(y, z \mid \theta)$ and subtract $E(z^* \mid y^*, \theta)$ to obtain zero-mean noise with the correct variance. Adding $E(z \mid y, \theta)$ restores the correct mean, which completes the draw. The required expectations can be computed using the Kalman filter and a fast mean smoother described in detail by Dur... |

132 | The practical implementation of Bayesian model selection (with discussion). Model Selection
- Chipman, George, et al.
- 2001
Citation Context ...r prior expectation about the value of each element of $\beta$. In practice, we usually set $b = 0$. The prior parameters in equation (2.11) can be elicited by asking about the expected $R^2 \in [0, 1]$ as well as the number of observations worth of weight $\nu$ the prior estimate should be given. Then $s = \nu(1 - R^2) s_y^2$. The final prior parameter in (2.10) is $\Sigma^{-1}$ which, up to a scaling factor, is the prior precision over $\beta$ in the full model, with all variables included. The total information in the data is $X^{\mathsf{T}} X$, and so $\frac{1}{n} X^{\mathsf{T}} X$ is the average information in a single observation. Zellner’s g-prior (Zellner, 1986; Chipman et al., 2001; Liang et al., 2008) sets $\Sigma^{-1} = \frac{g}{n} X^{\mathsf{T}} X$, so that $g$ can be interpreted as $g$ observations worth of information. Zellner’s prior becomes improper when $X^{\mathsf{T}} X$ is not positive definite; we therefore ensure propriety by averaging $X^{\mathsf{T}} X$ with its diagonal, $\Sigma^{-1} = \frac{g}{n} \{ w X^{\mathsf{T}} X + (1 - w)\, \mathrm{diag}(X^{\mathsf{T}} X) \}$ (2.12), with default values of $g = 1$ and $w = 1/2$. Overall, this prior specification provides a broadly useful default while providing considerable flexibility in those cases where more specific prior information is available. 2.3. Inference. Posterior inference in our model can be broken dow... |

131 |
The estimation of causal effects from observational data
- Winship, Morgan
- 1999
Citation Context ...ng-average (MA) models. These models define autocorrelation among observations rather than latent states, thus precluding the ability to distinguish between state noise and observation noise (Ataman, Mela and Van Heerde, 2008; Leeflang et al., 2009). In the scenarios we consider, advertising is a planned perturbation of the market. This generally makes it easier to obtain plausible causal inferences than in genuinely observational studies in which the experimenter had no control about treatment (see discussions in Berndt, 1991; Brady, 2002; Hitchcock, 2004; Robinson, McNulty and Krasno, 2009; Winship and Morgan, 1999; Camillo and d’Attoma, 2010; Antonakis et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Kleinberg and Hripcsak, 2011; Vaver and Koehler, 2011). The principal problem in observational studies is endogeneity: the possibility that the observed outcome might not be the result of the treatment but of other omitted, endogenous variables. In principle, propensity scores can be used to correct for the selection bias that arises when the treatment effect is correlated with the likelihood of being treated (Rubin and Waterman, 2006; Chan et al., 2010). However, the propensity-score appr... |

87 | Mixtures of g-priors for Bayesian variable selection
- Liang, Paulo, et al.
Citation Context ...out the value of each element of $\beta$. In practice, we usually set $b = 0$. The prior parameters in equation (2.11) can be elicited by asking about the expected $R^2 \in [0, 1]$ as well as the number of observations worth of weight $\nu$ the prior estimate should be given. Then $s = \nu(1 - R^2) s_y^2$. The final prior parameter in (2.10) is $\Sigma^{-1}$ which, up to a scaling factor, is the prior precision over $\beta$ in the full model, with all variables included. The total information in the data is $X^{\mathsf{T}} X$, and so $\frac{1}{n} X^{\mathsf{T}} X$ is the average information in a single observation. Zellner’s g-prior (Zellner, 1986; Chipman et al., 2001; Liang et al., 2008) sets $\Sigma^{-1} = \frac{g}{n} X^{\mathsf{T}} X$, so that $g$ can be interpreted as $g$ observations worth of information. Zellner’s prior becomes improper when $X^{\mathsf{T}} X$ is not positive definite; we therefore ensure propriety by averaging $X^{\mathsf{T}} X$ with its diagonal, $\Sigma^{-1} = \frac{g}{n} \{ w X^{\mathsf{T}} X + (1 - w)\, \mathrm{diag}(X^{\mathsf{T}} X) \}$ (2.12), with default values of $g = 1$ and $w = 1/2$. Overall, this prior specification provides a broadly useful default while providing considerable flexibility in those cases where more specific prior information is available. 2.3. Inference. Posterior inference in our model can be broken down into three pieces. ... |

60 | Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem.
- Scott, Berger
- 2010
Citation Context ... $\prod_{j=1}^{J} \pi_j^{\varrho_j} (1 - \pi_j)^{1 - \varrho_j}$, where $\pi_j$ is the prior probability of regressor $j$ being included in the model. Values for $\pi_j$ can be elicited by asking about the expected model size $M$, and then setting all $\pi_j = M/J$. An alternative is to use a more specific set of values $\pi_j$. In particular, one might choose to set certain $\pi_j$ to either 1 or 0 to force the corresponding variables into or out of the model. Generally, framing the prior in terms of expected model size has the advantage that the model can adapt to growing numbers of predictor variables without having to switch to a hierarchical prior (Scott and Berger, 2010). For the ‘slab’ portion of the prior we use a conjugate normal-inverse Gamma distribution, $\beta_\varrho \mid \sigma^2 \sim \mathcal{N}\left(b_\varrho, \sigma^2 (\Sigma_\varrho^{-1})^{-1}\right)$ (2.10), $\frac{1}{\sigma^2} \sim \mathcal{G}\left(\frac{\nu}{2}, \frac{s}{2}\right)$ (2.11). The vector $b$ in equation (2.10) encodes our prior expectation about the value of each element of $\beta$. In practice, we usually set $b = 0$. The prior parameters in equation (2.11) can be elicited by asking about the expected $R^2 \in [0, 1]$ as well as the number of observations worth of weight $\nu$ the prior estimate should be given. Then $s = \nu(1 - R^2) s_y^2$. The final prior parameter in (2.10) is $\Sigma^{-1}$ which, up to a scaling factor, is the ... |
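
The ‘spike’ part of the prior in this excerpt is a product of independent Bernoulli terms over the inclusion indicators, which a short sketch makes concrete. The function names `inclusion_prior` and `pi_from_model_size` are hypothetical, and the example values are made up for illustration.

```python
import numpy as np

def inclusion_prior(rho, pi):
    """p(rho) = prod_j pi_j^rho_j * (1 - pi_j)^(1 - rho_j)."""
    rho = np.asarray(rho, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.prod(pi ** rho * (1.0 - pi) ** (1.0 - rho)))

def pi_from_model_size(expected_size, n_predictors):
    """Set every pi_j = M / J from an expected model size M."""
    return np.full(n_predictors, expected_size / n_predictors)

# Hypothetical setup: J = 4 candidate controls, expected model size M = 2.
pi = pi_from_model_size(2, 4)
print(inclusion_prior([1, 0, 1, 0], pi))

# Forcing the first variable into the model by setting pi_1 = 1.
print(inclusion_prior([0, 1], [1.0, 0.5]))  # excluded configuration gets zero mass
```

Setting a `pi` entry to exactly 1 or 0 assigns zero prior mass to every configuration that contradicts it, which is how a variable is forced into or out of the model.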

55 | Asymptotic properties of a robust variance matrix estimator for panel data when T is large. Journal of Econometrics. forthcoming
- Hansen
- 2005
Citation Context ...% credible interval of the cumulative impact crosses the zero line about five months after the intervention, at which point we would no longer declare a significant overall effect. Despite their utility in economics and the social sciences, DD designs have been limited in three ways. First, DD is traditionally based on a static regression model that assumes i.i.d. data, even though the design has a temporal component. When fit to serially correlated data, static models yield overoptimistic inferences with too narrow uncertainty intervals (see also Solon, 1984; Hansen, 2007a,b; Bertrand, Duflo and Mullainathan, 2002). Second, most DD analyses only consider two time points: before and after the intervention. In practice, the manner in which an effect evolves over time, especially its onset and decay structure, is often a key question. Third, selection of synthetic controls is often arbitrary (Robinson, McNulty and Krasno, 2009), bearing the risk of selecting controls that fail to provide an adequate explanation of contemporaneous changes in the environment, which, as a result, might be erroneously attributed to the intervention. The limitations of DD schemes can ... |

41 |
On making causal claims: A review and recommendations.
- Antonakis, Bendahan, et al.
- 2010
Citation Context ...f a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of other time series that were predictive of the target series prior to the interv... |

36 |
Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects.
- Hansen
- 2007
Citation Context ...% credible interval of the cumulative impact crosses the zero line about five months after the intervention, at which point we would no longer declare a significant overall effect. Despite their utility in economics and the social sciences, DD designs have been limited in three ways. First, DD is traditionally based on a static regression model that assumes i.i.d. data, even though the design has a temporal component. When fit to serially correlated data, static models yield overoptimistic inferences with too narrow uncertainty intervals (see also Solon, 1984; Hansen, 2007a,b; Bertrand, Duflo and Mullainathan, 2002). Second, most DD analyses only consider two time points: before and after the intervention. In practice, the manner in which an effect evolves over time, especially its onset and decay structure, is often a key question. Third, selection of synthetic controls is often arbitrary (Robinson, McNulty and Krasno, 2009), bearing the risk of selecting controls that fail to provide an adequate explanation of contemporaneous changes in the environment, which, as a result, might be erroneously attributed to the intervention. The limitations of DD schemes can ... |

23 |
Shortcomings of marginal analysis for wage-employment problems.
- Lester
- 1946
Citation Context ...derable heterogeneity among them. A standard approach to causal inference in such settings is based on a linear model of the observed outcomes in the treatment and control group before and after the intervention. Such a model estimates the difference between (i) the pre-post difference in the treatment group and (ii) the pre-post difference in the control group. The assumption underlying such difference-in-differences (DD) designs is that the level of the control group provides an adequate proxy for the level that would have been observed in the treatment group in the absence of treatment (see Lester, 1946; Campbell, Stanley and Gage, 1963; Ashenfelter and Card, 1985; Card and Krueger, 1993; Angrist and Krueger, 1999; Athey and Imbens, 2002; Abadie, 2005; Meyer, 1995; Shadish, Cook and Campbell, 2002; Donald and Lang, 2007; Angrist and Pischke, 2008; Robinson, McNulty and Krasno, 2009; Antonakis et al., 2010). Figure 1. Inferring impact through counterfactual predictions. (a) Simulated trajectory of a treated market (Y) with an intervention beginning in January 2013. Two other markets (X1, X2) were not subject to th... |

22 |
Data augmentation for support vector machines.
- Polson, Scott
- 2011
Citation Context ...$s_y^2 = \sum_t (y_t - \bar{y})^2/(n-1)$ is the sample variance of the target series. Scaling by the sample variance is a minor violation of the Bayesian paradigm, but it is an effective means of choosing a reasonable scale for the prior. It is similar to the popular technique of scaling the data prior to analysis, but we prefer to do the scaling in the prior so we can model the data on its original scale. When faced with many potential controls, we prefer letting the model choose an appropriate set. This can be achieved by placing a spike-and-slab prior over coefficients (George and McCulloch, 1993, 1997; Polson and Scott, 2011; Scott and Varian, 2013). A spike-and-slab prior combines point mass at zero (the ‘spike’), for an unknown subset of zero coefficients, with a weakly informative distribution on the complementary set of non-zero coefficients (the ‘slab’). Contrary to what its name might suggest, the ‘slab’ is usually not completely flat, but rather a Gaussian with a large variance. Let $\varrho = (\varrho_1, \ldots, \varrho_J)$, where $\varrho_j = 1$ if $\beta_j \neq 0$ and $\varrho_j = 0$ otherwise. Let $\beta_\varrho$ denote the non-zero elements of the vector $\beta$ and let $\Sigma^{-1}_\varrho$ denote the rows and columns of $\Sigma^{-1}$ corresponding to non-zero entries in $\varrho$. We can then factori... |
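
The spike-and-slab idea in the context above can be sketched as a single draw from a simplified prior. This sketch assumes independent Bernoulli inclusion indicators with a common probability and an independent zero-mean Gaussian slab; the prior actually used in the cited paper is truncated in the excerpt and may be structured differently. The names `draw_spike_and_slab`, `inclusion_prob` and `slab_sd` are hypothetical.

```python
import random


def draw_spike_and_slab(n_coef, inclusion_prob, slab_sd, rng):
    """Draw (indicators, coefficients) from a simplified spike-and-slab prior.

    Assumption: each coefficient is included independently with probability
    `inclusion_prob`; included coefficients come from N(0, slab_sd^2) (the
    'slab'), excluded ones are exactly zero (the 'spike').
    """
    indicators = [1 if rng.random() < inclusion_prob else 0 for _ in range(n_coef)]
    coefficients = [rng.gauss(0.0, slab_sd) if flag == 1 else 0.0 for flag in indicators]
    return indicators, coefficients


rng = random.Random(0)  # fixed seed so the draw is reproducible
indicators, coefficients = draw_spike_and_slab(n_coef=10, inclusion_prob=0.2, slab_sd=5.0, rng=rng)
print(indicators)
print(coefficients)
```

A small inclusion probability expresses the expectation that only a few control series matter, while the wide slab leaves the size of the retained coefficients largely to the data.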

15 |
Models of Causal Inference: Going Beyond the Neyman–Rubin–Holland Theory.
- Brady
- 2002
Citation Context ...roach discussed in this paper include autoregressive (AR) and moving-average (MA) models. These models define autocorrelation among observations rather than latent states, thus precluding the ability to distinguish between state noise and observation noise (Ataman, Mela and Van Heerde, 2008; Leeflang et al., 2009). In the scenarios we consider, advertising is a planned perturbation of the market. This generally makes it easier to obtain plausible causal inferences than in genuinely observational studies in which the experimenter had no control about treatment (see discussions in Berndt, 1991; Brady, 2002; Hitchcock, 2004; Robinson, McNulty and Krasno, 2009; Winship and Morgan, 1999; Camillo and d’Attoma, 2010; Antonakis et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Kleinberg and Hripcsak, 2011; Vaver and Koehler, 2011). The principal problem in observational studies is endogeneity: the possibility that the observed outcome might not be the result of the treatment but of other omitted, endogenous variables. In principle, propensity scores can be used to correct for the selection bias that arises when the treatment effect is correlated with the likelihood of being treated (... |

15 |
A review of causal inference for biomedical informatics.
- Kleinberg, Hripcsak
- 2011
Citation Context ...vent, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of other time series that were predictive of the target series prior to the intervention. Examples of such contr... |

15 | Here, there, and everywhere: correlated online behaviors can lead to overestimates of the effects of advertising. - Lewis, Rao, et al. - 2011 |

14 | Evaluating online ad campaigns in a pipeline: Causal models at scale.
- Chan, Ge, et al.
- 2010
Citation Context ...tervention period gives a semiparametric Bayesian posterior distribution for the causal effect (Figure 1). Related work. As with other domains, causal inference in marketing requires subtlety. Marketing data are often observational and rarely follow the ideal of a randomised design. They typically exhibit a low signal-to-noise ratio. They are subject to multiple seasonal variations, and they are often confounded by the effects of unobserved variables and their interactions (for recent examples, see Seggie, Cavusgil and Phelan, 2007; Stewart, 2009; Leeflang et al., 2009; Takada and Bass, 1998; Chan et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Vaver and Koehler, 2011, 2012). Rigorous causal inferences can be obtained through randomised experiments, often implemented in the form of geo experiments (Vaver and Koehler, 2011, 2012). Many market interventions, however, fail to satisfy the requirements of such approaches. For instance, advertising campaigns are frequently launched across multiple channels, online and offline, which precludes measurement of individual exposure. Campaigns are often targeted at an entire country, and one country only, which prohibits the use of geographi... |

14 |
A Bayesian foundation for individual learning under uncertainty. Front Hum Neurosci
- Mathys, Daunizeau, et al.
- 2011
Citation Context ...onsequence of this is that we can reuse the samples from the posterior to obtain credible intervals for all summary statistics of interest. Such statistics include, for example, the average absolute and relative effect caused by the intervention as well as its cumulative effect. Posterior inference was implemented in C++ and R and, for all empirical datasets presented in Section 4, took less than 30 seconds on a standard Linux machine. If the computational burden of sampling-based inference ever became prohibitive, one option would be to replace it by a variational Bayesian approximation (see Mathys et al., 2011; Brodersen et al., 2013, for examples). Another way of using the proposed model is for power analyses. In particular, given past time series of market activity, we can define a point in the past to represent a hypothetical intervention and apply the model in the usual fashion. As a result, we obtain a measure of uncertainty about the response in the treated market after the beginning of the hypothetical intervention. This provides an estimate of what incremental effect would have been required to be outside of the 95% central interval of what would have happened in the absence of treatment. B... |
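
The reuse of posterior samples mentioned in the context above can be sketched as follows: given simulated counterfactual trajectories for the post-intervention period, the cumulative and average effects and a central interval follow from simple arithmetic on the draws. The function names, the linear-interpolation percentile and the toy numbers are assumptions made for illustration, not the authors' C++/R implementation.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac


def summarise_effect(observed, counterfactual_draws, level=0.95):
    """Summarise the effect implied by draws of the counterfactual series.

    observed: post-intervention observations.
    counterfactual_draws: one simulated counterfactual trajectory per draw,
    each of the same length as `observed`.
    """
    # One cumulative effect per draw: sum over time of observed minus counterfactual.
    cumulative = sorted(
        sum(y - c for y, c in zip(observed, draw))
        for draw in counterfactual_draws
    )
    tail = (1 - level) / 2
    mean_cumulative = sum(cumulative) / len(cumulative)
    return {
        "cumulative_mean": mean_cumulative,
        "cumulative_interval": (percentile(cumulative, tail), percentile(cumulative, 1 - tail)),
        "average_mean": mean_cumulative / len(observed),
    }


# Toy example: two post-intervention time points, three hypothetical draws.
observed = [10.0, 10.0]
draws = [[8.0, 8.0], [9.0, 9.0], [10.0, 10.0]]
summary = summarise_effect(observed, draws)
print(summary)
```

The same draws serve every summary statistic, so no additional sampling is needed when a new quantity of interest is requested.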

13 |
Estimating autocorrelations in fixed-effects models.
- Solon
- 1984
Citation Context ... Here, the 95% credible interval of the cumulative impact crosses the zero line about five months after the intervention, at which point we would no longer declare a significant overall effect. Despite their utility in economics and the social sciences, DD designs have been limited in three ways. First, DD is traditionally based on a static regression model that assumes i.i.d. data, even though the design has a temporal component. When fit to serially correlated data, static models yield overoptimistic inferences with too narrow uncertainty intervals (see also Solon, 1984; Hansen, 2007a,b; Bertrand, Duflo and Mullainathan, 2002). Second, most DD analyses only consider two time points: before and after the intervention. In practice, the manner in which an effect evolves over time, especially its onset and decay structure, is often a key question. Third, selection of synthetic controls is often arbitrary (Robinson, McNulty and Krasno, 2009), bearing the risk of selecting controls that fail to provide an adequate explanation of contemporaneous changes in the environment, which, as a result, might be erroneously attributed to the intervention. The limitations of D... |

11 | Measurement of return on marketing investment: a conceptual framework and the future of marketing metrics. - Seggie, Cavusgil, et al. - 2007 |

10 |
Do All and Only Causes Raise the Probabilities of Effects?
- Hitchcock
Citation Context ... Cavusgil and Phelan, 2007; Leeflang et al., 2009; Stewart, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the interventi... |

10 | Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology.
- Rubin, Waterman
- 2006
Citation Context ...; Hitchcock, 2004; Robinson, McNulty and Krasno, 2009; Winship and Morgan, 1999; Camillo and d’Attoma, 2010; Antonakis et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Kleinberg and Hripcsak, 2011; Vaver and Koehler, 2011). The principal problem in observational studies is endogeneity: the possibility that the observed outcome might not be the result of the treatment but of other omitted, endogenous variables. In principle, propensity scores can be used to correct for the selection bias that arises when the treatment effect is correlated with the likelihood of being treated (Rubin and Waterman, 2006; Chan et al., 2010). However, the propensity-score approach requires that exposure can be measured at the individual level, and it, too, does not guarantee valid inferences, for example in the presence of a specific type of selection bias recently termed ‘activity bias’ (Lewis, Rao and Reiley, 2011). Counterfactual modelling approaches avoid these issues when it can be assumed that the treatment market was chosen at random. Overall, inferring the causal impact of a designed market intervention may play an increasingly prominent role in providing quantitative accounts o... |

8 |
Determining the Optimal Return on investment for an Advertising Campaign.
- Danaher, Rust
- 1996
Citation Context ...ing the counterfactual. Keywords and phrases: causal inference, counterfactual forecasting, observational, intervention, advertising, marketing, econometrics. Inferring the impact of market interventions is an important and timely problem. Partly because of recent interest in ‘big data,’ many firms have begun to understand that a competitive advantage can be had by systematically using impact measures to inform strategic decision making. An example is the use of ‘A/B experiments’ to identify the most effective market treatments for the purpose of allocating resources (Danaher and Rust, 1996; Seggie, Cavusgil and Phelan, 2007; Leeflang et al., 2009; Stewart, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubi... |

8 | Predicting the Present with Bayesian Structural Time Series.
- Scott, Varian
- 2013
Citation Context ...hod for obtaining accurate counterfactual predictions since they account for variance components that are shared by the series, including in particular the effects of other unobserved causes otherwise unaccounted for by the model. A natural way of including control series in the model is through a linear regression whose coefficients can be static or time-varying. Static regression coefficients involve significantly less computational burden, so we can handle more of them. If the number of control series $J$ is very large, we can impose sparsity using a spike-and-slab prior on the coefficients (Scott and Varian, 2013). A static regression can be written in state-space form by setting $Z_t = \beta^{T} x_t$ and setting $z_t = 1$ with $T_t = 1$ and $Q_t = 0$. One advantage of working in a fully Bayesian treatment is that we do not need to commit to a fixed set of covariates. The spike-and-slab prior described in Section 2.2 allows us to integrate out our posterior uncertainty about which covariates to include and how strongly they should influence our predictions. All covariates are assumed to be contemporaneous; the present model does not infer on a potential lag between treated and untreated time series. A known lag, however, ... |

6 | Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: A novel data augmentation approach.
- Ghosh, Clyde
- 2011
Citation Context ... $\varrho\, b_\varrho - \beta_\varrho^{T} V_\varrho^{-1} \beta_\varrho$. To sample from (2.13), we use a Gibbs sampler that draws each $\varrho_j$ given all other $\varrho_{-j}$. Each full-conditional is easy to evaluate because $\varrho_j$ can only assume two possible values. It should be noted that the dimension of all matrices in (2.13) is $\sum_j \varrho_j$, which is small if the model is truly sparse. There are many matrices to manipulate, but because each is small the overall algorithm is fast. Once the draw of $\varrho$ is complete, we sample directly from $p(\beta, 1/\sigma^2 \mid \varrho, y)$ using standard conjugate formulae. For an alternative that may be even more computationally efficient, see Ghosh and Clyde (2011). Posterior predictive simulation. While the posterior over model parameters and states $p(\theta, z \mid y)$ can be of interest in its own right, causal impact analyses are primarily concerned with the posterior incremental effect, $p(y_{n+1}, \ldots, y_m \mid y_1, \ldots, y_n, x_1, \ldots, x_m)$ (2.14). As shown by its indices, the density in equation (2.14) is defined precisely for that portion of the time series which we do not observe: the counterfactual market response $y_{n+1}, \ldots, y_m$ that would have been observed in the treatment market, after the intervention, in the absence of treatment. It is also worth ... |

6 | Observing the Counterfactual? The Search for Political Experiments in - Robinson, McNulty, et al. - 2009 |

6 |
Marketing accountability: Linking marketing actions to financial results.
- Stewart
- 2009
Citation Context ...nt and timely problem. Partly because of recent interest in ‘big data,’ many firms have begun to understand that a competitive advantage can be had by systematically using impact measures to inform strategic decision making. An example is the use of ‘A/B experiments’ to identify the most effective market treatments for the purpose of allocating resources (Danaher and Rust, 1996; Seggie, Cavusgil and Phelan, 2007; Leeflang et al., 2009; Stewart, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and W... |

5 | A new data mining approach to estimate causal effects of policy interventions. - Camillo, D’Attoma - 2010 |

4 | Measuring Ad Effectiveness Using Geo Experiments
- Vaver, Koehler
- 2011
Citation Context ... for the causal effect (Figure 1). Related work. As with other domains, causal inference in marketing requires subtlety. Marketing data are often observational and rarely follow the ideal of a randomised design. They typically exhibit a low signal-to-noise ratio. They are subject to multiple seasonal variations, and they are often confounded by the effects of unobserved variables and their interactions (for recent examples, see Seggie, Cavusgil and Phelan, 2007; Stewart, 2009; Leeflang et al., 2009; Takada and Bass, 1998; Chan et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Vaver and Koehler, 2011, 2012). Rigorous causal inferences can be obtained through randomised experiments, often implemented in the form of geo experiments (Vaver and Koehler, 2011, 2012). Many market interventions, however, fail to satisfy the requirements of such approaches. For instance, advertising campaigns are frequently launched across multiple channels, online and offline, which precludes measurement of individual exposure. Campaigns are often targeted at an entire country, and one country only, which prohibits the use of geographic controls within that country. Likewise, a campaign might be launched in seve... |

3 | Modeling Internet firm survival using Bayesian dynamic models with time-varying coefficients. - Banerjee, Kauffman, et al. - 2007 |

3 | Variational Bayesian mixed-effects inference for classification studies.
- Brodersen, Daunizeau, et al.
- 2013
Citation Context ... that we can reuse the samples from the posterior to obtain credible intervals for all summary statistics of interest. Such statistics include, for example, the average absolute and relative effect caused by the intervention as well as its cumulative effect. Posterior inference was implemented in C++ and R and, for all empirical datasets presented in Section 4, took less than 30 seconds on a standard Linux machine. If the computational burden of sampling-based inference ever became prohibitive, one option would be to replace it by a variational Bayesian approximation (see Mathys et al., 2011; Brodersen et al., 2013, for examples). Another way of using the proposed model is for power analyses. In particular, given past time series of market activity, we can define a point in the past to represent a hypothetical intervention and apply the model in the usual fashion. As a result, we obtain a measure of uncertainty about the response in the treated market after the beginning of the hypothetical intervention. This provides an estimate of what incremental effect would have been required to be outside of the 95% central interval of what would have happened in the absence of treatment. BAYESIAN CAUSAL IMPACT AN... |

3 |
Statistical Inference for Causal Effects,
- Rubin
- 2007
Citation Context ..., 2009; Stewart, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of othe... |

2 | Building brands. - Ataman, Mela, et al. - 2008 |

2 |
The Russo–Williamson Theses in the social sciences: Causal inference drawing on two types of evidence.
- Claveau
- 2012
Citation Context ...the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of other time series that were predictive of the target series prior to the intervention. Examples of such control series are the same product... |

2 |
Does Retail Advertising Work?
- Lewis, Reiley
- 2011
Citation Context ...ives a semiparametric Bayesian posterior distribution for the causal effect (Figure 1). Related work. As with other domains, causal inference in marketing requires subtlety. Marketing data are often observational and rarely follow the ideal of a randomised design. They typically exhibit a low signal-to-noise ratio. They are subject to multiple seasonal variations, and they are often confounded by the effects of unobserved variables and their interactions (for recent examples, see Seggie, Cavusgil and Phelan, 2007; Stewart, 2009; Leeflang et al., 2009; Takada and Bass, 1998; Chan et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Vaver and Koehler, 2011, 2012). Rigorous causal inferences can be obtained through randomised experiments, often implemented in the form of geo experiments (Vaver and Koehler, 2011, 2012). Many market interventions, however, fail to satisfy the requirements of such approaches. For instance, advertising campaigns are frequently launched across multiple channels, online and offline, which precludes measurement of individual exposure. Campaigns are often targeted at an entire country, and one country only, which prohibits the use of geographic controls within that c... |

2 |
Multiple Time Series Analysis of Competitive Marketing Behavior.
- Takada, Bass
- 1998
Citation Context ...onse during the post-intervention period gives a semiparametric Bayesian posterior distribution for the causal effect (Figure 1). Related work. As with other domains, causal inference in marketing requires subtlety. Marketing data are often observational and rarely follow the ideal of a randomised design. They typically exhibit a low signal-to-noise ratio. They are subject to multiple seasonal variations, and they are often confounded by the effects of unobserved variables and their interactions (for recent examples, see Seggie, Cavusgil and Phelan, 2007; Stewart, 2009; Leeflang et al., 2009; Takada and Bass, 1998; Chan et al., 2010; Lewis and Reiley, 2011; Lewis, Rao and Reiley, 2011; Vaver and Koehler, 2011, 2012). Rigorous causal inferences can be obtained through randomised experiments, often implemented in the form of geo experiments (Vaver and Koehler, 2011, 2012). Many market interventions, however, fail to satisfy the requirements of such approaches. For instance, advertising campaigns are frequently launched across multiple channels, online and offline, which precludes measurement of individual exposure. Campaigns are often targeted at an entire country, and one country only, which prohibits t... |

1 |
Causal Inference and Statistical Fallacies.
- Cox, Wermuth
- 2001
Citation Context ...rt, 2009). Here, we focus on measuring the impact of a discrete marketing event, such as the release of a new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of other time series that were... |

1 |
Economic Theory and Causal Inference.
- Hoover
- 2012
Citation Context ... new product, the introduction of a new feature, or the beginning or end of an advertising campaign, with the aim of measuring the event’s impact on a response metric of interest (e.g., sales). The causal impact of a treatment is the difference between the observed value of the response and the (unobserved) value that would have been obtained under the alternative treatment, i.e., the ‘effect of treatment on the treated’ (Rubin, 1974; Hitchcock, 2004; Morgan and Winship, 2007; Rubin, 2007; Cox and Wermuth, 2001; Heckman and Vytlacil, 2007; Antonakis et al., 2010; Kleinberg and Hripcsak, 2011; Hoover, 2012; Claveau, 2012). In the present setting the response variable is a time series, so the causal effect of interest is the difference between the observed series and the series that would have been observed had the intervention not taken place. Broadly speaking, there are three sources of information available for inferring the counterfactual time series. The first is the time-series behaviour of the response itself, prior to the intervention. The second is the behaviour of other time series that were predictive of the target series prior to the intervention. Examples of such control series are ... |

1 | Creating lift versus building the base: Current trends in marketing dynamics. - Leeflang, Bijmolt, et al. - 2009 |

1 | Periodic Measurement of Advertising Effectiveness Using Multiple-Test-Period Geo Experiments - Vaver, Koehler - 2012 |