Basic identity of causal inference
We can decompose the observed difference in outcomes between the treated and the untreated into two terms:
Outcome for treated − Outcome for untreated
= [Outcome for treated − Outcome for treated if not treated] + [Outcome for treated if not treated − Outcome for untreated]
= Impact of treatment on treated + Selection bias.
The basic identity nicely shows why randomized trials are the gold standard for causal inference. If the treated group is a random sample of the population, then the first term is an estimate of the causal impact of the treatment on the population, and if the assignment is random, then the second term has an expected value of zero.
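The identity can be checked numerically with a small simulation. The sketch below is illustrative only: the data-generating process, the confounder named `ability`, the constant effect of 2.0, and the function name `decompose` are all assumptions made for this example, not part of the source. It computes the three quantities in the identity under self-selection and under random assignment.

```python
import random
from statistics import fmean


def decompose(n=100_000, effect=2.0, randomized=False, seed=0):
    """Return (observed gap, impact on treated, selection bias) for simulated data."""
    rng = random.Random(seed)
    treated_y1, treated_y0, untreated_y0 = [], [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)              # hypothetical confounder
        y0 = 10.0 + ability + rng.gauss(0.0, 1.0)  # outcome if not treated
        y1 = y0 + effect                           # outcome if treated
        if randomized:
            treat = rng.random() < 0.5             # coin-flip assignment
        else:
            treat = ability + rng.gauss(0.0, 1.0) > 0  # high-ability units opt in
        if treat:
            treated_y1.append(y1)
            treated_y0.append(y0)  # unobservable in practice, known in a simulation
        else:
            untreated_y0.append(y0)
    observed_gap = fmean(treated_y1) - fmean(untreated_y0)
    impact_on_treated = fmean(treated_y1) - fmean(treated_y0)
    selection_bias = fmean(treated_y0) - fmean(untreated_y0)
    return observed_gap, impact_on_treated, selection_bias


self_selected = decompose(randomized=False)
randomly_assigned = decompose(randomized=True)
for label, (gap, impact, bias) in [("self-selected", self_selected),
                                   ("randomized", randomly_assigned)]:
    print(f"{label}: gap={gap:.3f} impact={impact:.3f} bias={bias:.3f}")
```

In both runs the observed gap equals the impact on the treated plus the selection bias. Under self-selection the bias term is clearly positive, because units with a high untreated outcome are more likely to be treated. Under random assignment it is close to zero.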
Furious Five methods of causal inference
- random assignment
- regression
- instrumental variables
- regression discontinuity
- differences in differences
In advertising, the advertiser can simply increase spend for a limited period, and we can compare the outcome of that experiment to an estimate of the counterfactual: what would have happened during the limited period without that increase in spend.
Where does the counterfactual come from? It is a predictive model developed using data from before the experiment was run. In the classic experiment design described earlier, we compare treated and untreated subjects. Here, we treat all of the subjects for a limited time and measure their aggregate response. Our counterfactual is a prediction of what would have happened during the limited period had spend not been increased.
We have seen that causal inference involves comparing actual outcomes to counterfactual outcomes. The standard approach is a cross-section model that compares treated subjects to untreated subjects. In this case, the counterfactual is a prediction of the outcome for those treated if they had not been treated, which is typically based on the outcome for the control group (sometimes with an adjustment for other factors).
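A minimal sketch of the cross-section comparison, with and without an adjustment, might look like the following. The function names, the single covariate, and the toy numbers are hypothetical. The adjustment shown (a least-squares line fitted on the control group and evaluated at the treated subjects' covariate values) is one simple choice among many, not a method prescribed by the source.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def effect_on_treated(treated, control, adjust=False):
    """Actual mean outcome of the treated minus the estimated counterfactual.

    treated, control: lists of (covariate, outcome) pairs.
    """
    actual = fmean(y for _, y in treated)
    if adjust:
        # Counterfactual: what the control-group relationship predicts
        # for subjects with the treated group's covariate values.
        intercept, slope = fit_line([x for x, _ in control],
                                    [y for _, y in control])
        counterfactual = fmean(intercept + slope * x for x, _ in treated)
    else:
        # Counterfactual: the raw control-group mean.
        counterfactual = fmean(y for _, y in control)
    return actual - counterfactual


# Toy data: untreated outcome is 1 + 2x, treatment adds 3,
# and the treated group has higher covariate values.
control = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]
treated = [(2, 8.0), (3, 10.0), (4, 12.0)]

unadjusted = effect_on_treated(treated, control)
adjusted = effect_on_treated(treated, control, adjust=True)
print(unadjusted, adjusted)
```

Because the two groups differ on the covariate in this toy data, the raw control mean is a poor counterfactual and the unadjusted estimate overstates the effect. The adjusted estimate recovers the effect built into the data.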
As the above example illustrates, one can also examine a single subject before, during, and after treatment. In this case, the counterfactual is the forecast of the outcome for the subject constructed using data from before the experiment. To implement this approach, one would normally build a model using time series methods that account for trend, seasonal effects, autocorrelation, persistence of the treatment effect, and so on.
Train-test-treat-compare (TTTC) process
In building the predictive model, we can use standard machine learning tools such as cross-validation to tune parameters. Once we are satisfied with our model, we can apply it to a test set to determine how well it performs. We can then apply the model during the treatment period to predict the counterfactual, and compare what actually happened to the treated with the model's prediction of what would have happened without the treatment. This train-test-treat-compare (TTTC) process is illustrated in Fig. 1.
Fig. 1. Hypothetical TTTC process. The model is estimated during the training period and its predictive performance is assessed during the test period. The extrapolation of the model during the treat period (red line) serves as a counterfactual. This counterfactual is compared with the actual outcome (black line), and the difference is the estimated treatment effect. When the treatment ends, the outcome returns to something close to the original level.
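The four TTTC steps can be sketched on synthetic data as follows. Everything here is an assumption made for illustration: the daily series, the weekly pattern, the true lift of 5.0, the period boundaries, and the helper names `fit_trend_seasonal` and `tttc`. The model is a deliberately simple linear trend plus day-of-week effects. A real application would likely use a richer time series model and tune it by cross-validation.

```python
import random
from statistics import fmean


def fit_trend_seasonal(ys, period=7):
    """Fit y_t ~ intercept + slope * t + seasonal[t % period] for t = 0, 1, ...

    The trend is fitted on whole-period averages so that the seasonal
    pattern cancels out. Needs at least two full periods of data.
    """
    n_blocks = len(ys) // period
    mids = [b * period + (period - 1) / 2 for b in range(n_blocks)]
    means = [fmean(ys[b * period:(b + 1) * period]) for b in range(n_blocks)]
    m_bar, y_bar = fmean(mids), fmean(means)
    slope = (sum((m - m_bar) * (y - y_bar) for m, y in zip(mids, means))
             / sum((m - m_bar) ** 2 for m in mids))
    intercept = y_bar - slope * m_bar
    resid = [y - (intercept + slope * t) for t, y in enumerate(ys)]
    seasonal = [fmean(resid[k::period]) for k in range(period)]

    def predict(t):
        return intercept + slope * t + seasonal[t % period]

    return predict


def tttc(series, train_end, test_end, treat_end):
    """Train, test, treat, compare: return (test error, estimated lift)."""
    predict = fit_trend_seasonal(series[:train_end])              # train
    test_mae = fmean(abs(series[t] - predict(t))
                     for t in range(train_end, test_end))         # test
    lift = fmean(series[t] - predict(t)
                 for t in range(test_end, treat_end))             # treat, compare
    return test_mae, lift


# Synthetic daily outcome: trend + weekly pattern + noise,
# with a hypothetical lift added during the treatment period only.
rng = random.Random(1)
TRUE_LIFT = 5.0
weekly = [0, 1, 2, 1, 0, -2, -2]
series = []
for t in range(140):
    base = 100 + 0.1 * t + weekly[t % 7] + rng.gauss(0, 0.5)
    series.append(base + (TRUE_LIFT if 112 <= t < 140 else 0.0))

test_mae, estimated_lift = tttc(series, train_end=84, test_end=112, treat_end=140)
print(f"test MAE={test_mae:.2f}  estimated lift={estimated_lift:.2f}")
```

The test-period error indicates how far the counterfactual forecast can be trusted. An estimated lift that is not clearly larger than that error should not be read as a treatment effect.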
Observational methods often fail to produce the same effects as randomized experiments, even after conditioning on extensive demographic and behavioral variables. We also characterize the incremental explanatory power the data would require to enable observational methods to successfully measure advertising effects. The findings suggest that commonly used observational approaches, based on the data usually available in the industry, often fail to accurately measure the true effect of advertising.