Why lockdowns both ‘work’ and ‘don’t work’
In the previous post, I outlined why future predictions of Covid rates and of counteracting policies were always going to be highly uncertain (i.e., inaccurate), particularly early in the pandemic. You might expect that now that we can look back at what actually happened, we must surely have a much more certain idea of which policy choices worked or didn’t. Unfortunately, there is still a lot of uncertainty. This is why.
What do we need for causal inference?
First, a very basic summary of what we generally treat as causal evidence in this space. Ultimately, for any causal inference we want to compare ‘what happened’ with the policy in place versus ‘what would have happened’ if the policy hadn’t been in place (what we call the counterfactual). In reality, we can only observe one of these scenarios in any given place at a given time – the policy is either enacted or not – so we can never observe and measure the effects of the counterfactual.
Every method of causal inference, then, is trying to approximate this ‘what would have happened’ scenario in some way. Without going into detail, randomisation is the best way to do this. Effectively, if done correctly and with a big enough sample size, it creates two groups that are, on average, identical in terms of everything else that could affect the outcome of interest – observed/measurable factors and unobserved/unmeasurable factors. So, when comparing at the aggregate level, we effectively have our measurable counterfactual. In a randomised experiment, we can then implement policy X for one of our groups and not the other and measure the effect.
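As an aside from the post itself, the balancing effect of randomisation can be sketched with a toy simulation. Everything here is invented for illustration – the unobserved ‘risk’ factor, the baseline, and the ‘true’ policy effect – but it shows why a simple difference in group means works once groups are randomised:

```python
import random

random.seed(0)

# Hypothetical population: each person carries an unobserved "risk"
# factor that affects the outcome regardless of any policy.
population = [{"risk": random.gauss(0, 1)} for _ in range(100_000)]

# Randomise into a policy-X group and a control group.
random.shuffle(population)
treated, control = population[:50_000], population[50_000:]

TRUE_EFFECT = -2.0  # invented "true" effect of policy X on the outcome

def outcome(person, has_policy):
    # Outcome = baseline + unobserved risk + policy effect (if treated).
    return 10 + person["risk"] + (TRUE_EFFECT if has_policy else 0.0)

def mean(xs):
    return sum(xs) / len(xs)

# Randomisation balances the unobserved risk factor across groups...
risk_gap = mean([p["risk"] for p in treated]) - mean([p["risk"] for p in control])

# ...so a simple difference in group means recovers the true effect.
effect_estimate = (mean([outcome(p, True) for p in treated])
                   - mean([outcome(p, False) for p in control]))

print(round(risk_gap, 2))         # ≈ 0.0
print(round(effect_estimate, 1))  # ≈ -2.0
```

If instead higher-risk people had self-selected into one group, the same difference in means would mix the policy effect with the risk imbalance – which is exactly the problem when policies are rolled out in response to conditions on the ground rather than at random.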
In reality, of course, it’s not possible or feasible to randomise our populations for most policies, so instead we frequently use quasi-experiments. These use more sophisticated statistical analyses, and assumptions, to try to approximate the randomisation scenario as closely as possible. But, again, what we ideally need to get a causal effect is for only a single policy, policy X, to be implemented at a given time, so that we can measure the causal effect of that policy alone.
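One common quasi-experimental design is difference-in-differences, which uses an untreated comparison region to stand in for the counterfactual. A minimal sketch with invented weekly case counts (an illustration of the general idea, not a method from our paper):

```python
# Hypothetical weekly case counts for two regions; region A enacts
# policy X from week 4 onwards, region B never does. Numbers invented.
region_a = [100, 110, 120, 130, 125, 118, 110]
region_b = [90, 100, 110, 120, 130, 140, 150]

POLICY_WEEK = 4  # index of the first post-policy week

def mean(xs):
    return sum(xs) / len(xs)

# Difference-in-differences: compare each region's change from the
# pre-policy period to the post-policy period. Region B's change
# stands in for "what would have happened" in region A without the
# policy - valid only under the key (untestable) assumption that the
# two regions would otherwise have moved in parallel.
change_a = mean(region_a[POLICY_WEEK:]) - mean(region_a[:POLICY_WEEK])
change_b = mean(region_b[POLICY_WEEK:]) - mean(region_b[:POLICY_WEEK])
did_estimate = change_a - change_b

print(round(did_estimate, 1))  # -32.3 cases per week in this toy data
```

Notice how much is riding on the parallel-trends assumption: if region B differs in ways that also affect transmission – demographics, density, neighbouring policies – the estimate absorbs those differences along with the policy effect.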
In a controlled randomised experiment, we can also control the data we collect, e.g., specific survey questions we want to ask to capture what we think might be affected. For quasi-experiments, which look retrospectively at a policy that has been rolled out, we instead rely on data that is already collected, e.g., administrative datasets such as healthcare records or existing ongoing surveys that might contain relevant questions.
To sum up then, what we ideally want is randomisation (or to be able to construct this afterwards), a single policy being enacted (we really don’t have causal methods capable of handling multiple policies while being able to separate effects of each), and plenty of data on all of the relevant outcomes that we think might be positively/negatively affected by the policy.
What did we get for Covid policies?
Well, you can see where this is going. We had nowhere near randomisation; multiple policies rolled out in tandem or quick succession; policies rolled out in response to already high Covid rates creating bias; policies implemented mostly across a whole country, meaning we would have to compare with perhaps very different countries; and copying of neighbours’ policies, so not even a good number of comparison groups for ‘what would have happened’. Basically, a mess for causal inference.
We also had very limited outcomes that were comparable across countries. Especially early on, this basically came down to measuring (i) Covid cases (with access to testing, and hence rates, varying by country) and (ii) Covid deaths (with what was recorded as a Covid death also varying somewhat by country). Other outcomes, because they were unavailable, largely fell out of the early debates – particularly measures of any potentially negative consequences some of these policies might have had, for example on mental health or economic measures. Ideally, we would want the big picture across all of these to judge whether the unintended costs were worth the intended results.
Of course, plenty of studies still tried to measure the effects. This is a vital question, after all, so we can’t just ignore it even if we can’t get a perfect causal inference. Some, for instance:
Used their Covid prediction models to construct a counterfactual – an approach subject to all of the issues I discussed in my previous post, and one where experts raised major concerns about what went into these models when trying to represent a (very) complex system and a new virus
Tried to focus on only a single policy, essentially ignoring the others that were introduced simultaneously and would bias the results
Looked at various ‘bundles’ of policies together and then tried to say something about the potential effects of Covid policies more generally, rather than effects of single policies
Accepted, like ours (and some others discussed in our paper), that the estimated effect was going to be non-causal, but employed novel methods (we tried to exploit the natural lag between virus transmission and death, for instance), emphasised the importance of being able to select policies for maximal benefit with minimal disruption, and recognised and controlled for the presence of similar policies being enacted at the same time
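To make the simultaneous-policies problem concrete, here is a toy example (all numbers invented): when two policies are always switched on together, any split of their combined effect fits the data equally well, so no statistical method can identify the individual contributions from this data alone.

```python
# Toy data: each observation is (policy1_on, policy2_on, observed_reduction).
# The two policies were always enacted together, so they are perfectly
# collinear - invented numbers for illustration.
data = [(0, 0, 0.0), (0, 0, 0.1), (1, 1, 5.1), (1, 1, 4.9)]

def sse(a, b):
    # Sum of squared errors for a model attributing effect `a` to
    # policy 1 and effect `b` to policy 2.
    return sum((y - (a * p1 + b * p2)) ** 2 for p1, p2, y in data)

# Any split with a + b == 5 fits identically: the data cannot tell
# "policy 1 did everything" apart from "policy 2 did everything".
print(round(sse(5.0, 0.0), 2))  # 0.03
print(round(sse(0.0, 5.0), 2))  # 0.03
print(round(sse(2.5, 2.5), 2))  # 0.03
```

This is the formal version of the complaint above: with policies rolled out in tandem, only the effect of the bundle is (at best) estimable, not the effect of any single policy.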
None of these studies was perfect, or near perfect, for causal analysis – particularly once we consider that some of the effects of these policies could be much longer-term, e.g., potential effects on health system backlogs, economic supply chains and macroeconomic measures (such as inflation), and social capital (such as trust in each other, or in governments and institutions). What is more, such longer-term effects are even harder to connect causally to a policy than the direct Covid virus effects that were the focus of much of this research.
We’re left with a cat
What we’re left with, then, is a high degree of uncertainty. We have a series of Schrödinger’s cats: policies whose effects we cannot definitively measure, where we aren’t completely sure what ‘worked’ or ‘didn’t work’, especially across the totality of their potential positive and negative effects.

Nevertheless, if you turn on the news or load social media, you will still frequently hear the claim that “lockdowns worked”, or that “lockdowns didn’t work”, stated very definitively. Sometimes these claims are just based on misunderstandings – naïve readings of the peaks and troughs of an epidemic wave (see, for example, Sweden, one of the countries that implemented the fewest policies, where the virus still came in waves) – oh, of course it rained, I did my rain dance! Sometimes, unfortunately, some ‘experts’, or those who should at least know better, seem to get carried away with their own narratives, or the politics. Just be aware that if someone tells you they are completely convinced either way on Covid policies, they likely fall into one of these groups.