There has been a lot of talk about the IHME Covid-19 projection model. Ellie Murray & I chat about it on Episode 10 of Casual Inference; here is a quick description of what the model is doing, with a focus on the **uncertainty**.

When I look at models, I usually start with two things:

📈 What method is being used?

📥 What data is it based on?

Let’s start with the methods!

### Methods

📈 The IHME model is estimating the log of the cumulative death rate for a given state at a given time

🌊 Using curve fitting (in particular, a non-linear mixed effects model)

📏 Parametrized with info about the state’s social distancing

Since the IHME model is trying to estimate a **curve** there are ✌️ two important pieces:

1️⃣ When will deaths “peak”

2️⃣ How many deaths will there be at the “peak”
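To make those two pieces concrete, here is a minimal sketch of a curve built from the Gaussian error function. This is my own illustration, not IHME's code: the functional form is my reading of the approach, and the names `p`, `alpha`, `beta` and their values are hypothetical.

```python
import math

def cumulative_death_rate(t, p, alpha, beta):
    """S-shaped cumulative curve built from the Gaussian error function.

    p     : final cumulative death rate (hypothetical name)
    alpha : how quickly the curve rises (hypothetical name)
    beta  : day of the inflection point, i.e. when daily deaths peak
    """
    return p / 2.0 * (1.0 + math.erf(alpha * (t - beta)))

def daily_death_rate(t, p, alpha, beta):
    """Derivative of the cumulative curve: a bell-shaped daily curve."""
    return p * alpha / math.sqrt(math.pi) * math.exp(-(alpha * (t - beta)) ** 2)

# Made-up parameter values, purely for illustration.
p, alpha, beta = 100.0, 0.1, 30.0

days = range(0, 61)
# 1) When will deaths peak?
peak_day = max(days, key=lambda t: daily_death_rate(t, p, alpha, beta))
# 2) How many deaths will there be at the peak?
peak_height = daily_death_rate(peak_day, p, alpha, beta)

print(f"peak day: {peak_day}, daily rate at peak: {peak_height:.2f}")
```

In this sketch the timing of the peak is controlled entirely by `beta`, and its height by `p * alpha / sqrt(pi)`, so pinning down the peak's timing and height amounts to pinning down these curve parameters.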

To estimate when these occur, the IHME model has two sources of info:

⏱ the current death rate over time for the state

📏 the social distancing measures being implemented

This information is combined with some 🌏global info as well

👶 In the short run, the model is impacted more by the state’s data

👴 In the long run, they use info from locations that have seemingly already reached a peak: Wuhan, 5 locations in Italy, and 2 in Spain

### Uncertainty

OKAY now that we know what the IHME model is doing, let’s get to the good stuff - where is the uncertainty?

- There is uncertainty that the model itself will accurately predict what will happen (it’s based on a Gaussian error function - is that right?)

- There is uncertainty in the distributional assumptions of the model

- Even if the model is correctly specified, there is uncertainty in the parameter estimation (this is a mixed effects model, so there is uncertainty associated with both the fixed and the random effects)

- There may be systematic uncertainty in the reported state-by-state death data. Why? Fewer deaths may be reported on weekends; if systems are overrun, COVID-19 related deaths may go unreported; etc. NPR reports that NYC is seeing a spike of deaths at home that were not originally included in the COVID-19 count

- There may be random uncertainty in the reported state-by-state death data

- There is uncertainty in the reported information coming from cities that seem to have already peaked

So let’s recap on the uncertainty in the IHME model:

1️⃣ model choice

2️⃣ model parameters

3️⃣ model estimation

4️⃣ data from the states (systematic)

5️⃣ data from the states (random)

6️⃣ data from the “peaked” locations

In the original model (pre-last week), the error bands you saw only accounted for 3️⃣. Since then, the model has been updated so that the uncertainty also accounts for out-of-sample uncertainty, which I believe covers 5️⃣

The shaded red region in the model shows the *uncertainty* the model accounts for, which is just two of the six sources:

❌1️⃣ model choice

❌2️⃣ model parameters

✅3️⃣ model estimation

❌4️⃣ data from the states (systematic)

✅5️⃣ data from the states (random)

❌6️⃣ data from the “peaked” locations

This is not unusual or bad! It is just good to keep in mind the uncertainty that these projections carry with them. If all of the uncertainty we’ve talked about today were quantified, it’s possible we’d basically have no answers to go off of 🤷‍♀️
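One rough way to see why quantifying everything would widen the bands: if the sources were independent (a big assumption on my part), their variances would add. The sketch below uses completely made-up standard deviations for each source. They are not estimates from the IHME model or anywhere else; only the direction of the effect matters.

```python
import math

# Hypothetical standard deviations for each source of uncertainty.
# These numbers are invented for illustration only.
sources = {
    "model choice": 40.0,
    "model parameters": 30.0,
    "model estimation": 20.0,
    "data from the states (systematic)": 35.0,
    "data from the states (random)": 15.0,
    "data from the peaked locations": 25.0,
}

# The two sources the shaded band accounts for, per the checklist above.
accounted = ["model estimation", "data from the states (random)"]

def combined_sd(names):
    """Combine standard deviations, ASSUMING the sources are independent."""
    return math.sqrt(sum(sources[name] ** 2 for name in names))

def interval_95(center, sd):
    """Approximate 95% interval under a normal assumption."""
    half_width = 1.96 * sd
    return (center - half_width, center + half_width)

projection = 200.0  # hypothetical point projection

band_sd = combined_sd(accounted)
full_sd = combined_sd(sources)

print("band with 2 of 6 sources:", interval_95(projection, band_sd))
print("band with all 6 sources: ", interval_95(projection, full_sd))
```

With these invented numbers, the band that includes all six sources is far wider than the one that includes two, and in practice several of the sources are very hard to put a number on at all.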

"You guys think I don't give you straight answers. You have to talk to these statisticians. They will not give you a direct answer on anything." 🤣 https://t.co/4AhCHYaDtz

— Hilary Parker (@hspter) April 6, 2020

Think I missed something important? Please let me know! 🙏