Model Detective
It has recently been brought to our attention that a model is predicting we will have 0 COVID-19 related deaths after May 15th. This model was described as a “cubic polynomial”. Inspired by Dave Robinson’s data sleuthing, I decided to see if we could reverse engineer this “model”.
The “cubic model” from @CEA + Kevin Hassett was pretty clearly fit on log(deaths + 1). Which is… pretty dangerous for forecasting.
— David Robinson (@drob) May 5, 2020
Just imagine if they'd fit a quartic model pic.twitter.com/VTlFoD9qvC
Turns out we can!
Let’s read in the data from the New York Times GitHub Repo.
library(tidyverse)
library(magick)
library(grid)
img <- image_read("https://pbs.twimg.com/media/EXQlsS6WAAAXy3W?format=jpg&name=medium")
d <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv")
I’m going to do some data manipulation:
- Add some future dates for our “prediction model”
- Filter so we only include dates from the graph we are trying to replicate
- The New York Times gives us cumulative deaths, we want just deaths per day, so I’ll create that
dat <- d %>%
bind_rows(tibble(date = Sys.Date() + 1:91)) %>%
filter(date >= "2020-02-26") %>%
mutate(death = c(deaths[1], diff(deaths)))
Here is a VERY basic prediction model. It includes a 3rd degree polynomial term it is predicting log(death + a small constant). I tried a few, a small constant of 0.5 seems to fit the best 🤷.
p <- predict(lm(log(death + 0.5) ~ poly(date, 3), data = dat), newdata = dat)
Let’s plot it. First I’ll create a plot that just includes the original graph.
p1 <- ggplot(dat, aes(x = date, y = death)) +
geom_line() +
annotation_custom(
rasterGrob(img, width = unit(1, "npc"), height = unit(1, "npc")),
-Inf, Inf, -Inf, Inf) +
ylim(c(-500, 5000)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_x_date(limits = as.Date(c('2020-02-17','2020-08-10')))
p1
Now I’ll add a plot that has the New York Times death data overlaid, just to be sure things line up properly.
p2 <- ggplot(dat, aes(x = date, y = death)) +
geom_line() +
annotation_custom(
rasterGrob(img, width = unit(1, "npc"), height = unit(1, "npc")),
-Inf, Inf, -Inf, Inf) +
geom_line() +
ylim(c(-500, 5000)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_x_date(limits = as.Date(c('2020-02-17','2020-08-10')))
p2
Finally, let’s plot our “model”.
p3 <- ggplot(dat, aes(x = date, y = death)) +
annotation_custom(
rasterGrob(img, width = unit(1, "npc"), height = unit(1, "npc")),
-Inf, Inf, -Inf, Inf) +
geom_line() +
geom_line(aes(x = date, y = exp(p))) +
ylim(c(-500, 5000)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_x_date(limits = as.Date(c('2020-02-17','2020-08-10')))
p3
Let’s make this into a gif for sharing!
fig1 <- image_graph(width = 1000, height = 650, res = 300)
p1
dev.off()
fig2 <- image_graph(width = 1000, height = 650, res = 300)
p2
dev.off()
fig2 <- fig2 %>%
image_annotate("I've added death data from NYTimes - @LucyStats",
location = "+650+400", size = 15)
fig3 <- image_graph(width = 1000, height = 650, res = 300)
p3
dev.off()
fig3 <- fig3 %>%
image_annotate("I've added death data from NYTimes - @LucyStats",
location = "+650+400", size = 15) %>%
image_annotate("I've added a model with 3rd degree polynomial\n for log(death + 0.5) - @LucyStats",
location = "+650+420", size = 15)
image_join(fig1, fig2, fig3) %>%
image_animate(fps = 1)
But wait there’s more!
Thomas Lumley aptly pointed out that while we’re extrapolating, let’s see how this model does if we peek a little into the past.
So, since (as you know) a cubic must go to -infinity at one end and +infinity at the other, I wondered how well this would extrapolate the other way. Turns out the entire world population were cases on December 4, which isn't quite how I remembered it.
— Thomas Lumley (@tslumley) May 5, 2020
Hmm fascinating, it looks like deaths began to increase again as of January 7th. 🤔 I don’t seem to remember that.
Okay, let’s see how bad things got, according to this model.
Yikes. Looks like the whole population died on December 4, 2019. RIP us.