# Model Detective

It has recently been brought to our attention that a model is predicting we will have 0 COVID-19 related deaths after May 15th. This model was described as a “cubic polynomial”. Inspired by Dave Robinson’s data sleuthing, I decided to see if we could reverse engineer this “model”.

Turns out we can!

Let’s read in the data from the New York Times GitHub Repo.

library(tidyverse)
library(magick)
library(grid)

d <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv")

I’m going to do some data manipulation:

1. Add some future dates for our “prediction model”
2. Filter so we only include dates from the graph we are trying to replicate
3. The New York Times gives us cumulative deaths, we want just deaths per day, so I’ll create that
dat <- d %>%
bind_rows(tibble(date = Sys.Date() + 1:91)) %>%
filter(date >= "2020-02-26") %>%
mutate(death = c(deaths, diff(deaths)))

Here is a VERY basic prediction model. It includes a 3rd degree polynomial term it is predicting log(death + a small constant). I tried a few, a small constant of 0.5 seems to fit the best 🤷.

p <- predict(lm(log(death + 0.5) ~ poly(date, 3), data = dat), newdata = dat)

Let’s plot it. First I’ll create a plot that just includes the original graph.

p1 <- ggplot(dat, aes(x = date, y = death)) +
geom_line() +
annotation_custom(
rasterGrob(img, width = unit(1, "npc"),  height = unit(1, "npc")),
-Inf, Inf, -Inf, Inf) +
ylim(c(-500, 5000)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_x_date(limits = as.Date(c('2020-02-17','2020-08-10')))
p1 Now I’ll add a plot that has the New York Times death data overlaid, just to be sure things line up properly.

p2 <- ggplot(dat, aes(x = date, y = death)) +
geom_line() +
annotation_custom(
rasterGrob(img, width = unit(1, "npc"),  height = unit(1, "npc")),
-Inf, Inf, -Inf, Inf) +
geom_line() +
ylim(c(-500, 5000)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_x_date(limits = as.Date(c('2020-02-17','2020-08-10')))
p2 Finally, let’s plot our “model”.

p3 <- ggplot(dat, aes(x = date, y = death)) +
annotation_custom(
rasterGrob(img, width = unit(1, "npc"),  height = unit(1, "npc")),
-Inf, Inf, -Inf, Inf) +
geom_line() +
geom_line(aes(x = date, y = exp(p))) +
ylim(c(-500, 5000)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_x_date(limits = as.Date(c('2020-02-17','2020-08-10')))
p3 Let’s make this into a gif for sharing!

fig1 <- image_graph(width = 1000, height = 650, res = 300)
p1
dev.off()

fig2 <- image_graph(width = 1000, height = 650, res = 300)
p2
dev.off()

fig2 <- fig2 %>%
image_annotate("I've added death data from NYTimes - @LucyStats",
location = "+650+400", size = 15)

fig3 <- image_graph(width = 1000, height = 650, res = 300)
p3
dev.off()

fig3 <- fig3 %>%
image_annotate("I've added death data from NYTimes - @LucyStats",
location = "+650+400", size = 15) %>%
image_annotate("I've added a model with 3rd degree polynomial\n for log(death + 0.5) - @LucyStats",
location = "+650+420", size = 15)
image_join(fig1, fig2, fig3) %>%
image_animate(fps = 1) ## But wait there’s more!

Thomas Lumley aptly pointed out that while we’re extrapolating, let’s see how this model does if we peek a little into the past.

Hmm fascinating, it looks like deaths began to increase again as of January 7th. 🤔 I don’t seem to remember that. Okay, let’s see how bad things got, according to this model. Yikes. Looks like the whole population died on December 4, 2019. RIP us. Lucy D'Agostino McGowan

Currently excited about: observational study methods, translational research, BB-8