Pulling co-authors for grant docs

rstats
I just submitted my first grant. It turns out you need tons of little things when you submit a grant (who knew!) and one of the little things is a list of all of the coauthors you’ve published with in the past four years. Instead of tracking that down, I automated the process using R and then stuck the code here so I have it for next time!
Author

Lucy D’Agostino McGowan

Published

November 14, 2019

So I just submitted my first grant (yikes) and WOW are there a lot of little pieces! One of the little required pieces was a spreadsheet detailing all of my co-authors for the past 4 years with their affiliations 😱. This sounded like a total nightmare to compile, and naturally I left it until the last second to do. (And I don’t have very many papers! I am on a few clinical papers that have a ton of co-authors. Sidenote to people who know grant things: Did I actually need to do this for every co-author? Please comment with your advice!) I am mostly writing this post so I have a quick way to do this documented for next time.

It turns out R has once again come to the rescue! There is a 💣 package, easyPubMed, that made this easy peasy.

Load packages

I’m loading four packages: easyPubMed, the trusty tidyverse 📦, one of my all-time favorites, glue, and the unfortunately named lubridate.

library(easyPubMed) 
library(tidyverse)
library(glue)
library(lubridate)

Pull my papers

Here I’m querying PubMed for the papers I’ve authored. I’m creating a data frame that includes details about the papers (including all of the authors!), with one row per author per paper.

query <- "Lucy D'Agostino McGowan[AU] OR LD McGowan[AU]"
get_pubmed_ids(query) %>%
  fetch_pubmed_data(encoding = "ASCII") %>%
  table_articles_byAuth(included_authors = "all",
                        max_chars = 100,
                        autofill = TRUE) -> my_papers

Clean it up

Using the glue package, I combine the firstname and lastname variables into the "lastname, firstname" form requested by the granting agency. Year is pulled in as a character variable, so we need to convert it to numeric. I also filter out myself and only keep the papers from the past four years. Finally, I select just each co-author’s name and affiliation and filter out any duplicates.

my_papers %>%
  mutate(name = glue("{lastname}, {firstname}"),
         year = as.numeric(year)) %>%
  filter(!str_detect(name, "D'Agostino"), year >= year(today()) - 4) %>%
  select(name, address) %>%
  distinct() -> coauthors
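
Since the end goal is a spreadsheet, one way to finish would be to write the result to a CSV. Here is a minimal sketch of the same clean-up run on a tiny made-up data frame, so it can be tried without hitting PubMed. The rows, the names `toy_papers`, `toy_coauthors`, and `out_file`, and the output file name are all placeholders invented for illustration; only the column names (`firstname`, `lastname`, `year`, `address`) come from the code above.

```r
library(dplyr)
library(stringr)
library(glue)
library(lubridate)
library(readr)

this_year <- year(today())

# Made-up rows standing in for the output of table_articles_byAuth();
# these are NOT real papers, people, or affiliations
toy_papers <- tibble(
  firstname = c("Lucy", "Ada", "Ada", "Grace"),
  lastname  = c("D'Agostino McGowan", "Example", "Example", "Placeholder"),
  # stored as character on purpose, to mimic how year comes back from PubMed
  year      = as.character(this_year - c(0, 1, 2, 10)),
  address   = c("Dept A", "Dept B", "Dept B", "Dept C")
)

# Same steps as above: build the name, fix year, drop myself and old papers,
# then keep one row per co-author/affiliation pair
toy_papers %>%
  mutate(name = glue("{lastname}, {firstname}"),
         year = as.numeric(year)) %>%
  filter(!str_detect(name, "D'Agostino"), year >= this_year - 4) %>%
  select(name, address) %>%
  distinct() -> toy_coauthors

# Write a CSV that opens as a spreadsheet (the file name is a placeholder)
out_file <- file.path(tempdir(), "coauthors.csv")
write_csv(toy_coauthors, out_file)
```

In this toy version only "Example, Ada" survives: my own row is dropped by the name filter, the ten-year-old paper is dropped by the year filter, and the two identical Ada rows collapse into one. With the real data, swapping `toy_coauthors` for `coauthors` in the `write_csv()` call should be all that is needed.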

And there you have it! 🎉