runconf17, an analysis of emoji use

I had such a delightful time at rOpenSci’s unconference last week. 21 📦 were produced! Not only was it extremely productive, but in between the crazy productivity was some epic community building.

Stefanie kicked the conference off with ice breakers, where we explored topics ranging from #rcatladies & #rdogfellas (for the record, I’m an #rchickenlady, IT’S HAPPENING) to impostor syndrome. It was an excellent way to get conversations started!


Karthik and I worked on two packages:

arresteddev: a package for when your development is going awry! Mostly, this was a good excuse to look up Arrested Development gifs, which, we established, is pronounced with a g like giraffe. Includes functions such as lmgtfy(), that will seamlessly google your last error message, David Robinson’s tracestack() that will query your last error message on Stack Overflow, and squirrel(), a function that will randomly send you to a distracting website - for when things are really going poorly 💁.

ponyexpress: a package for automating speedy emails from R - copy and paste neigh more 🐴. This package allows you to send templated emails to a list of contacts. Great for conferences, birthday parties, or karaoke invitations.


Between our package building, there were SO many opportunities to get to know some of the most talented people.

Jenny & I enthusiastically working on googledrive.
More than anything, this was an excellent opportunity to feel like a part of a community – and a community that certainly extends beyond the people that attended the unconference! There were so many people following along, tweeting along, and assisting along the way.

a few highlights:


In an effort to stay on brand, I decided to do a small analysis of the tweets that came out of #runconf17. (Note: this is not particularly statistically rigorous, but it is VERY fun.) I designed a small study:

  • pulled all tweets (excluding retweets) using the hashtag #runconf17 between May 24th and May 30th
  • also pulled all tweets (excluding retweets) using the hashtag #rstats during the same time period

Question: Are twitter users who used the #runconf17 hashtag more likely to use emojis than those who only tweeted with the #rstats hashtag during the same time period?

I used the rtweet package to pull the tweets, dplyr and fuzzyjoin to wrangle the data a bit, and rms to analyze it.

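library(rtweet)     # search_tweets()
library(dplyr)      # %>%, mutate(), bind_rows(), distinct()
library(fuzzyjoin)  # regex_left_join()
library(rms)        # datadist(), lrm()

runconf <- search_tweets(q = "#runconf17 AND since:2017-05-23 AND until:2017-05-31",
                         n = 1e4,
                         include_rts = FALSE)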

rstats <- search_tweets(q = "#rstats AND since:2017-05-23 AND until:2017-05-31",
                        n = 1e4,
                        include_rts = FALSE)

After pulling in the tweets, I categorized tweeters as either using the #runconf17 hashtag during the week or not. I then merged the tweets with an emoji dictionary (discovered by the lovely Maëlle!) and grouped by tweeter. If the tweeter used an emoji at any point during the week, they were categorized as an emoji-user; if not, they were sad (jk, there is room for all here!).

## create variable for whether tweeted about runconf
runconf$runconf <- "yes"

rstats <- rstats %>%
  mutate(runconf = ifelse(screen_name %in% runconf$screen_name, "yes", "no"))

## load in the emoji dictionary
dico <- readr::read_csv2("")
## Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
## Parsed with column specification:
## cols(
##   Description = col_character(),
##   Native = col_character(),
##   Bytes = col_character(),
##   `R-encoding` = col_character()
## )
## combine datasets, keep only unique tweets
data <- bind_rows(runconf, rstats) %>%
  distinct(text, .keep_all = TRUE)

## summarize by user, did they tweet about runconf in the past week 
## & did they use an emoji in the past week?
used_emoji <- regex_left_join(data, dico, by = c(text = "Native")) %>%
  ## keep only the columns we need; emoji is NA when no emoji matched
  select(screen_name,
         text,
         runconf,
         emoji = Native) %>%
  group_by(screen_name) %>%
  mutate(tot_emoji = sum(!is.na(emoji)),
         used_emoji = ifelse(tot_emoji > 0, "yes", "no"),
         tot_tweets = n_distinct(text)) %>%
  distinct(screen_name, .keep_all = TRUE)
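
For anyone who wants to poke at the "did this tweeter ever use an emoji?" logic without a Twitter API token, here is a minimal base R sketch. It is not the fuzzyjoin pipeline above: the tweeters, the tweets, and the two-entry dictionary are all made up for illustration, and plain `grepl()` stands in for `regex_left_join()`.

```r
## Toy data: three tweets from two hypothetical tweeters
tweets <- data.frame(
  screen_name = c("ana", "ana", "bo"),
  text = c("shipped a package \U0001F4E6", "no emoji here", "plain text only"),
  stringsAsFactors = FALSE
)

## A two-entry stand-in for the emoji dictionary (package, horse)
emoji <- c("\U0001F4E6", "\U0001F434")

## Does each tweet contain at least one dictionary emoji?
has_emoji <- vapply(
  tweets$text,
  function(txt) any(vapply(emoji, grepl, logical(1), x = txt, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)

## A tweeter counts as an emoji user if any of their tweets has one
emoji_user <- tapply(has_emoji, tweets$screen_name, any)
emoji_user
```

The key design choice is the same as in the real analysis: the flag lives at the tweeter level, so one emoji in one tweet is enough to count.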


We had 526 tweeters that just used the #rstats hashtag, and 107 that tweeted with the #runconf17 hashtag. THESE ARE MY PEOPLE 🙌 Among the #rstats tweeters, 5.9% used at least one emoji in their tweets, whereas among #runconf17 tweeters, 25.2% used emojis!

used_emoji %>%
  group_by(`tweeted #runconf` = runconf, `used emoji` = used_emoji) %>%
  tally() %>%
  mutate(`%` = 100*prop.table(n)) %>%
  knitr::kable(digits = 1)
|tweeted #runconf |used emoji |   n|    %|
|:----------------|:----------|---:|----:|
|no               |no         | 495| 94.1|
|no               |yes        |  31|  5.9|
|yes              |no         |  80| 74.8|
|yes              |yes        |  27| 25.2|
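
One thing worth spelling out: the percentages are computed within each hashtag group (each pair of rows sums to 100), not across all 633 tweeters. A quick way to check that, sketched in base R from the four counts in the table (the object names here are mine, not from the analysis code):

```r
## 2x2 table of tweeters: rows = tweeted #runconf17, columns = used emoji
counts <- matrix(
  c(495, 31,
     80, 27),
  nrow = 2, byrow = TRUE,
  dimnames = list(runconf = c("no", "yes"), used_emoji = c("no", "yes"))
)

## Row totals: tweeters in each hashtag group
rowSums(counts)

## Row-wise percentages, i.e. emoji use within each hashtag group
pct <- round(100 * prop.table(counts, margin = 1), 1)
pct
```

With `margin = 1`, `prop.table()` divides each cell by its row total, which is what reproduces the 5.9% and 25.2% figures.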

Alright, that looks pretty promising, but let’s get some confidence intervals. It’s time to model it! 💃

## modeling time!
dd <- datadist(used_emoji)
options(datadist = "dd")

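lrm(used_emoji ~ runconf, data = used_emoji) %>%
  summary()

Effects   Response: used_emoji

|                   | Low| High|  Δ| Effect|   S.E.| Lower 0.95| Upper 0.95|
|:------------------|---:|----:|--:|------:|------:|----------:|----------:|
|runconf --- yes:no |   1|    2|   |  1.684| 0.2895|      1.117|      2.252|
|Odds Ratio         |   1|    2|   |  5.389|       |      3.056|      9.505|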

Tweeting the #runconf17 hashtag seems undeniably associated with higher odds of emoji use (OR: 5.4, 95% CI: 3.1, 9.5).
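
Because the model has a single yes/no predictor, the odds ratio and a Wald-style interval can also be worked out by hand from the four counts in the table. The sketch below uses only base R and assumes the standard large-sample formula for the standard error of a log odds ratio; the variable names are mine, and it is meant as a sanity check rather than a replacement for the lrm() fit.

```r
## Counts from the 2x2 table above
runconf_emoji    <- 27   # tweeted #runconf17, used emoji
runconf_no_emoji <- 80   # tweeted #runconf17, no emoji
rstats_emoji     <- 31   # #rstats only, used emoji
rstats_no_emoji  <- 495  # #rstats only, no emoji

## Odds of emoji use in each group, and their ratio
odds_runconf <- runconf_emoji / runconf_no_emoji
odds_rstats  <- rstats_emoji / rstats_no_emoji
or <- odds_runconf / odds_rstats

## Standard error of the log odds ratio: sqrt of summed reciprocal counts
se_log_or <- sqrt(1 / runconf_emoji + 1 / runconf_no_emoji +
                  1 / rstats_emoji + 1 / rstats_no_emoji)

## 95% Wald interval on the log scale, then back-transformed
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(OR = or, lower = ci[1], upper = ci[2]), 1)
```

The log odds ratio and its standard error should land on the Effect and S.E. columns of the model summary, and exponentiating gives the Odds Ratio row.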

Lucy D'Agostino McGowan

Currently excited about: observational study methods, translational research, BB-8