library("rtweet")
library("dplyr")
library("fuzzyjoin")
library("rms")
Lucy D’Agostino McGowan
June 4, 2017
I had such a delightful time at rOpenSci’s unconference last week.
21 📦 were produced!
Not only was it extremely productive, but in between the crazy productivity was some epic community building.
for the record, I’m an #rchickenlady, IT’S HAPPENING
Stefanie kicked the conference off with ice breakers, where we explored topics ranging from #rcatladies & #rdogfellas to impostor syndrome. It was an excellent way to get conversations started!
Karthik and I worked on two packages:
arresteddev: a package for when your development is going awry! Includes functions such as lmgtfy(), which will seamlessly google your last error message; David Robinson’s tracestack(), which will query your last error message on Stack Overflow; and squirrel(), a function that will randomly send you to a distracting website - for when things are really going poorly 💁.
::: column-margin
Mostly, this was a good excuse to look up Arrested Development gifs, which, we established, is pronounced with a g like giraffe.
:::
ponyexpress: a package for automating speedy emails from R - copy and paste neigh more 🐴. This package allows you to send templated emails to a list of contacts. Great for conferences, birthday parties, or karaoke invitations.
Between our package building, there were SO many opportunities to get to know some of the most talented people.
<img src="https://github.com/LFOD/real-blog/raw/master/static/images/jenny_lucy.jpg">
Jenny & I enthusiastically working on googledrive.
More than anything, this was an excellent opportunity to feel like a part of a community – and a community that certainly extends beyond the people that attended the unconference! There were so many people following along, tweeting along, and assisting along the way.
a few highlights:
Note: this is not particularly statistically rigorous, but it is VERY fun.
In an effort to stay on brand, I decided to do a small analysis of the tweets that came out of #runconf17. I designed a small study:
Question: Are Twitter users who used the #runconf17 hashtag more likely to use emojis than those who only tweeted with the #rstats hashtag during the same time period?
I used the rtweet package to pull the tweets, dplyr and fuzzyjoin to wrangle the data a bit, and rms to analyze it.
The emoji dictionary was discovered by the lovely Maëlle!
After pulling in the tweets, I categorized tweeters as either using the #runconf17 hashtag during the week or not. I then merged the tweets with an emoji dictionary, and grouped by tweeter. If the tweeter used an emoji at any point during the week, they were categorized as an emoji-user; if not, they were sad (jk, there is room for all here!).
## create variable for whether tweeted about runconf
runconf$runconf <- "yes"
rstats <- rstats %>%
  mutate(runconf = ifelse(screen_name %in% runconf$screen_name, "yes", "no"))
## load in the emoji dictionary
dico <- readr::read_csv2("https://raw.githubusercontent.com/today-is-a-good-day/emojis/master/emDict.csv")
ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
Rows: 842 Columns: 4
── Column specification ────────────────────────────────────
Delimiter: ";"
chr (4): Description, Native, Bytes, R-encoding
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## combine datasets, keep only unique tweets
data <- bind_rows(runconf, rstats) %>%
  distinct(text, .keep_all = TRUE)
## summarize by user, did they tweet about runconf in the past week
## & did they use an emoji in the past week?
used_emoji <- regex_left_join(data, dico, by = c(text = "Native")) %>%
  select(screen_name,
         text,
         runconf,
         emoji = Native) %>%
  group_by(screen_name) %>%
  mutate(tot_emoji = sum(!is.na(emoji)),
         used_emoji = ifelse(tot_emoji > 0, "yes", "no"),
         tot_tweets = n_distinct(text)) %>%
  distinct(screen_name, .keep_all = TRUE)
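To see what the join is doing, here is a minimal sketch on made-up data. The `toy_tweets` and `toy_dico` tables and the three tweeters in them are hypothetical stand-ins, not the real #runconf17 tweets or the full emoji dictionary. Each emoji in the dictionary is treated as a regular expression and matched against the tweet text, so a tweet with two emojis shows up as two rows and a tweet with none keeps a single row with `NA`.

```r
library(dplyr)
library(fuzzyjoin)

## hypothetical tweets from three tweeters (not real data)
toy_tweets <- tibble::tibble(
  screen_name = c("a", "a", "b", "c"),
  text = c("so fun 🎉", "more code", "no emoji here", "chickens 🐔 and hearts 💜")
)

## a tiny stand-in for the emoji dictionary, same column name as the real one
toy_dico <- tibble::tibble(Native = c("🎉", "🐔", "💜"))

## one row per (tweet, matched emoji); unmatched tweets get Native = NA
toy_joined <- regex_left_join(toy_tweets, toy_dico, by = c(text = "Native"))

## collapse to one row per tweeter
toy_summary <- toy_joined %>%
  group_by(screen_name) %>%
  summarise(
    tot_emoji = sum(!is.na(Native)),
    used_emoji = ifelse(tot_emoji > 0, "yes", "no"),
    tot_tweets = n_distinct(text)
  )

toy_summary
```

In this toy version tweeter "a" has one emoji across two tweets, "b" has none, and "c" has two emojis in a single tweet, so "a" and "c" end up as emoji-users.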
We had 526 tweeters that just used the #rstats hashtag, and 107 that tweeted with the #runconf17 hashtag. Among the #rstats tweeters, 5.9% used at least one emoji in their tweets, whereas among #runconf17 tweeters, 25.2% used emojis!
::: column-margin
THESE ARE MY PEOPLE 🙌
:::
used_emoji %>%
  group_by(`tweeted #runconf` = runconf, `used emoji` = used_emoji) %>%
  tally() %>%
  mutate(`%` = 100*prop.table(n)) %>%
  knitr::kable(digits = 1)
| tweeted #runconf | used emoji | n | % |
|---|---|---|---|
| no | no | 495 | 94.1 |
| no | yes | 31 | 5.9 |
| yes | no | 80 | 74.8 |
| yes | yes | 27 | 25.2 |
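One detail worth spelling out: the percentages are computed within each #runconf group, not across the whole table, which is why each pair of rows sums to 100. A minimal sketch with the counts from the table typed in by hand (the `counts` and `pct` names are just for illustration):

```r
library(dplyr)

## counts copied from the table above
counts <- tibble::tibble(
  runconf    = c("no", "no", "yes", "yes"),
  used_emoji = c("no", "yes", "no", "yes"),
  n          = c(495, 31, 80, 27)
)

## prop.table() divides each n by the sum of n within the current group
pct <- counts %>%
  group_by(runconf) %>%
  mutate(pct = 100 * prop.table(n)) %>%
  ungroup()

round(pct$pct, 1)
## 94.1  5.9 74.8 25.2
```

So 31 of the 526 #rstats-only tweeters and 27 of the 107 #runconf17 tweeters used an emoji.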
Alright, that looks pretty promising, but let’s get some confidence intervals. It’s time to model it! 💃
## modeling time!
dd <- datadist(used_emoji)
options(datadist = "dd")
lrm(used_emoji~runconf, data = used_emoji) %>%
  summary() %>%
  html()
Effects, Response: used_emoji

| | Low | High | Δ | Effect | S.E. | Lower 0.95 | Upper 0.95 |
|---|---|---|---|---|---|---|---|
| runconf --- yes:no | 1 | 2 | | 1.684 | 0.2895 | 1.117 | 2.252 |
| Odds Ratio | 1 | 2 | | 5.389 | | 3.056 | 9.505 |
Tweeting the #runconf17 hashtag seems undeniably associated with higher odds of emoji use (OR: 5.4, 95% CI: 3.1, 9.5).
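With a single binary predictor, the same numbers can be recovered by hand from the 2x2 table of counts, which makes a nice sanity check. This is a sketch in base R, not the rms output itself; the `tab`, `or`, `se`, and `ci` names are made up for the example, and the interval is the usual Wald interval on the log-odds scale.

```r
## counts from the table of tweeters: rows = tweeted #runconf, columns = used emoji
tab <- matrix(c(495, 31,
                 80, 27),
              nrow = 2, byrow = TRUE,
              dimnames = list(runconf = c("no", "yes"),
                              used_emoji = c("no", "yes")))

## odds of emoji use among #runconf17 tweeters vs. #rstats-only tweeters
or <- (tab["yes", "yes"] / tab["yes", "no"]) /
  (tab["no", "yes"] / tab["no", "no"])

## standard error of the log odds ratio: sqrt of the summed reciprocal counts
se <- sqrt(sum(1 / tab))

## 95% Wald interval, back-transformed to the odds ratio scale
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se)

round(c(OR = or, SE = se, lower = ci[1], upper = ci[2]), 3)
```

The log odds ratio comes out at about 1.68 with a standard error of about 0.29, in line with the Effect and S.E. columns above.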
Now let’s check out which emojis were most popular among #runconf17 tweeters. This time I’ll allow for retweets 👯
For this I used ggplot2, magick, and webshot.
This (like many things I do) was very much inspired by Maëlle’s post.
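library("ggplot2")
library("webshot")

## `emojis` is assumed to hold one row per emoji with columns
## num, n (count), and Native (the emoji itself);
## the code that builds it is not shown here
plot_emojis <- function(limit) {
  ## keep only the emojis with a count of at most `limit`
  emojis_filter <- emojis %>%
    filter(n <= limit)
  out_svg <- paste0("file://emojis_", limit, ".svg")
  out_png <- paste0("emojis_", limit, ".png")
  p <- ggplot(emojis_filter, aes(num, n)) +
    geom_col() +
    xlim(c(0, 16)) +
    ## print each emoji just past the end of its bar
    geom_text(aes(x = num,
                  y = n + 1,
                  label = Native), size = 5) +
    theme(axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          legend.position = "none") +
    ## fixed axis range so every frame has the same scale
    ylim(c(0, max(emojis$n) + 10)) +
    xlab("emoji") +
    ggtitle("#runconf17 emojis") +
    coord_flip()
  print(p)
  ## export the plot as SVG, then screenshot it to PNG
  gridSVG::grid.export(out_svg)
  webshot(out_svg,
          out_png,
          vwidth = 100,
          vheight = 100,
          zoom = 3)
  ## return the PNG path
  out_png
}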
Now let’s make them into a gif!
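The general recipe with magick is to read (or build) a stack of frames, join them, animate, and write the result out. Here is a minimal sketch that uses blank coloured squares as stand-in frames so it runs on its own; in practice the frames would presumably be read from PNG files like the ones `plot_emojis()` returns, and the colours, frame rate, and output path below are all placeholders.

```r
library(magick)

## stand-in frames: three blank 100x100 images (placeholders for the emoji plots)
frames <- lapply(c("red", "orange", "yellow"), function(col) {
  image_blank(width = 100, height = 100, color = col)
})

## stack the frames into one multi-frame image and animate at 1 frame per second
gif <- image_animate(image_join(frames), fps = 1)

## write the gif to a temporary file (placeholder path)
out_gif <- tempfile(fileext = ".gif")
image_write(gif, out_gif)
```

With real plots, the `lapply()` call would instead wrap `image_read()` around each PNG path.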
Phew, the 🐔
The purple heart seems to be the most popular emoji, which makes sense given 25% of us were #RLadies! I think it’s a credit to the awesome geographic diversity that we have two different globe emojis in our top 15!
All in all, it was an epic experience. Thank you so much to the conference organizers, attendees, and #runconf17 tweeters for such a delightful week!