library("rtweet")
library("dplyr")
library("fuzzyjoin")
library("rms")
I had such a delightful time at rOpenSci’s unconference last week.
21 📦 were produced!
Not only was it extremely productive, but in between the crazy productivity was some epic community building.
for the record, I’m an #rchickenlady, IT’S HAPPENING
Stefanie kicked the conference off with ice breakers, where we explored topics ranging from #rcatladies & #rdogfellas to impostor syndrome. It was an excellent way to get conversations started!
work
Karthik and I worked on two packages:
arresteddev: a package for when your development is going awry! Includes functions such as lmgtfy(), which will seamlessly google your last error message, David Robinson’s tracestack(), which will query your last error message on Stack Overflow, and squirrel(), a function that will randomly send you to a distracting website - for when things are really going poorly 💁.
Mostly, this was a good excuse to look up Arrested Development gifs, which, we established, is pronounced with a g like giraffe.
ponyexpress: a package for automating speedy emails from R - copy and paste neigh more 🐴. This package allows you to send templated emails to a list of contacts. Great for conferences, birthday parties, or karaoke invitations.
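To give a flavor of the lmgtfy() idea, here is a minimal sketch in base R. It is not the arresteddev implementation: `error_search_url()` is a hypothetical helper I made up for illustration, and it only assumes that the last error message is available via `geterrmessage()`.

```r
# Hypothetical helper (not the arresteddev code): turn an error message
# into a Google search URL. Defaults to the most recent error message.
error_search_url <- function(msg = geterrmessage()) {
  msg <- trimws(msg)  # drop the trailing newline R adds to error messages
  paste0("https://www.google.com/search?q=",
         utils::URLencode(msg, reserved = TRUE))
}

# trigger an error without stopping the script, then build the URL for it
try(stop("object 'x' not found"), silent = TRUE)
url <- error_search_url()
url

# utils::browseURL(url)  # uncomment to open the search in a browser
```

Building the URL and opening it are kept separate so the helper can be checked without launching a browser.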
play
Between our package building, there were SO many opportunities to get to know some of the most talented people.
<img src="https://github.com/LFOD/real-blog/raw/master/static/images/jenny_lucy.jpg">
Jenny & I enthusiastically working on googledrive.
More than anything, this was an excellent opportunity to feel like a part of a community – and a community that certainly extends beyond the people that attended the unconference! There were so many people following along, tweeting along, and assisting along the way.
a few highlights:
- 🍨 ice cream outings
- 🎤 karaoke adventures
- 🍸 happy hours (complete with R-themed drinks)
- 💪 Karthik attempting to lick his elbow
analysis
Note: this is not particularly statistically rigorous, but it is VERY fun.
In an effort to stay on brand, I decided to do a small analysis of the tweets that came out of #runconf17. I designed a small study:
- pulled all tweets (excluding retweets) using the hashtag #runconf17 between May 24th and May 30th
- also pulled all tweets (excluding retweets) using the hashtag #rstats during the same time period
Question: Are twitter users who used the #runconf17 hashtag more likely to use emojis than those who only tweeted with the #rstats hashtag during the same time period?
I used the rtweet package to pull the tweets, dplyr and fuzzyjoin to wrangle the data a bit, and rms to analyze it.
runconf <- search_tweets(q = "#runconf17 AND since:2017-05-23 AND until:2017-05-31",
                         n = 1e4,
                         include_rts = FALSE)
rstats <- search_tweets(q = "#rstats AND since:2017-05-23 AND until:2017-05-31",
                        n = 1e4,
                        include_rts = FALSE)
The emoji dictionary was discovered by the lovely Maëlle!
After pulling in the tweets, I categorized tweeters as either using the #runconf17 hashtag during the week or not. I then merged the tweets with an emoji dictionary, and grouped by tweeter. If the tweeter used an emoji at any point during the week, they were categorized as an emoji-user, if not, they were sad (jk, there is room for all here!).
## create variable for whether tweeted about runconf
runconf$runconf <- "yes"

rstats <- rstats %>%
  mutate(runconf = ifelse(screen_name %in% runconf$screen_name, "yes", "no"))
## load in the emoji dictionary
dico <- readr::read_csv2("https://raw.githubusercontent.com/today-is-a-good-day/emojis/master/emDict.csv")
ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
Rows: 842 Columns: 4
── Column specification ────────────────────────────────────
Delimiter: ";"
chr (4): Description, Native, Bytes, R-encoding
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## combine datasets, keep only unique tweets
data <- bind_rows(runconf, rstats) %>%
  distinct(text, .keep_all = TRUE)
## summarize by user, did they tweet about runconf in the past week
## & did they use an emoji in the past week?
used_emoji <- regex_left_join(data, dico, by = c(text = "Native")) %>%
  select(screen_name,
         text,
         runconf,
         emoji = Native) %>%
  group_by(screen_name) %>%
  mutate(tot_emoji = sum(!is.na(emoji)),
         used_emoji = ifelse(tot_emoji > 0, "yes", "no"),
         tot_tweets = n_distinct(text)) %>%
  distinct(screen_name, .keep_all = TRUE)
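To see what the per-user summary is doing, here is a toy version in base R. The tweets, the screen names, and the three-emoji stand-in dictionary (`toy_dico`) are all made up for illustration, and it uses fixed-string matching with `grepl()` rather than `regex_left_join()`, so treat it as a sketch of the idea rather than the pipeline itself.

```r
# made-up tweets from three made-up users
tweets <- data.frame(
  screen_name = c("ann", "ann", "bob", "cat"),
  text = c("unconf was great 🎉",
           "pushing to github",
           "new #rstats post",
           "🐔 and 💜 at #runconf17"),
  stringsAsFactors = FALSE
)

# tiny stand-in for the emoji dictionary
toy_dico <- c("🎉", "🐔", "💜")

# for each tweet, count how many distinct dictionary emojis appear in it
tweets$n_emoji <- vapply(
  tweets$text,
  function(txt) sum(vapply(toy_dico, grepl, logical(1), x = txt, fixed = TRUE)),
  integer(1),
  USE.NAMES = FALSE
)

# roll up to one row per user and flag anyone with at least one emoji
per_user <- aggregate(n_emoji ~ screen_name, data = tweets, FUN = sum)
per_user$used_emoji <- ifelse(per_user$n_emoji > 0, "yes", "no")
per_user
```

Here ann and cat come out as emoji users and bob does not, which mirrors the yes/no `used_emoji` flag built in the pipeline.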
results
We had 526 tweeters who just used the #rstats hashtag, and 107 who tweeted with the #runconf17 hashtag. Among the #rstats tweeters, 5.9% used at least one emoji in their tweets, whereas among #runconf17 tweeters, 25.2% used emojis!
THESE ARE MY PEOPLE 🙌
used_emoji %>%
  group_by(`tweeted #runconf` = runconf, `used emoji` = used_emoji) %>%
  tally() %>%
  mutate(`%` = 100 * prop.table(n)) %>%
  knitr::kable(digits = 1)
tweeted #runconf | used emoji | n | % |
---|---|---|---|
no | no | 495 | 94.1 |
no | yes | 31 | 5.9 |
yes | no | 80 | 74.8 |
yes | yes | 27 | 25.2 |
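Note that the percentages are computed within each hashtag group (each pair of rows sums to 100), not across all 633 tweeters. A quick base R check of that arithmetic, typing in the counts from the table by hand:

```r
# counts copied from the table: rows = tweeted #runconf, columns = used emoji
counts <- matrix(
  c(495, 31,
     80, 27),
  nrow = 2, byrow = TRUE,
  dimnames = list(`tweeted #runconf` = c("no", "yes"),
                  `used emoji` = c("no", "yes"))
)

# margin = 1 gives proportions within each row, i.e. within each hashtag group
pct <- round(100 * prop.table(counts, margin = 1), 1)
pct
```

The row totals, 526 and 107, match the group sizes quoted in the text.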
Alright, that looks pretty promising, but let’s get some confidence intervals. It’s time to model it! 💃
## modeling time!
dd <- datadist(used_emoji)
options(datadist = "dd")
lrm(used_emoji ~ runconf, data = used_emoji) %>%
  summary() %>%
  html()
Effects   Response: used_emoji
| | Low | High | Δ | Effect | S.E. | Lower 0.95 | Upper 0.95 |
|---|---|---|---|---|---|---|---|
| runconf --- yes:no | 1 | 2 | | 1.684 | 0.2895 | 1.117 | 2.252 |
| Odds Ratio | 1 | 2 | | 5.389 | | 3.056 | 9.505 |
Tweeting the #runconf17 hashtag seems undeniably associated with higher odds of emoji use (OR: 5.4, 95% CI: 3.1, 9.5).
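With a single yes/no predictor, the same odds ratio and Wald interval can be worked out by hand from the four counts in the table. This is just a sanity check in base R, with the counts typed in manually; the variable names are my own.

```r
# counts from the 2x2 table
runconf_emoji    <- 27   # tweeted #runconf17, used an emoji
runconf_no_emoji <- 80   # tweeted #runconf17, no emoji
rstats_emoji     <- 31   # #rstats only, used an emoji
rstats_no_emoji  <- 495  # #rstats only, no emoji

# log odds ratio: odds of emoji use among #runconf17 tweeters vs. #rstats-only
log_or <- log((runconf_emoji / runconf_no_emoji) /
              (rstats_emoji / rstats_no_emoji))

# standard error of the log odds ratio for a 2x2 table
se <- sqrt(1 / runconf_emoji + 1 / runconf_no_emoji +
           1 / rstats_emoji + 1 / rstats_no_emoji)

# 95% Wald confidence interval, back-transformed to the odds ratio scale
ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * se)

round(c(OR = exp(log_or), lower = ci[1], upper = ci[2]), 1)
```

The log odds ratio and standard error line up with the Effect (1.684) and S.E. (0.2895) columns of the model summary.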
most popular emojis
Now let’s check out which emojis were most popular among #runconf17 tweeters. This time I’ll allow for retweets 👯
For this I used ggplot2, magick, and webshot.
library("ggplot2")
library("webshot")
library("magick")
runconf_emojis <- search_tweets(q = "#runconf17 AND since:2017-05-23 AND until:2017-05-31",
                                n = 1e4)
emojis <- regex_left_join(runconf_emojis, dico, by = c(text = "Native")) %>%
  group_by(Native) %>%
  filter(!is.na(Native)) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(15) %>%
  mutate(num = 1:15)
This (like many things I do) was very much inspired by Maëlle’s post.
plot_emojis <- function(limit) {
  ## keep only the emojis used at most `limit` times
  emojis_filter <- emojis %>%
    filter(emojis$n <= limit)
  out_svg <- paste0("file://emojis_", limit, ".svg")
  out_png <- paste0("emojis_", limit, ".png")
  p <- ggplot(emojis_filter, aes(num, n)) +
    geom_col() +
    xlim(c(0, 16)) +
    geom_text(aes(x = num,
                  y = n + 1,
                  label = Native), size = 5) +
    theme(axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          legend.position = "none") +
    ylim(c(0, max(emojis$n) + 10)) +
    xlab("emoji") +
    ggtitle("#runconf17 emojis") +
    coord_flip()
  print(p)
  gridSVG::grid.export(out_svg)
  webshot(out_svg,
          out_png,
          vwidth = 100,
          vheight = 100,
          zoom = 3)
  out_png
}
Now let’s make them into a gif!
out_png <- purrr::map_chr(emojis$n, plot_emojis)

purrr::map(unique(rev(out_png)), image_read) %>%
  image_join() %>%
  image_animate(fps = 1) %>%
  image_write("runconf_emojis.gif")
Phew, the 🐔
The purple heart seems to be the most popular emoji, which makes sense given 25% of us were #RLadies! I think it’s a credit to the awesome geographic diversity that we have two different globe emojis in our top 15!
All in all, it was an epic experience. Thank you so much to the conference organizers, attendees, and #runconf17 tweeters for such a delightful week!