library(tidyverse)
harry_potter <- read_csv("https://raw.githubusercontent.com/LFOD/real-blog/master/static/data/harry-potter.csv")
This one cool hack will…help you categorize Harry Potter characters!
Hilary Parker & Roger Peng mention a cool tactic for categorizing data on their podcast, Not So Standard Deviations. (If I recall correctly, Hilary mentioned it first and Roger used it in a subsequent episode. Unfortunately, I don’t remember which episodes these were, and a quick look back proved futile. If I figure it out, I’ll link it here!)
The basic concept is to make a small table & join it into the data frame you are trying to categorize instead of writing a bunch of if/else statements. This is especially useful if you are:
☝️ Using the same categories on a bunch of different data frames (you can just create the small table of categories once!)
✌️ Creating multiple new variables from a single variable
Have a bunch of things to re-categorize?
TIRED: write a bunch of case_when or if/else statements
WIRED: left join a small data frame with the updated categories
*this message is brought to you by @nssdeviations (THANK YOU!) and my HAPPY HEART each time I do this now pic.twitter.com/smFubWEGcq
— Lucy D’Agostino McGowan (@LucyStats) May 15, 2020
I tweeted about this kernel of wisdom and a few people asked me to write up an example, so here it is!
Example
For this example, I am going to rank Harry Potter characters on a variety of characteristics, based on the house they were sorted into.
This data originated from a Kaggle dataset by Gulsah Demiryurek.
Tired
Here is how I would do this with case_when():
harry_potter %>%
  mutate(
    smart_rank = case_when(
      House == "Ravenclaw" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Slytherin" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    brave_rank = case_when(
      House == "Gryffindor" ~ 1,
      House == "Slytherin" ~ 2,
      House == "Ravenclaw" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    cunning_rank = case_when(
      House == "Slytherin" ~ 1,
      House == "Ravenclaw" ~ 2,
      House == "Gryffindor" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    kind_rank = case_when(
      House == "Hufflepuff" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Ravenclaw" ~ 3,
      House == "Slytherin" ~ 4
    )
  )
Wired
Here’s how I would do this with a data frame:
ranks <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2),
  cunning_rank = c(3, 2, 4, 1),
  kind_rank = c(2, 3, 1, 4)
)
harry_potter %>%
  left_join(ranks, by = "House")
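To illustrate the reuse point from earlier, here is a minimal sketch that joins one lookup table to two different data frames. The `students` and `staff` data frames, and the "Beauxbatons" value, are made up for illustration and are not taken from the dataset used in this post. It also shows one thing worth checking: `left_join()` leaves `NA` in the new columns for any house missing from the lookup table (much like `case_when()` does when no condition matches), and `anti_join()` is one way to list those rows.

```r
library(dplyr)
library(tibble)

# The lookup table from this post, defined once.
ranks <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2),
  cunning_rank = c(3, 2, 4, 1),
  kind_rank = c(2, 3, 1, 4)
)

# Two hypothetical data frames that share a House column (illustration only).
students <- tibble(
  Name = c("Harry Potter", "Luna Lovegood", "Cedric Diggory"),
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff")
)
staff <- tibble(
  Name = c("Severus Snape", "Fleur Delacour"),
  House = c("Slytherin", "Beauxbatons") # "Beauxbatons" is not in `ranks`
)

# The same small table categorizes both data frames.
students_ranked <- left_join(students, ranks, by = "House")
staff_ranked <- left_join(staff, ranks, by = "House")

# Rows whose House has no entry in the lookup table (their ranks are NA).
unmatched <- anti_join(staff, ranks, by = "House")
```

If `unmatched` has any rows, that is a hint the lookup table needs another entry, or that those rows are meant to stay uncategorized.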
Inspired
After tweeting this out, several people pointed out that this is a nice use case for tribble()!
Such lookup tables are things where tribble really shines, IMHO.
— Konrad Rudolph (@klmr) May 16, 2020
tribble() would be just as clear as the tired one and just as nifty as the wired one.
— Dave Harris, but masked (@davidjayharris) May 16, 2020
That’s how set nice labels for plots and tables. Make a tribble and join at the last minute.
— tj mahr 🍕🍍 (@tjmahr) May 16, 2020
Let’s see how that looks!
ranks <- tribble(
  ~House, ~smart_rank, ~brave_rank, ~cunning_rank, ~kind_rank,
  "Gryffindor", 2, 1, 3, 2,
  "Ravenclaw", 1, 3, 2, 3,
  "Hufflepuff", 4, 4, 4, 1,
  "Slytherin", 3, 2, 1, 4
)
harry_potter %>%
  left_join(ranks, by = "House")
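Because the same numbers get typed column by column in `tibble()` and row by row in `tribble()`, it is easy to transpose one by accident. Here is a small sketch of a sanity check, assuming you have both versions on hand; the names `ranks_cols` and `ranks_rows` are made up for this example. It compares the two tables as plain data frames and stops with an error if they differ.

```r
library(tibble)

# Column-wise definition, as in the "Wired" section.
ranks_cols <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2),
  cunning_rank = c(3, 2, 4, 1),
  kind_rank = c(2, 3, 1, 4)
)

# Row-wise definition of the same lookup table.
ranks_rows <- tribble(
  ~House, ~smart_rank, ~brave_rank, ~cunning_rank, ~kind_rank,
  "Gryffindor", 2, 1, 3, 2,
  "Ravenclaw", 1, 3, 2, 3,
  "Hufflepuff", 4, 4, 4, 1,
  "Slytherin", 3, 2, 1, 4
)

# Compare as plain data frames; TRUE only if every cell agrees.
same_table <- isTRUE(all.equal(
  as.data.frame(ranks_cols),
  as.data.frame(ranks_rows)
))
stopifnot(same_table)

# Each rank column should use every value from 1 to 4 exactly once.
rank_columns <- as.data.frame(ranks_rows)[, -1]
each_is_permutation <- vapply(
  rank_columns,
  function(x) identical(sort(x), c(1, 2, 3, 4)),
  logical(1)
)
stopifnot(all(each_is_permutation))
```

The second check is specific to this example, where each characteristic ranks the four houses from 1 to 4 with no ties.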