This one cool hack will...help you categorize Harry Potter characters!

Hilary Parker & Roger Peng mention a cool tactic for categorizing data on their podcast, Not So Standard Deviations. (If I recall correctly, Hilary mentioned it first and Roger used it in a subsequent episode. Unfortunately, I don’t remember which episodes these were, and a quick look back proved futile. If I figure it out, I’ll link it here!)

The basic concept is to make a small table & join it into the data frame you are trying to categorize instead of writing a bunch of if/else statements. This is especially useful if you are:

☝️ Using the same categories on a bunch of different data frames (you can just create the small table of categories once!)
✌️ Creating multiple new variables from a single variable

I tweeted about this kernel of wisdom and a few people asked me to write up an example, so here it is!

Example

For this example, I am going to rank Harry Potter characters on a variety of characteristics, based on the house they were sorted into.

This data originated from a Kaggle Dataset by Gulsah Demiryurek.

library(tidyverse)
harry_potter <- read_csv("https://raw.githubusercontent.com/LFOD/real-blog/master/static/data/harry-potter.csv")

Tired

Here is how I would do this with case_when().

harry_potter %>%
  mutate(
    smart_rank = case_when(
      House == "Ravenclaw" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Slytherin" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    brave_rank = case_when(
      House == "Gryffindor" ~ 1,
      House == "Slytherin" ~ 2,
      House == "Ravenclaw" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    cunning_rank = case_when(
      House == "Slytherin" ~ 1,
      House == "Ravenclaw" ~ 2,
      House == "Gryffindor" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    kind_rank = case_when(
      House == "Hufflepuff" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Ravenclaw" ~ 3,
      House == "Slytherin" ~ 4
    )
  )

Wired

Here’s how I would do this with a data frame.

ranks <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2),
  cunning_rank = c(3, 2, 4, 1),
  kind_rank = c(2, 3, 1, 4)
)

harry_potter %>%
  left_join(ranks, by = "House")
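
If you want to convince yourself that the join really gives the same answer as case_when(), here is a minimal sketch on a tiny made-up data frame. The `toy` and `smart_lookup` tibbles and the `Name` values are placeholders I invented for illustration, not rows from the Kaggle data. It also shows that a row whose house is not in the small table ends up with NA, just as it would fall through case_when().

```r
library(dplyr)
library(tibble)

# Made-up rows for illustration only; not taken from the Kaggle data
toy <- tibble(
  Name = c("A", "B", "C"),
  House = c("Ravenclaw", "Slytherin", NA)
)

# A one-column version of the lookup table
smart_lookup <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3)
)

# The if/else style
via_case_when <- toy %>%
  mutate(
    smart_rank = case_when(
      House == "Ravenclaw" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Slytherin" ~ 3,
      House == "Hufflepuff" ~ 4
    )
  )

# The join style
via_join <- toy %>%
  left_join(smart_lookup, by = "House")

# Compare the ranks produced by the two approaches
identical(via_case_when$smart_rank, via_join$smart_rank)
```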

Inspired

After tweeting this out, several people pointed out that this is a nice use case for tribble()!

Let’s see how that looks!

ranks <- tribble(
  ~House,      ~smart_rank, ~brave_rank, ~cunning_rank, ~kind_rank,
  "Gryffindor", 2,           1,          3,             2,
  "Ravenclaw",  1,           3,          2,             3,
  "Hufflepuff", 4,           4,          4,             1,
  "Slytherin",  3,           2,          1,             4,
)

harry_potter %>%
  left_join(ranks, by = "House") 
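
The "create the small table once" idea from the top of the post can be sketched like this. The `students` and `staff` data frames, their `Name` values, and the "Unsorted" house are all hypothetical, made up just to have two data frames that share a `House` column. I have also trimmed the table to two rank columns to keep it short. As a bonus, anti_join() is a quick way to spot rows whose category is missing from the small table.

```r
library(dplyr)
library(tibble)

# Trimmed lookup table (two rank columns, to keep the sketch short)
ranks <- tribble(
  ~House,       ~smart_rank, ~brave_rank,
  "Gryffindor", 2,           1,
  "Ravenclaw",  1,           3,
  "Hufflepuff", 4,           4,
  "Slytherin",  3,           2
)

# Two made-up data frames that share the House column
students <- tibble(Name = c("A", "B"), House = c("Gryffindor", "Hufflepuff"))
staff <- tibble(Name = c("C", "D"), House = c("Slytherin", "Unsorted"))

# The same small table categorizes both data frames
students_ranked <- left_join(students, ranks, by = "House")
staff_ranked <- left_join(staff, ranks, by = "House")

# Rows whose House has no entry in the lookup table
unmatched <- anti_join(staff, ranks, by = "House")
unmatched
```

If `unmatched` has any rows, that is a hint that the small table needs another entry (or that the data has a typo in it).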

Lucy D'Agostino McGowan

Currently excited about: observational study methods, translational research, BB-8