Hilary Parker & Roger Peng mention a cool tactic for categorizing data on their podcast, Not So Standard Deviations. (If I recall correctly, Hilary mentioned it first and Roger used it in a later episode. Unfortunately, I don’t remember which episodes these were, and a quick look back proved futile. If I figure it out, I’ll link it here!)

The basic concept is to make a small lookup table & join it to the data frame you are trying to categorize, instead of writing a bunch of if/else statements. This is especially useful if you are:

☝️ Using the same categories on a bunch of different data frames (you can just create the small table of categories once!)
✌️ Creating multiple new variables from a single variable
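To make both points concrete, here is a minimal sketch. It assumes the dplyr and tibble packages, and `students` and `prefects` are tiny hypothetical data frames made up for illustration (not from the dataset used below). The `ranks` values mirror two of the rank columns in the example below. One lookup table is written once, and a single join adds both rank columns to each data frame.

```r
library(dplyr)
library(tibble)

# Lookup table written once: one row per category, one column per new variable.
ranks <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2)
)

# Two hypothetical data frames that share a House column (illustrative rows only).
students <- tibble(
  Name = c("Hermione Granger", "Luna Lovegood", "Cedric Diggory"),
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff")
)
prefects <- tibble(
  Name = c("Percy Weasley", "Draco Malfoy"),
  House = c("Gryffindor", "Slytherin")
)

# The same join adds every rank column to each data frame; row order is kept.
students_ranked <- left_join(students, ranks, by = "House")
prefects_ranked <- left_join(prefects, ranks, by = "House")
```

Adding a new category or a new rank column means editing one row or one column of `ranks`, rather than every `case_when()` call in every script that uses it.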

## Example

For this example, I am going to rank Harry Potter characters on a variety of characteristics based on the house they were sorted into.

This data originated from a Kaggle dataset by Gulsah Demiryurek.

```r
library(tidyverse)

harry_potter <- read_csv("https://raw.githubusercontent.com/LFOD/real-blog/master/static/data/harry-potter.csv")
```

## Tired

Here is how I would do this with case_when().

```r
harry_potter %>%
  mutate(
    smart_rank = case_when(
      House == "Ravenclaw" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Slytherin" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    brave_rank = case_when(
      House == "Gryffindor" ~ 1,
      House == "Slytherin" ~ 2,
      House == "Ravenclaw" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    cunning_rank = case_when(
      House == "Slytherin" ~ 1,
      House == "Ravenclaw" ~ 2,
      House == "Gryffindor" ~ 3,
      House == "Hufflepuff" ~ 4
    ),
    kind_rank = case_when(
      House == "Hufflepuff" ~ 1,
      House == "Gryffindor" ~ 2,
      House == "Ravenclaw" ~ 3,
      House == "Slytherin" ~ 4
    )
  )
```

## Wired

Here’s how I would do this with a small lookup data frame and `left_join()`.

```r
ranks <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2),
  cunning_rank = c(3, 2, 4, 1),
  kind_rank = c(2, 3, 1, 4)
)

harry_potter %>%
  left_join(ranks, by = "House")
```
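One thing the join approach shares with the `case_when()` version above (which has no fallback condition): a `House` value that isn’t in the lookup table, such as a typo or a missing value, comes back as `NA` rather than raising an error. Here is a minimal sketch of one way to spot those rows, assuming dplyr and tibble. The `characters` data frame, including its misspelled `"Ravenclow"`, is hypothetical and made up for illustration. `anti_join()` keeps the rows of the first table that have no match in the second.

```r
library(dplyr)
library(tibble)

ranks <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3)
)

# Hypothetical rows: one House value is misspelled and one is missing.
characters <- tibble(
  Name = c("Harry Potter", "Cho Chang", "Argus Filch"),
  House = c("Gryffindor", "Ravenclow", NA)
)

# Unmatched keys get NA in every column that comes from the lookup table.
ranked <- left_join(characters, ranks, by = "House")

# Rows whose House has no match in the lookup table, in their original order.
unmatched <- anti_join(characters, ranks, by = "House")
```

Running a check like this once per data frame is a cheap way to confirm that the lookup table covers every category you expect.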

## Inspired

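After I tweeted this out, several people pointed out that this is a nice use case for `tribble()`, which lets you lay the lookup table out row by row, so the code reads just like the table it represents!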

Let’s see how that looks!

```r
ranks <- tribble(
  ~House,       ~smart_rank, ~brave_rank, ~cunning_rank, ~kind_rank,
  "Gryffindor", 2,           1,           3,             2,
  "Ravenclaw",  1,           3,           2,             3,
  "Hufflepuff", 4,           4,           4,             1,
  "Slytherin",  3,           2,           1,             4,
)

harry_potter %>%
  left_join(ranks, by = "House")
```
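Retyping a lookup table row by row makes it easy to slip in a wrong number, for example giving two houses the same cunning rank. A minimal sketch of one way to guard against that, assuming the tibble package: build the column-wise and row-wise versions, check that they agree column by column, and check that every rank column uses each of 1 through 4 exactly once. The object names `ranks_tibble` and `ranks_tribble` are just illustrative.

```r
library(tibble)

# Column-wise version of the lookup table.
ranks_tibble <- tibble(
  House = c("Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"),
  smart_rank = c(2, 1, 4, 3),
  brave_rank = c(1, 3, 4, 2),
  cunning_rank = c(3, 2, 4, 1),
  kind_rank = c(2, 3, 1, 4)
)

# Row-wise version of the same table.
ranks_tribble <- tribble(
  ~House,       ~smart_rank, ~brave_rank, ~cunning_rank, ~kind_rank,
  "Gryffindor", 2,           1,           3,             2,
  "Ravenclaw",  1,           3,           2,             3,
  "Hufflepuff", 4,           4,           4,             1,
  "Slytherin",  3,           2,           1,             4
)

# TRUE when both tables have the same column names and identical columns.
same_table <- identical(names(ranks_tibble), names(ranks_tribble)) &&
  all(mapply(identical, ranks_tibble, ranks_tribble))

# TRUE when each rank column is a permutation of 1 to 4 (no repeated ranks).
ranks_are_permutations <- all(vapply(
  ranks_tribble[-1],
  function(col) length(col) == 4 && setequal(col, 1:4),
  logical(1)
))
```

If a row had a duplicated rank, such as two houses with a cunning rank of 3, both checks would come back `FALSE`.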

Lucy D'Agostino McGowan

Currently excited about: observational study methods, translational research, BB-8