Say I have this data frame:
library(tidyverse)
# Sample data frame
df <- data.frame(
  id = 1:3,
  fruits = c("apple | oranges", "apple | bananas", "bananas | oranges")
)
df
id | fruits |
---|---|
1 | apple \| oranges |
2 | apple \| bananas |
3 | bananas \| oranges |
I want to split the values in the fruits column on the | separator and then one-hot encode each resulting value, as follows:
# Step 1: Separate the values based on |
df_separated <- df %>%
  separate_rows(fruits, sep = " \\| ")
# Step 2: Create a dummy variable for each element
df_dummy <- df_separated %>%
  mutate(value = TRUE) %>%
  spread(fruits, value, fill = FALSE)
# View the result
print(df_dummy)
id | apple | bananas | oranges |
---|---|---|---|
1 | TRUE | FALSE | TRUE |
2 | TRUE | TRUE | FALSE |
3 | FALSE | TRUE | TRUE |
However, I cannot work out how to turn this code into a recipe step so that I can use it in a tidymodels workflow. Any ideas how to do so? This is what I have tried so far:
library(tidymodels)
dummies_fruit <- recipe(~ fruits, data = df) |>
  step_dummy_extract(fruits, sep = " | ") |>
  prep()
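One thing I suspect about the attempt above: the `sep` argument of `step_dummy_extract()` is documented as a regular expression passed to `strsplit()`, so an unescaped `|` probably acts as alternation and the string gets split on single spaces instead of on the literal pipe. A minimal sketch with the pipe escaped, the same way as in the `separate_rows()` call, would be the following. It assumes a recipes version that includes `step_dummy_extract()`, and the `result` name and the `bake(new_data = NULL)` call are only there to inspect the output:

```r
library(recipes)

# Same sample data as above
df <- data.frame(
  id = 1:3,
  fruits = c("apple | oranges", "apple | bananas", "bananas | oranges")
)

# sep is treated as a regular expression, so the literal pipe is escaped
# here, matching the separator used in separate_rows()
dummies_fruit <- recipe(~ fruits, data = df) |>
  step_dummy_extract(fruits, sep = " \\| ") |>
  prep()

# Assumption: bake(new_data = NULL) returns the processed training data
result <- bake(dummies_fruit, new_data = NULL)
print(result)
```

If this behaves as I expect, the new columns should be named along the lines of `fruits_apple`, `fruits_bananas`, and `fruits_oranges`, and should hold 0/1 counts rather than TRUE/FALSE, so it would not be an exact match for the `spread()` output.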