I currently have a dataset in R that is in long format, and I'm trying to make it wide with a couple of specifications. My dataset has a respondent ID, their gender, and one other column I'm interested in (let's say "fruit").
id <- c(1,1,1,2,3,4,4,5,5,5)
gender <- c("F","F","F","M","Unknown","F","F","M","M","M")
fruit <- c("pear", "apple", "banana", "pear", "strawberry","banana", "banana", "pear", "strawberry", "banana")
df <- cbind(id, gender, fruit)
id gender fruit
[1,] "1" "F" "pear"
[2,] "1" "F" "apple"
[3,] "1" "F" "banana"
[4,] "2" "M" "pear"
[5,] "3" "Unknown" "strawberry"
[6,] "4" "F" "banana"
[7,] "4" "F" "banana"
[8,] "5" "M" "pear"
[9,] "5" "M" "strawberry"
[10,] "5" "M" "banana"
My goal is to create a binary column for each possible response in that column, indicating whether that individual gave that response anywhere in the dataset:
id <- c(1,2,3,4,5)
gender <- c("F","M","Unknown","F","M")
pear <- c(1,1,0,0,1)
apple <- c(1,0,0,0,0)
banana <- c(1,0,0,1,1)
strawberry <- c(0,0,1,0,1)
df2 <- cbind(id, gender, pear, apple, banana, strawberry)
id gender pear apple banana strawberry
[1,] "1" "F" "1" "1" "1" "0"
[2,] "2" "M" "1" "0" "0" "0"
[3,] "3" "Unknown" "0" "0" "0" "1"
[4,] "4" "F" "0" "0" "1" "0"
[5,] "5" "M" "1" "0" "1" "1"
I'm not sure whether this matters, but in the original long format a given ID can have multiple rows with the same "fruit" response, as with my mock respondent 4. I hope that's clear; thanks very much in advance!
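To make the duplicate issue concrete: a plain cross-tabulation counts occurrences, so respondent 4 would get a 2 for banana, while the goal only asks whether a response appeared at all. The base R sketch below is an illustration, not part of the original question; it rebuilds the id and fruit vectors so it runs on its own, and the names counts and presence are hypothetical. It collapses the counts from table() to 0/1:

```r
id <- c(1, 1, 1, 2, 3, 4, 4, 5, 5, 5)
fruit <- c("pear", "apple", "banana", "pear", "strawberry",
           "banana", "banana", "pear", "strawberry", "banana")

# Cross-tabulate: each cell counts how many times an id gave a fruit
counts <- table(id, fruit)
counts["4", "banana"]    # 2, because respondent 4 answered "banana" twice

# Collapse counts to presence/absence: 1 if given at least once, else 0
presence <- (counts > 0) * 1L
presence["4", "banana"]  # 1
```

Note that table() orders the fruit columns alphabetically and drops gender, so this only illustrates the 0/1 logic rather than reproducing the full desired output.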
Here is a pure tidyverse approach (the native pipe |> and the \(x) lambda shorthand require R 4.1 or later):
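library(tidyverse)
df |>
  as_tibble() |>          # cbind() returned a character matrix; distinct() needs a data frame
  distinct() |>           # drop repeated id/fruit rows (e.g. respondent 4's two bananas)
  pivot_wider(
    names_from = fruit,   # one new column per fruit
    values_from = fruit,
    values_fn = \(x) 1,   # any occurrence becomes 1
    values_fill = 0       # fruits an id never gave become 0
  )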
#> # A tibble: 5 × 6
#> id gender pear apple banana strawberry
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 F 1 1 1 0
#> 2 2 M 1 0 0 0
#> 3 3 Unknown 0 0 0 1
#> 4 4 F 0 0 1 0
#> 5 5 M 1 0 1 1
Created on 2023-04-12 with reprex v2.0.2
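A variant of the same pivot, offered as a hedged sketch rather than as the answer's method: instead of reusing fruit for both names_from and values_from, add an explicit indicator column and pivot that. It assumes dplyr and tidyr (both attached by library(tidyverse)) and builds the data as a tibble directly, so id stays numeric here; the column name present is a hypothetical choice.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  id     = c(1, 1, 1, 2, 3, 4, 4, 5, 5, 5),
  gender = c("F", "F", "F", "M", "Unknown", "F", "F", "M", "M", "M"),
  fruit  = c("pear", "apple", "banana", "pear", "strawberry",
             "banana", "banana", "pear", "strawberry", "banana")
)

wide <- df |>
  distinct() |>                      # one row per id/gender/fruit combination
  mutate(present = 1L) |>            # explicit indicator to spread
  pivot_wider(
    names_from  = fruit,             # one column per fruit, in order of first appearance
    values_from = present,
    values_fill = 0L                 # fruits an id never gave become 0
  )

wide
```

Because values_fill = 0L matches the integer type of present, the indicator columns come out as integers rather than doubles. distinct() is still what keeps respondent 4's repeated banana from producing duplicate values, which pivot_wider() would otherwise turn into a list-column (with a warning).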