I have this data. Lots of people share the same id, but have different details about themselves recorded (name and month are often different).
data.frame(
id = c("a", "a", "a", "a", "b", "b"),
name = c("jane", "laura", "lauran", "lilly", "james", "jimmy"),
month = c("april", "april", "may", "june", "june", "june")
)
id | name | month |
---|---|---|
a | jane | april |
a | laura | april |
a | lauran | may |
a | lilly | june |
b | james | june |
b | jimmy | june |
I want a way of pivoting this wider, so each combination of name for each id appears (note: do not need to use pivot_wider if there is a better way).
This data has 1000s of rows in reality.
id | name_1 | name_2 | month_1 | month_2 | notes |
---|---|---|---|---|---|
a | jane | laura | april | april | |
a | jane | lauran | april | may | |
a | jane | lilly | april | june | |
a | laura | jane | april | april | * see below |
a | laura | lauran | april | may | |
a | laura | lilly | april | june | |
a | lauran | jane | may | april | |
a | lauran | laura | may | april | |
a | lauran | lilly | may | june | |
a | lilly | jane | june | april | |
a | lilly | laura | june | april | |
a | lilly | lauran | june | may | |
b | james | jimmy | june | june | |
b | jimmy | james | june | june |
*ideally solution would not have this row because this combination of names lauran/jane appears in row 1 already jane/lauran etc etc etc on future rows. However this solution would be ok, if that is as far as we can get.
Some sort of dplyr answer would be amazing but ok without!
Here you go. Easier to use a many-to-many merge than pivot wider in this case.
library(dplyr, warn.conflicts = FALSE) |>
suppressWarnings()
dat <- data.frame(
id = c(rep("a", 4), rep("b",2)),
name = c("jane", "laura", "lauran", "lilly", "james", "jimmy"),
month = c("april", "april", "may", rep("june",3))
)
dat |>
# Created row numbers to allow filtering later. Note use of magritta pipe here.
mutate(row = row_number(), .by = id) %>%
left_join(., ., by = "id", relationship = "many-to-many", suffix = c("_1", "_2")) |>
filter(row_2 > row_1) |>
select(id, starts_with("name"), starts_with("month"))
#> id name_1 name_2 month_1 month_2
#> 1 a jane laura april april
#> 2 a jane lauran april may
#> 3 a jane lilly april june
#> 4 a laura lauran april may
#> 5 a laura lilly april june
#> 6 a lauran lilly may june
#> 7 b james jimmy june june
Created on 2024-10-29 with reprex v2.1.1