I want to create a new column in a data frame that assigns a unique value to each combination of values from two other columns, regardless of their order.
Example
library(dplyr)  # also provides tibble()
df = tibble(x = c(1, 2, 3, 3, 4, 10, 9), y = c(2, 1, 9, 9, 9, 1, 3))
df
# A tibble: 7 × 2
      x     y
  <dbl> <dbl>
1     1     2
2     2     1
3     3     9
4     3     9
5     4     9
6    10     1
7     9     3
I want to generate this, where rows containing the same pair of values in either order share the same type:
# A tibble: 7 × 3
      x     y  type
  <dbl> <dbl> <dbl>
1     1     2     1
2     2     1     1
3     3     9     2
4     3     9     2
5     4     9     3
6    10     1     4
7     9     3     2
How can this be achieved for a general data frame?
EDIT: This is not the same question as those being linked.
The suggested answers result in
> df |>
+ group_by(x,y) |>
+ mutate(type = cur_group_id())
# A tibble: 7 × 3
# Groups: x, y [6]
      x     y  type
  <dbl> <dbl> <int>
1     1     2     1
2     2     1     2
3     3     9     3
4     3     9     3
5     4     9     4
6    10     1     6
7     9     3     5
which is wrong: (1, 2) and (2, 1) get different types, as do (3, 9) and (9, 3).
For the case with two columns, we can neutralize the ordering by (arbitrarily) putting the two columns in order when determining their group.
df |>
  # order-independent key: smaller value first, larger value second
  mutate(grp = paste(pmin(x, y), pmax(x, y))) |>
  # number each distinct key
  mutate(type = cur_group_id(), .by = grp)
Result
      x     y grp    type
  <dbl> <dbl> <chr> <int>
1     1     2 1 2       1
2     2     1 1 2       1
3     3     9 3 9       2
4     3     9 3 9       2
5     4     9 4 9       3
6    10     1 1 10      4
7     9     3 3 9       2
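Since the question asks about a general data frame, here is a rough sketch of how the same idea might extend beyond two columns, where pmin/pmax alone no longer orders every value. It uses base R only and sorts the values within each row to build the key. The names cols and key are made up for illustration, and a plain data.frame stands in for the tibble so the snippet runs without extra packages.

```r
# Same data as in the question, as a base data.frame (assumption: a tibble behaves the same here)
df <- data.frame(x = c(1, 2, 3, 3, 4, 10, 9), y = c(2, 1, 9, 9, 9, 1, 3))

# Hypothetical name: the columns whose order should be ignored
cols <- c("x", "y")

# Sort the values within each row and paste them into one order-independent key
key <- apply(as.matrix(df[cols]), 1, function(r) paste(sort(r), collapse = " "))

# Number the keys in order of first appearance
df$type <- match(key, unique(key))
df
```

Adding more column names to cols should work the same way, as long as the columns share a type, since as.matrix() coerces mixed types to character and would change how sort() orders them.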