[SOLVED] create new order for existing column values without reordering rows in dataframe

create new order for existing column values without reordering rows in dataframe - R

I have some results cluster labels from kmeans done on different ids (reprex example below). the problem is the kmeans clusters codes are not ordered consistently across ids although all ids have 3 clusters.

reprex = data.frame(id = rep(1:2, each = 41, 
           v1 = rep(seq(1:4), 2),
           cluster = c(2,2,1,3,3,1,2,2))

reprex
   id v1 cluster
1  1  1       2
2  1  2       2
3  1  3       1
4  1  4       3
5  2  1       3
6  2  2       1
7  2  3       2
8  2  4       2

what I want is that the variable cluster should always start with 1 within each ID. Note I don't want to reorder that dataframe by cluster, the order needs to remain the same. so the desired result would be:

reprex_desired<- data.frame(id = rep(1:2, each = 4), 
           v1 = rep(seq(1:4), 2),
           cluster = c(2,2,1,3,3,1,2,2),
           what_iWant = c(1,1,2,3,1,2,3,3))

reprex_desired
  id v1 cluster what_iWant
1  1  1       2          1
2  1  2       2          1
3  1  3       1          2
4  1  4       3          3
5  2  1       3          1
6  2  2       1          2
7  2  3       2          3
8  2  4       2          3

Solution

We can use match after grouping by 'id'

library(dplyr)
reprex <- reprex %>%
     group_by(id) %>% 
     mutate(what_IWant = match(cluster, unique(cluster))) %>%
     ungroup

-output

reprex
# A tibble: 8 × 4
     id    v1 cluster what_IWant
  <int> <int>   <dbl>      <int>
1     1     1       2          1
2     1     2       2          1
3     1     3       1          2
4     1     4       3          3
5     2     1       3          1
6     2     2       1          2
7     2     3       2          3
8     2     4       2          3