rdata-manipulationrecoderle

create new order for existing column values without reordering rows in dataframe - R


I have some results cluster labels from kmeans done on different ids (reprex example below). the problem is the kmeans clusters codes are not ordered consistently across ids although all ids have 3 clusters.

reprex = data.frame(id = rep(1:2, each = 41, 
           v1 = rep(seq(1:4), 2),
           cluster = c(2,2,1,3,3,1,2,2))

reprex
   id v1 cluster
1  1  1       2
2  1  2       2
3  1  3       1
4  1  4       3
5  2  1       3
6  2  2       1
7  2  3       2
8  2  4       2

what I want is that the variable cluster should always start with 1 within each ID. Note I don't want to reorder that dataframe by cluster, the order needs to remain the same. so the desired result would be:

reprex_desired<- data.frame(id = rep(1:2, each = 4), 
           v1 = rep(seq(1:4), 2),
           cluster = c(2,2,1,3,3,1,2,2),
           what_iWant = c(1,1,2,3,1,2,3,3))

reprex_desired
  id v1 cluster what_iWant
1  1  1       2          1
2  1  2       2          1
3  1  3       1          2
4  1  4       3          3
5  2  1       3          1
6  2  2       1          2
7  2  3       2          3
8  2  4       2          3


Solution

  • We can use match after grouping by 'id'

    library(dplyr)
    reprex <- reprex %>%
         group_by(id) %>% 
         mutate(what_IWant = match(cluster, unique(cluster))) %>%
         ungroup
    

    -output

    reprex
    # A tibble: 8 × 4
         id    v1 cluster what_IWant
      <int> <int>   <dbl>      <int>
    1     1     1       2          1
    2     1     2       2          1
    3     1     3       1          2
    4     1     4       3          3
    5     2     1       3          1
    6     2     2       1          2
    7     2     3       2          3
    8     2     4       2          3