rdplyr

Get position index of second highest value by row for selected columns


Below is a reproducible example with a test dataframe:

ts<-structure(list(OP = c(1, 3, 5), x = c(0.405595375690609, 0.256223899079487, 
0.0131878938991576), y = c(0.792096293298528, 0.0120686823502183, 
0.518370501697063), z = c(0.400826650671661, 0.279657100560144, 
0.409006189322099)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

I want to add a column using mutate that gets me for each row the column position of the second highest value within the selected columns. So far I'm only able to get the index column of the maximum value within selected columns by doing:

td<-ts %>% 
  rowwise() %>% 
  mutate(second_max = which.max(c_across(c(2:4))))

However, for this specific dataframe, my desired output would be:

OP x y z second_max
1 0.4055954 0.7920963 0.4008267 1
3 0.2562239 0.0120687 0.2796571 1
5 0.0131879 0.5183705 0.4090062 3

Solution

  • One possibility could be:

    ts %>%
     rowwise() %>%
     mutate(second_max = which(dense_rank(-c_across(c(2:4))) == 2))
    
         OP      x      y     z second_max
      <dbl>  <dbl>  <dbl> <dbl>      <int>
    1     1 0.406  0.792  0.401          1
    2     3 0.256  0.0121 0.280          1
    3     5 0.0132 0.518  0.409          3
    

    The same idea with the addition of purrr:

    ts %>%
     mutate(secod_max = pmap_int(across(c(2:4)), ~ which(dense_rank(-c(...)) == 2)))