rdataframetidyversetop-n

R dataframe - Top n values in row with column names


I want to sort rowwise values in specific columns, get top 'n' values, and get corresponding column names in new columns.

The output would look something like this:

      SL   SW    PL    PW   Species high1 high2 high3 col1 col2  col3
      dbl> <dbl> <dbl> <dbl> <fct>  <dbl> <dbl>  <dbl>
 1     5.1  3.5  1.4    0.2   setosa   3.5  1.4   0.2  SW   PL     PW
 2     4.9  3    1.4    0.2   setosa   3    1.4   0.2  SW   PL     PW
 3     4.7  3.2  1.3    0.2   setosa   3.2  1.3   0.2  SW   PL     PW

Tried something like code below, but unable to get column names. What I'm hoping to achieve is to compare the highest 'n' values (rows[n]) with values in dataframe for each row, and then extract corresponding column name of matching value. For eg. rows[1] == 3.5 (from column 'SW'). Is this feasible? Help appreciated.

 iris %>% 
      rowwise() %>%
      mutate(rows = list(sort(c( Sepal.Width, Petal.Length, Petal.Width), decreasing = TRUE))) %>%
      mutate(high1 = rows[1], col1 = names(~.)[which(~.[] ==rows[1]),
             high2 = rows[2], col2 = names(~.)[which(~.[] ==rows[2]),
             high3 = rows[3], col3 = names(~.)[which(~.[] ==rows[3])
             ) %>%
      select(-rows)

Solution

  • You could pivot to long, group by the corresponding original row, use slice_max to get the top values, then pivot back to wide and bind that output to the original table.

    library(dplyr, warn.conflicts = FALSE)
    library(tidyr)
    
    iris %>% 
      group_by(rn = row_number()) %>% 
      pivot_longer(-c(Species, rn), 'col', values_to = 'high') %>% 
      slice_max(col, n = 2) %>% 
      mutate(nm = row_number()) %>% 
      pivot_wider(values_from = c(high, col), 
                  names_from = nm) %>% 
      ungroup() %>% 
      select(-c(Species, rn)) %>% 
      bind_cols(iris)
    #> # A tibble: 150 × 9
    #>    high_1 high_2 col_1   col_2 Sepal.Length Sepal.Width Petal.Length Petal.Width
    #>     <dbl>  <dbl> <chr>   <chr>        <dbl>       <dbl>        <dbl>       <dbl>
    #>  1    5.1    3.5 Sepal.… Sepa…          5.1         3.5          1.4         0.2
    #>  2    4.9    3   Sepal.… Sepa…          4.9         3            1.4         0.2
    #>  3    4.7    3.2 Sepal.… Sepa…          4.7         3.2          1.3         0.2
    #>  4    4.6    3.1 Sepal.… Sepa…          4.6         3.1          1.5         0.2
    #>  5    5      3.6 Sepal.… Sepa…          5           3.6          1.4         0.2
    #>  6    5.4    3.9 Sepal.… Sepa…          5.4         3.9          1.7         0.4
    #>  7    4.6    3.4 Sepal.… Sepa…          4.6         3.4          1.4         0.3
    #>  8    5      3.4 Sepal.… Sepa…          5           3.4          1.5         0.2
    #>  9    4.4    2.9 Sepal.… Sepa…          4.4         2.9          1.4         0.2
    #> 10    4.9    3.1 Sepal.… Sepa…          4.9         3.1          1.5         0.1
    #> # … with 140 more rows, and 1 more variable: Species <fct>
    

    Created on 2022-02-16 by the reprex package (v2.0.1)

    Edited to remove the unnecessary rename and mutate, thanks to tip from @Onyambu!