
How can I use arrange() in dplyr to order groups?


I would like to group data and then arrange the table so that the groups with the highest values are shown first. For example, in the mtcars dataset I would like to group the cars by number of cylinders and then arrange the table so that the groups with the highest mean mpg are shown first:

mtcars %>% group_by (cyl)  %>% arrange (desc(mean (mpg)))

This produces an error:

Error: incorrect size (1) at position 1, expecting : 32

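The reason I am asking is that filter(), when applied after group_by(), operates on whole groups rather than individual rows (for example, filter(mean(mpg) > 20) keeps or drops entire cylinder groups), so I expected arrange() to work the same way.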
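For reference, here is a minimal sketch of the filter() behaviour described above, assuming dplyr is installed. Because mean(mpg) is evaluated separately within each cyl group, every row in a group gets the same TRUE/FALSE result, so whole groups are kept or dropped. The name `kept` is just an illustrative variable.

```r
library(dplyr)

# mean(mpg) is computed per cyl group, so the condition is the same for
# every row of a group: entire groups pass or fail together
kept <- mtcars %>%
  group_by(cyl) %>%
  filter(mean(mpg) > 20)

# only the 4-cylinder group has a mean mpg above 20
unique(kept$cyl)
```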


Solution

  • Perhaps this? First group by cyl, then add a new column holding mean(mpg) for each group, arrange by that column however you want, and finally remove the temporary mean_mpg column.

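    library(dplyr)
    
    mtcars %>% 
      group_by(cyl) %>% 
      mutate(mean_mpg = mean(mpg)) %>%   # per-group mean, repeated on every row of the group
      arrange(desc(mean_mpg)) %>%        # groups with the highest mean mpg come first
      select(-mean_mpg)                  # drop the temporary helper column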
    
    #> # A tibble: 32 x 11
    #> # Groups:   cyl [3]
    #>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #>  1  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
    #>  2  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
    #>  3  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
    #>  4  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
    #>  5  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
    #>  6  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
    #>  7  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
    #>  8  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
    #>  9  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
    #> 10  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
    #> # ... with 22 more rows
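If you would rather not create and then drop a helper column, one alternative (a sketch, assuming dplyr is installed; the variable name `by_group_mean` is illustrative) is to compute the group means directly inside arrange() with base R's ave(). With its default FUN = mean, ave(mpg, cyl) returns, for every row, the mean mpg of that row's cyl group. This is done on ungrouped data, so no group_by() is needed, and rows within a group keep their original relative order.

```r
library(dplyr)

# ave(mpg, cyl) gives each row the mean mpg of its cyl group,
# so sorting on it in descending order puts whole groups first
by_group_mean <- mtcars %>%
  arrange(desc(ave(mpg, cyl)))

head(by_group_mean)
```

The same idea works in base R without dplyr: `mtcars[order(-ave(mtcars$mpg, mtcars$cyl)), ]`.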