rdplyrtidyverse

How to count the frequency of values per month so they are in the right order (Jan to Dec) over multiple years


I have a large data frame (388 x 729) and I am trying to calculate the frequency of daffodil bulbs (numeric column) per month (factor column)_over 14 years.

Structure of my data frame:

$ Month                              : Factor w/ 18 levels "April","April ",..: 9 8 8 8 8 8 8 8 8 1 ...
$ Daffodil Bulbs                     : num  0 3 0 3 2 1 0 0 0 0 ...

When I run my code, the months are in the wrong order and it's not pooling the frequency of daffodil bulbs over the 14 years per month:

Count_Daffodils <- MyDf %>% mutate(Month=ordered(trimws(Month), levels=month.name)) %>% 
                               group_by(Month) %>%
                               count(Number_Daffodils, sort=TRUE) %>%
                               ungroup()  
                   

Current output:

# A tibble: 92 × 3
   Month     Number_Boats     n
   <ord>            <dbl> <int>
 1 June                 0    86
 2 May                  0    73
 3 November             0    67
 4 April                0    51
 5 February             0    46
 6 March                0    36
 7 December             0    35
 8 July                 0    24
 9 October              0    22
10 September            0    18
# ℹ 82 more rows
# ℹ Use `print(n = ...)` to see more rows

The output I'm trying to achieve:

   Month     Number_Daffodils     
   <ord>            <int>
 1 January            86
 2 February           73
 3 March              67
 4 April              51
 5 May                46
 6 June               36
 7 July               35
 8 August             24
 9 September          22
10 October            56
11 November           14
12 December           18

Solution

  • Currently your code is:

    If you're looking to sum per month then summarise is what you want to use. Your factoring of Month is working perfectly to order the months variable, which can then be sorted after summing:

    library(tidyverse)
    
    tibble(
      Month = sample(month.name, 120, replace = TRUE),
      Number_Daffodils = sample(1:5, 120, replace = TRUE)
    ) |> 
      mutate(Month = factor(Month, levels = month.name)) |> 
      summarise(Number_Daffodils = sum(Number_Daffodils), .by = "Month") |> 
      arrange(Month)
    #> # A tibble: 12 × 2
    #>    Month     Number_Daffodils
    #>    <fct>                <int>
    #>  1 January                 23
    #>  2 February                44
    #>  3 March                   50
    #>  4 April                   33
    #>  5 May                     25
    #>  6 June                    24
    #>  7 July                    33
    #>  8 August                  40
    #>  9 September               45
    #> 10 October                 20
    #> 11 November                24
    #> 12 December                27