rdplyrtidyversetidy

Count Categorical Variables Summarized by Month and Year


I am attempting to summarize a count of categorical variables by month and year. Below shows what I obtain versus what I desire.

library(dplyr)
library(tidyverse)
library(lubridate)

data <- tibble(Date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-01", "2022-10-01")),
               b = c("honey", "honey", "free", "love"))
               
data

# A tibble: 4 × 2
#  Date       b    
#  <date>     <chr>
#1 2021-10-01 honey
#2 2021-10-01 honey
#3 2021-10-01 free 
#4 2022-10-01 love 

The code:

data %>% group_by(year=year(Date), month=month(Date)) %>% count()

Produces this:

# A tibble: 2 × 3
# Groups:   year, month [2]
#   year month     n
#   <dbl> <dbl> <int>
# 1  2021    10     3
# 2  2022    10     1

But this is what I desire as output ...

#    year month   honey   free    love
#   <dbl> <dbl>   <int>  <int>   <int>
# 1  2021    10       2      1       0
# 2  2022    10       0      0       1


Solution

  • You can use tidyr::pivot_wider() with the values_fn (to count how many occurrences using length) and values_fill (to replace NA values with 0):

    library(dplyr)
    library(tidyr)
    
    data %>% 
      pivot_wider(names_from = b, 
                  values_from = b, 
                  values_fn = length,
                  values_fill = 0)
    

    Output:

     Date       honey  free  love
      <date>     <int> <int> <int>
    1 2021-10-01     2     1     0
    2 2022-10-01     0     0     1
    

    If you want to add month in there, just use a simple mutate after pivoting:

    data %>% 
      pivot_wider(names_from = b, 
                  values_from = b, 
                  values_fn = length,
                  values_fill = 0) %>%
      mutate(month = lubridate::month(Date))
    

    Output:

     Date       honey  free  love month
      <date>     <int> <int> <int> <dbl>
    1 2021-10-01     2     1     0    10
    2 2022-10-01     0     0     1    10