I am attempting to summarize a count of categorical variables by month and year. Below shows what I obtain versus what I desire.
library(dplyr)
library(tidyverse)
library(lubridate)
data <- tibble(Date = as.Date(c("2021-10-01", "2021-10-01", "2021-10-01", "2022-10-01")),
b = c("honey", "honey", "free", "love"))
data
# A tibble: 4 × 2
# Date b
# <date> <chr>
#1 2021-10-01 honey
#2 2021-10-01 honey
#3 2021-10-01 free
#4 2022-10-01 love
The code:
data %>% group_by(year=year(Date), month=month(Date)) %>% count()
Produces this:
# A tibble: 2 × 3
# Groups: year, month [2]
# year month n
# <dbl> <dbl> <int>
# 1 2021 10 3
# 2 2022 10 1
But this is what I desire as output ...
# year month honey free love
# <dbl> <dbl> <int> <int> <int>
# 1 2021 10 2 1 0
# 2 2022 10 0 0 1
You can use tidyr::pivot_wider()
with the values_fn
(to count how many occurrences using length
) and values_fill
(to replace NA
values with 0):
library(dplyr)
library(tidyr)
data %>%
pivot_wider(names_from = b,
values_from = b,
values_fn = length,
values_fill = 0)
Output:
Date honey free love
<date> <int> <int> <int>
1 2021-10-01 2 1 0
2 2022-10-01 0 0 1
If you want to add month
in there, just use a simple mutate
after pivoting:
data %>%
pivot_wider(names_from = b,
values_from = b,
values_fn = length,
values_fill = 0) %>%
mutate(month = lubridate::month(Date))
Output:
Date honey free love month
<date> <int> <int> <int> <dbl>
1 2021-10-01 2 1 0 10
2 2022-10-01 0 0 1 10