Having trouble developing code that will make a graph to count per month the number of employees in the company by the Original Hire Date and Termination Date:
structure(list(original_hire_date = c("7/8/2019", "7/15/2019", "7/29/2019", "8/5/2019", "8/12/2019", "8/19/2019", "8/26/2019", "8/26/2019", "8/26/2019", "9/3/2019", "9/9/2019", "9/9/2019", "10/8/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "9/30/2019", "10/14/2019", "10/28/2019"), termination_date = c(NA, NA, NA, "8/21/2020", NA, "6/30/2020", NA, "7/25/2020", NA, NA, NA, NA, "8/21/2020", "6/30/2020", NA, "6/30/2020", NA, "6/30/2020", "6/5/2020", "6/30/2020", "6/30/2020", NA, "3/2/2020", "8/27/2021", NA)), row.names = c(NA, -25L), class = c("tbl_df", "tbl", "data.frame" ))
The goal is to create a graph by Year and Month (ex. July 2019) and the count of employees during that time. Since some employees have not left, it is really subtracting and adding employees over time from when they start or left the company.
Here, I reshape long, count hires as +1 and terminations as -1, summarize by date, then calc the count as the cumulative sum of hires and terminations. (One possible variation might be to adjust the date on terminations to be one day later -- for example if someone worked a single day, we might think of them being +1 in the morning and -1 after the day, so they contributed +1 for the day, not zero. I have not made that adjustment here.)
library(tidyverse)
df1 |>
pivot_longer(1:2) %>%
mutate(change = if_else(name == "original_hire_date", 1, -1),
date = lubridate::mdy(value)) %>%
filter(!is.na(date)) |>
arrange(date) |>
count(date, wt = change, name = "change") |>
complete(date = seq.Date(min(date), max(date), by = 1), # to fill in all days,
fill = list(change = 0)) |> # so count doesn't drift between observations
mutate(count = cumsum(change)) |>
ggplot(aes(date, count)) +
geom_line()