I'm trying to count the number of new employees a manager gained between time 1 and time 2. I have a string of all employee ids that roll up under that manager.
My code below always says there is 1 new employee, but as you can see, there are 2. How do I find out how many new employees there are? The ids aren't guaranteed to always be in the same order, but they will always be separated by ", ".
library(dplyr)
library(stringr)
# First dataset
mydata_q2 <- tibble(
  leader = 1,
  reports_q2 = "2222, 3333, 4444"
)
# Second dataset
mydata_q3 <- tibble(
  leader = 1,
  reports_q3 = "2222, 3333, 4444, 55555, 66666"
)
# Function to count number of new employees
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  time_1_reports <- ifelse(is.na(reports_time1), character(0), str_split(reports_time1, " ,\\s*")[[1]])
  time_2_reports <- str_split(reports_time2, " ,\\s*")[[1]]
  num_new_employees <- length(setdiff(time_1_reports, time_2_reports))
  num_new_employees
}
# Join data and count number of new staff--get wrong answer
mydata_q2 %>%
  left_join(mydata_q3) %>%
  mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
EDIT:
The output that I want is for new_staff_count = 2 for this example.
That's because there are 2 new employees (55555 and 66666) in q3 that weren't in q2.
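The ifelse statement is not working correctly. ifelse() is vectorized and returns a result the same length as its test, so with a single string it keeps only the first element of the split. You need to use the if/else construct instead. Note also that the pattern " ,\\s*" has a stray space before the comma, so it never matches this data; use ",\\s*". Then calculate the difference between the two vector lengths.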
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  if (is.na(reports_time1)) {
    time_1_reports <- character(0)
  } else {
    time_1_reports <- str_split(reports_time1, ",\\s*")[[1]]
  }
  print(time_1_reports)
  time_2_reports <- str_split(reports_time2, ",\\s*")[[1]]
  num_new_employees <- length(time_2_reports) - length(time_1_reports)
  num_new_employees
}
# Join data and count number of new staff
mydata_q2 %>%
  left_join(mydata_q3) %>%
  mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
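The ifelse() behaviour is easy to see in isolation. The snippet below is only a minimal sketch: it uses base R's strsplit() in place of str_split() so it runs without extra packages, and the variable names (reports, ids, via_ifelse, via_if) are made up for illustration.

```r
reports <- "2222, 3333, 4444"
ids <- strsplit(reports, ",\\s*")[[1]]

# ifelse() shapes its result like the test, which has length 1 here,
# so only the first id is kept.
via_ifelse <- ifelse(is.na(reports), character(0), ids)

# A plain if/else returns the whole vector from the chosen branch.
via_if <- if (is.na(reports)) character(0) else ids

length(via_ifelse)  # 1
length(via_if)      # 3
```

That single surviving element is why the original function could never report more than one new employee.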
EDIT from Original Poster:
Thank you, Dave! I was able to simplify. I also changed the calculation: the length difference gave negative counts when a manager had more reports at time 1 than at time 2, and the wrong count when employees were swapped rather than added. Using setdiff(time_2_reports, time_1_reports) counts only the ids that are new at time 2.
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  time_1_reports <- str_split(reports_time1, ", ")[[1]]
  time_2_reports <- str_split(reports_time2, ", ")[[1]]
  length(setdiff(time_2_reports, time_1_reports))
}
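One caveat for anyone reusing this: because of the [[1]], the function above only looks at the first row's strings, which is fine for the one-row example but not for a table with several leaders. Below is a rough sketch of a row-by-row variant. The name count_new_emps and the inner helper split_ids are hypothetical, base R's strsplit() and mapply() stand in for stringr so the block is self-contained, and treating a missing string as "no reports" is an assumption rather than something stated in the question.

```r
# Hypothetical helper: counts ids present at time 2 but not at time 1,
# element by element, so it can be applied to whole columns.
count_new_emps <- function(reports_time1, reports_time2) {
  split_ids <- function(x) {
    # Assumption: a missing report string means "no reports".
    if (is.na(x)) character(0) else strsplit(x, ",\\s*")[[1]]
  }
  mapply(
    function(t1, t2) length(setdiff(split_ids(t2), split_ids(t1))),
    reports_time1, reports_time2,
    USE.NAMES = FALSE
  )
}

# Made-up example data: one leader gains two reports, one swaps a report,
# and one had no reports recorded at time 1.
q2 <- c("2222, 3333, 4444", "7777, 8888", NA)
q3 <- c("2222, 3333, 4444, 55555, 66666", "7777, 9999", "1111")

count_new_emps(q2, q3)
```

Inside the pipeline this would be called the same way as before, e.g. mutate(new_staff_count = count_new_emps(reports_q2, reports_q3)). The second element of the example shows why setdiff is preferable to a length difference: one report was replaced, so the lengths match, yet there is still one new employee.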