I have a data frame that looks like this:
w<-read.table(header=TRUE, text="
start.date end.date manager
2006-05-01 2007-04-30 a
2006-09-30 2007-12-31 b
1999-09-30 2007-12-31 c
2008-01-01 2012-04-30 d
2008-01-01 2020-02-28 e
2009-05-01 2016-04-08 f")
I'd like to obtain a dataframe which returns which managers were working during each month in the period, so for example
df<-read.table(header=TRUE, text="
month manager1 manager2 manager3 manager4
01-2006 a b c NA
02-2006 a b c d
03-2006 b c d NA
04-2006 b d NA NA")
I started by defining a function datseq that returns the months between the start.date and the end.date
datseq <- function(t1, t2) {
format(seq.Date(from = as.Date(t1,"%Y-%m-%d"),
to = as.Date(t2,"%Y-%m-%d"),by="month"),
"%m/%Y")
but then I cannot create a proper loop to obtain the desired result. Thank you in advance to everyone replying!
Since you need to know only the overlap at the month level and not the day level, you can consider that managers have started at day 1 and left at last day of month. This can be achieved using floor_date
and ceiling_date
from the package lubridate
.
library(lubridate)
w.extended <- w
w.extended$start.date <- floor_date(as.Date(w.extended$start.date), "month")
w.extended$end.date <- ceiling_date(as.Date(w.extended$end.date), "month") - 1
#List of months
timeperiod <- seq(min(w.extended$start.date),
by = "month",
to = max(w.extended$end.date))
You can then use a %within% b
from the package lubridate
which can check if a date falls within a list of intervals. Apply this function to each of your months with the intervals you provided.
df <- data.frame(t(sapply(timeperiod,
function(x){
managersWorking <- x %within% interval(w.extended$start.date,
w.extended$end.date)
c(as.character(x), managersWorking)
})),
stringsAsFactors = F)
#Replace the 'character' format of columns to the appropriate one
df[-1] <- apply(df[-1], 2, as.logical)
df[,1]<- format(as.Date(df[,1]), "%Y/%m")
colnames(df) <- c("month", paste0("manager.", w$manager))
head(df)
# month manager.a manager.b manager.c manager.d manager.e manager.f
#1 1999/09 FALSE FALSE TRUE FALSE FALSE FALSE
#2 1999/10 FALSE FALSE TRUE FALSE FALSE FALSE
#3 1999/11 FALSE FALSE TRUE FALSE FALSE FALSE
#4 1999/12 FALSE FALSE TRUE FALSE FALSE FALSE
#5 2000/01 FALSE FALSE TRUE FALSE FALSE FALSE
#6 2000/02 FALSE FALSE TRUE FALSE FALSE FALSE
Raw data:
w <- read.table(header=TRUE, text="
start.date end.date manager
2006-05-01 2007-04-30 a
2006-09-30 2007-12-31 b
1999-09-30 2007-12-31 c
2008-01-01 2012-04-30 d
2008-01-01 2020-02-28 e
2009-05-01 2016-04-08 f")
w
# start.date end.date manager
#1 2006-05-01 2007-04-30 a
#2 2006-09-30 2007-12-31 b
#3 1999-09-30 2007-12-31 c
#4 2008-01-01 2012-04-30 d
#5 2008-01-01 2020-02-28 e
#6 2009-05-01 2016-04-08 f