I have cohort study data with a start and end date for each patient. I would like to calculate the incidence of a disease for each year and each month from 1 January 2014 to 31 August 2021. How can I calculate person-months and person-years from each patient's start and end dates, so that I can get the incidence using the equation: number of new cases / total population during the time frame?
This is what my data currently looks like:
patid | start_date | end_date | disease | disease_date |
---|---|---|---|---|
1 | 01/03/1993 | 31/08/2021 | yes | 15/11/2017 |
2 | 24/03/2000 | 31/08/2021 | no | NA |
3 | 01/03/2020 | 23/08/2021 | yes | 15/08/2020 |
4 | 24/03/2016 | 01/08/2019 | no | NA |
5 | 24/03/2001 | 17/08/2020 | no | NA |
6 | 01/03/1999 | 04/08/2014 | yes | 01/01/2014 |
7 | 01/03/2016 | 31/08/2018 | yes | 18/03/2017 |
Sample data:
df <- data.frame(patid = c("1","2","3","4","5","6","7"),
                 start_date = c("01/03/1993","24/03/2000",
                                "01/03/2020","24/03/2016",
                                "24/03/2001","01/03/1999",
                                "01/03/2016"),
                 end_date = c("31/08/2021","31/08/2021",
                              "23/08/2021","01/08/2019",
                              "17/08/2020","04/08/2014",
                              "31/08/2018"),
                 disease = c("yes","no","yes","no",
                             "no","yes","yes"),
                 disease_date = c("15/11/2017",NA,
                                  "15/08/2020",NA,NA,
                                  "01/01/2014","18/03/2017"))
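Please try the code below, where I used the formula number of events / ((end_date - start_date + 1) / 365.25) * 100, i.e. events per 100 person-years. Note the parentheses: the whole follow-up in days (plus one) has to be divided by 365.25, otherwise only the 1 is divided and person_year stays in days. The event count is taken from the disease column, because n() on one row per patient is always 1.
library(dplyr)
df2 <- df %>%
  mutate(start_date = as.Date(start_date, "%d/%m/%Y"),
         end_date = as.Date(end_date, "%d/%m/%Y"),
         disease_date = as.Date(disease_date, "%d/%m/%Y"),
         # follow-up in years: (days + 1) / 365.25
         person_year = as.numeric(end_date - start_date + 1) / 365.25,
         n = as.integer(disease == "yes"),  # 1 = event, 0 = no event
         per_year2 = (n / person_year) * 100)  # events per 100 person-years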
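The per-patient formula gives one rate over each patient's whole follow-up. To get incidence for each calendar year or month between 1 January 2014 and 31 August 2021, the follow-up has to be split by period. Below is a minimal base R sketch of one way to do that, not a definitive implementation. The helper name `incidence_by_period` is hypothetical, and the sketch assumes that time at risk stops at `disease_date` for cases, that both the first and last day count, and that a year is 365.25 days (so person-months are taken as person-years times 12).

```r
# Sample data from the question, with the date columns parsed as Date
df <- data.frame(
  patid        = as.character(1:7),
  start_date   = c("01/03/1993", "24/03/2000", "01/03/2020", "24/03/2016",
                   "24/03/2001", "01/03/1999", "01/03/2016"),
  end_date     = c("31/08/2021", "31/08/2021", "23/08/2021", "01/08/2019",
                   "17/08/2020", "04/08/2014", "31/08/2018"),
  disease_date = c("15/11/2017", NA, "15/08/2020", NA, NA,
                   "01/01/2014", "18/03/2017")
)
date_cols <- c("start_date", "end_date", "disease_date")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")

# Hypothetical helper: cases and person-time per calendar period.
# `by` is passed to seq() and can be "year" or "month".
incidence_by_period <- function(df, from, to, by = "year") {
  # Assumption: time at risk ends at diagnosis for cases,
  # otherwise at the end of follow-up
  risk_end <- pmin(df$end_date, df$disease_date, na.rm = TRUE)

  starts <- seq(from, to, by = by)   # first day of each period
  ends   <- c(starts[-1] - 1, to)    # last day of each period

  out <- lapply(seq_along(starts), function(i) {
    lo <- pmax(df$start_date, starts[i])
    hi <- pmin(risk_end, ends[i])
    # Days at risk inside the period; 0 when there is no overlap
    days <- pmax(as.numeric(hi - lo) + 1, 0)
    cases <- sum(!is.na(df$disease_date) &
                   df$disease_date >= starts[i] &
                   df$disease_date <= ends[i])
    data.frame(period_start = starts[i],
               cases        = cases,
               person_years = sum(days) / 365.25)
  })

  res <- do.call(rbind, out)
  res$person_months  <- res$person_years * 12
  res$rate_per_100py <- 100 * res$cases / res$person_years
  res
}

by_year  <- incidence_by_period(df, as.Date("2014-01-01"),
                                as.Date("2021-08-31"), by = "year")
by_month <- incidence_by_period(df, as.Date("2014-01-01"),
                                as.Date("2021-08-31"), by = "month")
print(by_year)
```

`by_year` has one row per calendar year (the last one covering only January to August 2021) and `by_month` one row per month. If patients should keep contributing person-time after diagnosis, replace `risk_end` with `df$end_date`.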