r longitudinal

Calculating incidence of disease in R using start and end dates and the disease occurrence date


I have cohort study data with start and end dates for each patient. I would like to calculate the incidence of a disease for each year and each month from 1 January 2014 to the end of August 2021. How can I calculate person-months and person-years from the start and end date of each patient, so that I can get the incidence using the equation: number of new cases / total population during the time frame?

This is what my data currently looks like:

patid start_date end_date disease disease_date
1 01/03/1993 31/08/2021 yes 15/11/2017
2 24/03/2000 31/08/2021 no NA
3 01/03/2020 23/08/2021 yes 15/08/2020
4 24/03/2016 01/08/2019 no NA
5 24/03/2001 17/08/2020 no NA
6 01/03/1999 04/08/2014 yes 01/01/2014
7 01/03/2016 31/08/2018 yes 18/03/2017

Sample data:

df <- data.frame(patid=c("1","2","3","4","5","6","7"),
                 start_date=c("01/03/1993","24/03/2000",
                              "01/03/2020","24/03/2016",
                              "24/03/2001","01/03/1999",
                              "01/03/2016"),
                 end_date=c("31/08/2021","31/08/2021",
                            "23/08/2021","01/08/2019",
                            "17/08/2020","04/08/2014",
                            "31/08/2018"),
                 disease=c("yes","no","yes","no",
                           "no","yes","yes"),
                 disease_date=c("15/11/2017",NA,
                                "15/08/2020",NA,NA,
                                "01/01/2014","18/03/2017"))

Solution

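  • Please try the code below, where I used the formula number of events / ((end_date - start_date + 1) / 365.25) * 100, i.e. events per 100 person-years for each patient. The parentheses matter: without them only the 1 is divided by 365.25 and person_year ends up in days rather than years. Events are counted as rows with disease == "yes", so patients without the disease get a rate of 0.

    library(dplyr)
    df2 <- df %>%
      mutate(start_date = as.Date(start_date, '%d/%m/%Y'),
             end_date = as.Date(end_date, '%d/%m/%Y'),
             disease_date = as.Date(disease_date, '%d/%m/%Y'),
             # follow-up in years, counting both the first and the last day
             person_year = as.numeric(end_date - start_date + 1) / 365.25) %>%
      group_by(patid) %>%
      mutate(n = sum(disease == "yes"),  # number of events for this patient
             per_year2 = (n / person_year) * 100)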
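    The code above gives one rate per patient over their whole follow-up. For the per-year and per-month incidence asked about in the question, a minimal base R sketch could split each patient's follow-up across calendar periods. It rests on several assumptions that are not in the original post: a patient stops contributing person-time on the disease date, both boundary days are counted, a year is taken as 365.25 days, and a disease date inside the window counts as a new case. The names incidence_by_period, exit_date and rate_per_100py are made up for illustration.

```r
# Sample data from the question, with dates parsed as Date objects
fmt <- "%d/%m/%Y"
df <- data.frame(
  patid = as.character(1:7),
  start_date = as.Date(c("01/03/1993", "24/03/2000", "01/03/2020", "24/03/2016",
                         "24/03/2001", "01/03/1999", "01/03/2016"), fmt),
  end_date = as.Date(c("31/08/2021", "31/08/2021", "23/08/2021", "01/08/2019",
                       "17/08/2020", "04/08/2014", "31/08/2018"), fmt),
  disease_date = as.Date(c("15/11/2017", NA, "15/08/2020", NA, NA,
                           "01/01/2014", "18/03/2017"), fmt)
)

# Assumption: time at risk ends at the disease date, otherwise at end_date
df$exit_date <- df$end_date
early <- !is.na(df$disease_date) & df$disease_date < df$end_date
df$exit_date[early] <- df$disease_date[early]

# Hypothetical helper: cases and person-years per calendar period
incidence_by_period <- function(df, from, to, by = "year") {
  starts <- seq(from, to, by = by)   # first day of each period
  ends <- c(starts[-1] - 1, to)      # last day of each period
  s <- as.numeric(df$start_date)
  e <- as.numeric(df$exit_date)
  last <- as.numeric(df$end_date)
  d <- as.numeric(df$disease_date)
  rows <- lapply(seq_along(starts), function(i) {
    p0 <- as.numeric(starts[i])
    p1 <- as.numeric(ends[i])
    # days at risk inside the period, both ends inclusive, never negative
    days <- pmax(pmin(e, p1) - pmax(s, p0) + 1, 0)
    # new cases: disease date inside the period and inside follow-up
    cases <- sum(!is.na(d) & d >= p0 & d <= p1 & d >= s & d <= last)
    data.frame(period_start = starts[i], period_end = ends[i],
               cases = cases, person_years = sum(days) / 365.25)
  })
  res <- do.call(rbind, rows)
  res$rate_per_100py <- res$cases / res$person_years * 100
  res
}

from <- as.Date("2014-01-01")
to <- as.Date("2021-08-31")
yearly <- incidence_by_period(df, from, to, by = "year")
monthly <- incidence_by_period(df, from, to, by = "month")
print(yearly)
```

    Person-months can be approximated as person_years * 12. Whether patients who already have the disease at the start of the window should be excluded is a study design decision that this sketch does not make.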