Tags: r, count, frequency, longitudinal

Exploring data coverage for different time point ranges in longitudinal (weekly) data


This is a very basic question and maybe more about data wrangling than coding, sorry.

I have a dataset in which 1520 participants were measured once a week for 12 weeks. However, there are a lot of missing values, and different participants have provided different amounts of data, from different weeks (e.g. some people have data from all 12 weeks, some from weeks 1-3 only, some from weeks 4-8 only, some from weeks 1-2 AND 5-7 but not from 3-4, etc.).

I can easily compute how many participants have 1, 2, 3 etc. observations, and also how many observations I have from week 1, 2 etc. However, I'd like to find out my data coverage for different week ranges (e.g. the number of observations for weeks 1-5, 2-6, 3-9...). In addition, missing "week rows" per participant were removed before I received the data (see below). I'm using R (I have a feeling this is something I should be able to do from a frequency table, but I can't wrap my brain around it).

reprex with 5 participants and 10 weeks:

set.seed(1)  # make the example reproducible
id <- rep(1:5, each = 10)
week <- rep(1:10, times = 5)
outcome <- rnorm(50)  # outcome is probably not needed for the example, but I put it in for
                      # completeness
ind <- sample(seq_along(week), 15)  # pick 15 random rows to treat as missing
week[ind] <- NA
exdata <- data.frame(id, week, outcome)
exdata2 <- subset(exdata, !is.na(week))  # drop the missing "week rows"

Could someone suggest a procedure with which I can find out the amount of data coverage for different "week ranges" from data like this? Thanks in advance!


Solution

  • Defining a function might help since you did not specify how many ranges you need to calculate this for.

    count_weeks_in_range <- function(dataset, start_value, end_value) {
      # Filter to select rows where the 'week' column falls within the specified range
      filtered_data <- subset(dataset, week >= start_value & week <= end_value)
      
      # Count the number of rows
      count <- nrow(filtered_data)
      
      return(count)
    }
    
    # Define the range you want to count
    start_range <- 3
    end_range <- 7
    
    count <- count_weeks_in_range(exdata2, start_range, end_range)
    
    cat("Number of occurrences in the range", start_range, "-", end_range, ":", count, "\n")
    

    Hope this helps :)
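
If you want every range at once rather than one call per range, the frequency-table idea from the question can be sketched as below. This is only a rough sketch under some assumptions: the data frame has `id` and `week` columns with no NA weeks (as in `exdata2`), and the helper names `coverage_by_range` and `complete_ids_in_range` are made up for illustration. The first helper counts observations for all start/end pairs using a running total of the per-week counts; the second counts participants who have data for every week in a given range, which may be closer to what "coverage" means for longitudinal models.

```r
# Hypothetical helper: number of observations for every start/end week range.
# Assumes 'dataset' has a numeric 'week' column without NA values.
coverage_by_range <- function(dataset, max_week = max(dataset$week)) {
  # Observations per week; factor() keeps weeks that have zero observations
  per_week <- as.vector(table(factor(dataset$week, levels = seq_len(max_week))))
  # Running total with a leading 0, so any range count is a simple difference
  running <- c(0, cumsum(per_week))

  # All start/end pairs with start <= end
  ranges <- expand.grid(start = seq_len(max_week), end = seq_len(max_week))
  ranges <- ranges[ranges$start <= ranges$end, ]

  # Count for start..end = running[end + 1] - running[start]
  ranges$n_obs <- running[ranges$end + 1] - running[ranges$start]
  rownames(ranges) <- NULL
  ranges
}

# Hypothetical helper: number of participants with data for EVERY week
# between start_value and end_value (inclusive).
# Assumes 'dataset' has 'id' and 'week' columns.
complete_ids_in_range <- function(dataset, start_value, end_value) {
  keep <- dataset$week >= start_value & dataset$week <= end_value
  # unique() guards against duplicated id/week rows
  in_range <- unique(dataset[keep, c("id", "week")])
  sum(table(in_range$id) == (end_value - start_value + 1))
}
```

With the reprex data, `coverage_by_range(exdata2)` should return one row per range with its observation count, and `complete_ids_in_range(exdata2, 3, 7)` the number of participants observed in all of weeks 3 to 7.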