This is a very basic question and maybe more about data wrangling than coding, sorry.
I have a dataset in which 1520 participants were measured once a week for 12 weeks. However, there are a lot of missing values, and different participants have provided different amounts of data, from different weeks (e.g. some people have data from all 12 weeks, some from weeks 1-3 only, some from weeks 4-8 only, some from weeks 1-2 AND 5-7 but not from 3-4, etc.).
I can easily compute how many participants have 1, 2, 3, etc. observations, and also how many observations I have from week 1, 2, etc. However, I'd like to find out what my data coverage is for different week ranges (e.g. the number of observations for weeks 1-5, 2-6, 3-9...). In addition, missing "week rows" per participant were removed before I received this data (see below). I'm using R. (I have a feeling this is something I should be able to do from a frequency table, but I can't wrap my brain around it.)
Reprex with 5 participants and 10 weeks:
set.seed(1)  # make the example reproducible
id <- rep(1:5, each = 10)
week <- rep(1:10, times = 5)
outcome <- rnorm(50)  # outcome is probably not needed for the example, but I put it in for completeness
ind <- sample(length(week), 15)  # pick 15 random rows to treat as missing
week[ind] <- NA
exdata <- data.frame(id, week, outcome)
exdata2 <- subset(exdata, !is.na(week))  # drop the missing "week rows"
Could someone suggest a procedure with which I can find out the amount of data coverage for different "week ranges" from data like this? Thanks in advance!
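Defining a function might help, since you did not specify how many ranges you need to calculate this for. The one below counts the rows (observations) whose week falls within a given range, inclusive at both ends.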
count_weeks_in_range <- function(dataset, start_value, end_value) {
  # Filter to select rows where the 'week' column falls within the specified range
  filtered_data <- subset(dataset, week >= start_value & week <= end_value)
  # Count the number of rows
  count <- nrow(filtered_data)
  return(count)
}
# Define the range you want to count
start_range <- 3
end_range <- 7
count <- count_weeks_in_range(exdata2, start_range, end_range)
cat("Number of occurrences in the range", start_range, "-", end_range, ":", count, "\n")
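If you want every possible range at once, rather than one call per range, something like the sketch below might work. It is only an illustration under a few assumptions: `coverage_by_range` and the `toy` data frame are made-up names, weeks are assumed to be whole numbers starting at 1, and rows with a missing week are assumed to have been removed already (as in `exdata2`). Besides the number of observations, it also counts how many participants have data for every week in the range, since "coverage" could mean either.

```r
# Toy data in the same shape as exdata2: one row per observed (id, week) pair.
# The values are made up purely for illustration.
toy <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  week = c(1, 2, 3, 4, 1, 2, 2, 3, 4)
)

# coverage_by_range() is a hypothetical helper, not part of any package.
coverage_by_range <- function(dataset, max_week = max(dataset$week)) {
  # All (start, end) pairs with start <= end
  ranges <- expand.grid(start = seq_len(max_week), end = seq_len(max_week))
  ranges <- ranges[ranges$start <= ranges$end, ]

  # Frequency table: rows = participants, columns = weeks 1..max_week
  obs <- table(dataset$id, factor(dataset$week, levels = seq_len(max_week)))

  # Total number of observations falling inside each range
  ranges$n_obs <- mapply(
    function(s, e) sum(obs[, s:e]),
    ranges$start, ranges$end
  )

  # Number of participants with at least one row in every week of the range
  ranges$n_complete <- mapply(
    function(s, e) sum(rowSums(obs[, s:e, drop = FALSE] > 0) == e - s + 1),
    ranges$start, ranges$end
  )

  ranges[order(ranges$start, ranges$end), ]
}

result <- coverage_by_range(toy)
print(result)
```

For the toy data, weeks 1-2 contain 5 observations with 2 participants fully covered, and weeks 1-4 contain all 9 observations with only 1 participant fully covered. On your data you would call `coverage_by_range(exdata2)` and then filter or sort the result by `n_obs` or `n_complete`.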
Hope this helps :)