I have the following long-format data frame with columns id, age, and BMI. I have restricted the dataset so that it includes only people (id) with at least 3 repeated measurements between age 2 weeks and 10 years.
How can I further restrict this sample so that it includes only people (id) with at least 3 repeated measurements between age 2 weeks and 10 years, and those people have to have at least one measurement before age 2 years, and at least one measurement between ages 5 and 7 years?
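In other words, I want to exclude id's that do not have at least one measurement before age 2 years and at least one measurement between ages 5 and 7 years.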
I think dplyr's between(x, left, right) function might do the trick, but I'm not sure: https://dplyr.tidyverse.org/reference/between.html
library(tidyverse)
# load data set & restrict to age between 2 weeks and 10 years
dat <- read.csv(
  "https://raw.githubusercontent.com/aelhak/NCRM2023/main/bmi_long.csv"
) %>%
  select(id, age, bmi) %>%
  filter(age > 0.038 & age < 10.1)
# restrict to 3+ repeat measurements
# (rle() counts consecutive runs, so this assumes all rows for an id are adjacent)
dat <- subset(dat, id %in% with(rle(dat$id), values[lengths > 2]))
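You can do this with a grouped filter and any(), which keeps every row of an id as long as at least one of its rows meets each condition. Note that the .by argument requires dplyr 1.1.0 or later: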
dat %>%
  filter(any(age < 2) & any(age >= 5 & age <= 7), .by = id)
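
Since the question asks about between(), here is a minimal sketch of the same idea on a made-up toy data frame (the `toy` data and its values are hypothetical, not taken from the linked CSV). It uses group_by()/ungroup() instead of .by, so it should also run on dplyr versions older than 1.1.0. between() is inclusive at both ends, so between(age, 5, 7) is equivalent to age >= 5 & age <= 7.

```r
library(dplyr)

# Hypothetical toy data (not the real bmi_long.csv):
#   id 1 has a measurement before age 2 and one between ages 5 and 7
#   id 2 has no measurement before age 2
#   id 3 has no measurement between ages 5 and 7
toy <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  age = c(0.5, 3, 6, 2.5, 5.5, 9, 1, 3, 8),
  bmi = c(16, 15.5, 15, 16.2, 15.1, 17, 17, 16, 16.5)
)

kept <- toy %>%
  group_by(id) %>%
  # keep all rows of an id only if both conditions hold somewhere in the group
  filter(any(age < 2), any(between(age, 5, 7))) %>%
  ungroup()

unique(kept$id)
```

On this toy data only id 1 survives, with all three of its rows. Because the filter keeps or drops whole ids, an id that passed the earlier 3+ measurements restriction still has all of its measurements afterwards.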