r dplyr tidyverse data-manipulation data-wrangling

Restrict to those with data at specific age ranges in R


I have the following long-format data frame with columns id, age, and bmi. I have restricted the dataset so that it includes only people (id) with at least 3 repeated measurements between age 2 weeks and 10 years.

How can I further restrict this sample so that it includes only people (id) who, in addition to having at least 3 repeated measurements between age 2 weeks and 10 years, have at least one measurement before age 2 years and at least one measurement between ages 5 and 7 years?

In other words, I want to exclude ids that lack a measurement before age 2 years or lack a measurement between ages 5 and 7 years.

I think dplyr's between(x, left, right) function might do the trick, but I'm not sure: https://dplyr.tidyverse.org/reference/between.html

    library(tidyverse)
    
    # load data set & restrict to age between 2 weeks (~0.038 years) and 10 years
    
    dat <- read.csv(
      "https://raw.githubusercontent.com/aelhak/NCRM2023/main/bmi_long.csv") %>% 
      select(id, age, bmi) %>% filter(age > 0.038 & age < 10.1)
    
    # restrict to 3+ repeat measurements
    # (rle() counts consecutive runs, so this assumes each id's rows are contiguous)
    
    dat <- subset(dat, id %in% with(rle(dat$id), values[lengths > 2]))

Solution

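  • You can do this with a grouped filter() and any(): within each id, all rows are kept only if at least one age is below 2 and at least one age is between 5 and 7 (inclusive). Note that the .by argument requires dplyr 1.1.0 or later.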

    dat %>% 
      filter(any(age < 2) & any(age >= 5 & age <= 7), .by = id)
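
  • To see how the grouped filter behaves, here is a minimal sketch on made-up data. The toy data frame and its values are hypothetical, not taken from bmi_long.csv. It also uses between(), as mentioned in the question, which is inclusive at both ends and so matches age >= 5 & age <= 7. The group_by() variant is an assumed fallback for dplyr versions before 1.1.0, which do not have .by.

```r
library(dplyr)

# Hypothetical toy data: four ids with three measurements each
toy <- data.frame(
  id  = rep(1:4, each = 3),
  age = c(0.5, 3.0, 6.0,   # id 1: one age < 2 and one in [5, 7]
          0.5, 1.0, 3.0,   # id 2: no measurement in [5, 7]
          3.0, 5.5, 8.0,   # id 3: no measurement before age 2
          1.9, 5.0, 7.0),  # id 4: ages exactly 5 and 7 count (inclusive)
  bmi = c(17.1, 16.0, 15.5,
          17.4, 16.9, 16.1,
          16.2, 15.4, 16.0,
          16.8, 15.3, 15.9)
)

# Keep every row of an id that satisfies both conditions at least once
kept <- toy %>%
  filter(any(age < 2) & any(between(age, 5, 7)), .by = id)

# Same logic with group_by() for dplyr versions before 1.1.0
kept_old <- toy %>%
  group_by(id) %>%
  filter(any(age < 2), any(between(age, 5, 7))) %>%
  ungroup()

unique(kept$id)
#> [1] 1 4
```

    Because any() is evaluated once per id, the filter keeps or drops all rows of an id together, so the earlier 3+ measurement restriction is unaffected for the ids that remain.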