r dplyr case tidy icd

R code for depression severity based on ICD-10 criteria


I am trying to write R code that assigns depression severity based on ICD-10 criteria, using data from the MDI (Major Depression Inventory). This questionnaire consists of 12 questions (mdi_1, mdi_2, mdi_3, mdi_4, mdi_5, mdi_6, mdi_7, mdi_8a, mdi_8b, mdi_9, mdi_10a, mdi_10b). The first three questions cover the main criteria; the remaining questions cover the secondary criteria. For questions 8 (a, b) and 10 (a, b), only the higher value is counted, which leaves 7 secondary items.

The ICD-10 criterion for mild depression states that at least 2 main criteria (i.e. questions 1-3) must be rated with at least 4 points and 2 or 3 of the remaining 7 items must be rated with at least 3 points.

At the moment I am trying to implement this with dplyr's mutate() and case_when() by spelling out every possible combination, but I wonder if there is a tidier way.

The other depression severity categories work similarly, so with a tidy solution for "mild depression", I can adapt it to the remaining categories.

Thank you for your help!

```r
df |> 
  mutate(mdi_cat =
           case_when(
                 # Check if at least 2 out of the first 3 questions have a score of at least 4
                 # (wrapped in parentheses because & binds more tightly than | in R)
                 ((mdi_1 >= 4 & mdi_2 >= 4) | (mdi_1 >= 4 & mdi_3 >= 4) | (mdi_2 >= 4 & mdi_3 >= 4)) &
                 # Check if 2 or 3 of the remaining 7 questions have a score of at least 3
                 ((mdi_4 >= 3 & mdi_5 >= 3) | (mdi_4 >= 3 & mdi_6 >= 3) | (mdi_4 >= 3 & mdi_7 >= 3) | ...
                  (mdi_4 >= 3 & mdi_5 >= 3 & mdi_6 >= 3) | (mdi_4 >= 3 & mdi_5 >= 3 & mdi_7 >= 3) | ... 
                  ) ~ "mild",
                 TRUE ~ "no" 
               ))
```

Solution

  • Instead of checking each possible combination, you can use across() with rowSums() to count how many questions reach the minimum score. Checking whether an individual fulfills both conditions then reduces to comparing two counts. As a first step, I collapse each pair of a/b questions into one new column holding the higher score, using pmax().

    Using some fake random example data:

    set.seed(123)
    
    n <- 10
    max_pt <- 6
    
    df <- data.frame(
      mdi_1 = sample(max_pt, n, replace = TRUE),
      mdi_2 = sample(max_pt, n, replace = TRUE),
      mdi_3 = sample(max_pt, n, replace = TRUE),
      mdi_4 = sample(max_pt, n, replace = TRUE),
      mdi_5 = sample(max_pt, n, replace = TRUE),
      mdi_8a = sample(max_pt, n, replace = TRUE),
      mdi_8b = sample(max_pt, n, replace = TRUE),
      mdi_10a = sample(max_pt, n, replace = TRUE),
      mdi_10b = sample(max_pt, n, replace = TRUE)
    )
    
    library(dplyr, warn.conflicts = FALSE)
    
    df |> 
      mutate(
        mdi_8 = pmax(mdi_8a, mdi_8b),
        mdi_10 = pmax(mdi_10a, mdi_10b)
      ) |> 
      mutate(
        # Number of first three questions with at least 4 points
        n_mdi_cat1 = rowSums(across(any_of(paste0("mdi_", 1:3)), ~ .x >= 4)),
        # Number of other questions with at least 3 points
        n_mdi_cat2 = rowSums(across(any_of(paste0("mdi_", 4:10)), ~ .x >= 3)),
        mdi_cat = case_when(
          n_mdi_cat1 >= 2 & n_mdi_cat2 >= 2 ~ "mild",
          .default = "no"
        )
      )
    #>    mdi_1 mdi_2 mdi_3 mdi_4 mdi_5 mdi_8a mdi_8b mdi_10a mdi_10b mdi_8 mdi_10
    #> 1      3     6     1     1     4      5      4       1       1     5      1
    #> 2      6     1     5     3     5      2      4       2       6     4      6
    #> 3      3     2     3     5     5      1      6       4       1     6      4
    #> 4      2     3     2     4     3      1      6       5       3     6      5
    #> 5      2     5     2     2     6      3      3       5       6     3      6
    #> 6      6     3     1     5     1      1      6       6       4     6      6
    #> 7      3     3     6     1     2      6      6       3       1     6      3
    #> 8      5     1     3     1     5      5      1       1       6     5      6
    #> 9      4     4     4     2     5      1      6       4       6     6      6
    #> 10     6     1     6     3     4      2      2       6       3     2      6
    #>    n_mdi_cat1 n_mdi_cat2 mdi_cat
    #> 1           1          2      no
    #> 2           2          4    mild
    #> 3           0          4      no
    #> 4           0          4      no
    #> 5           1          3      no
    #> 6           1          3      no
    #> 7           1          2      no
    #> 8           1          3      no
    #> 9           3          3    mild
    #> 10          2          3    mild
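
Two details of the code above may matter when adapting it. First, `n_mdi_cat2 >= 2` has no upper bound, so a respondent with four or more secondary criteria is also labelled "mild" (row 2 in the output). That is fine if the more severe categories are tested earlier in the same case_when(), but if "2 or 3" is meant literally, the count needs a range check. Second, any_of() silently skips columns that are missing (the fake data has no mdi_6, mdi_7 or mdi_9), whereas all_of() raises an error. The following is only a sketch under those assumptions: the toy data and the names n_core and n_other are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: three respondents, all twelve MDI items present
df <- data.frame(
  mdi_1 = c(5, 4, 2), mdi_2 = c(4, 5, 5), mdi_3 = c(1, 4, 1),
  mdi_4 = c(3, 3, 3), mdi_5 = c(3, 4, 3), mdi_6 = c(0, 3, 3),
  mdi_7 = c(1, 5, 0), mdi_8a = c(0, 2, 1), mdi_8b = c(2, 1, 0),
  mdi_9 = c(0, 0, 0), mdi_10a = c(1, 0, 0), mdi_10b = c(0, 1, 2)
)

res <- df |>
  # Collapse the a/b pairs into one item each (higher value counts)
  mutate(
    mdi_8 = pmax(mdi_8a, mdi_8b),
    mdi_10 = pmax(mdi_10a, mdi_10b)
  ) |>
  mutate(
    # all_of() errors if an expected item column is missing
    n_core = rowSums(across(all_of(paste0("mdi_", 1:3)), ~ .x >= 4)),
    n_other = rowSums(across(all_of(paste0("mdi_", 4:10)), ~ .x >= 3)),
    mdi_cat = case_when(
      # Literal reading of "2 or 3 of the remaining 7 items"
      n_core >= 2 & between(n_other, 2, 3) ~ "mild",
      .default = "no"
    )
  )

res[, c("n_core", "n_other", "mdi_cat")]
```

With this toy data only the first respondent is labelled "mild": the second meets four secondary criteria and so falls outside the 2-to-3 range, and the third meets only one main criterion. Because the thresholds are plain comparisons on two counts, the other severity categories can be added as further case_when() branches.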