r dplyr

dplyr - arrange on missingness in two variables


I have been stuck on this problem for hours and it's becoming somewhat frustrating. Basically, I want to arrange some data so that the NAs appear first within a grouping structure. I can get part of the way there, but nothing I try gets me to the desired result.

With this code,

df <-  df |> 
  group_by(AESOC, AEPT) |> 
  arrange(!is.na(AEPT), !is.na(Severity), .by_group = TRUE)

I have been able to achieve what is shown in the image.


But I would still like to arrange further so that rows 9-12 appear before row 1 and rows 25-28 appear before row 13 (i.e., at the very beginning of the groups determined by AESOC and AEPT).

This small data set is included here:

df <-  structure(list(AESOC = c("Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders"
), AEPT = c("    Anaemia", "    Anaemia", "    Anaemia", "    Anaemia", 
"    Lymphopenia", "    Lymphopenia", "    Lymphopenia", "    Lymphopenia", 
NA, NA, NA, NA, "    Dizziness", "    Dizziness", "    Dizziness", 
"    Dizziness", "    Palpitations", "    Palpitations", "    Palpitations", 
"    Palpitations", "    Presyncope", "    Presyncope", "    Presyncope", 
"    Presyncope", NA, NA, NA, NA), Severity = c("        mild", 
"        moderate", "        severe", NA, "        mild", "        moderate", 
"        severe", NA, "    mild", "    moderate", "    severe", 
NA, "        mild", "        moderate", "        severe", NA, 
"        mild", "        moderate", "        severe", NA, "        moderate", 
"        mild", "        severe", NA, "    moderate", "    mild", 
"    severe", NA)), row.names = c(NA, -28L), class = c("tbl_df", 
"tbl", "data.frame"))

Any help would be greatly appreciated.


Solution

  • You can drop the group_by() and pass the sort keys directly to arrange(), putting each !is.na() indicator ahead of its column:

    library(dplyr)
    
    df %>% arrange(AESOC, !is.na(AEPT), AEPT, !is.na(Severity), Severity)
    

    which returns:

                                  AESOC             AEPT         Severity
    1  Blood and lymphatic system disorders             <NA>             <NA>
    2  Blood and lymphatic system disorders             <NA>             mild
    3  Blood and lymphatic system disorders             <NA>         moderate
    4  Blood and lymphatic system disorders             <NA>           severe
    5  Blood and lymphatic system disorders          Anaemia             <NA>
    6  Blood and lymphatic system disorders          Anaemia             mild
    7  Blood and lymphatic system disorders          Anaemia         moderate
    8  Blood and lymphatic system disorders          Anaemia           severe
    9  Blood and lymphatic system disorders      Lymphopenia             <NA>
    10 Blood and lymphatic system disorders      Lymphopenia             mild
    11 Blood and lymphatic system disorders      Lymphopenia         moderate
    12 Blood and lymphatic system disorders      Lymphopenia           severe
    13                    Cardiac disorders             <NA>             <NA>
    14                    Cardiac disorders             <NA>             mild
    15                    Cardiac disorders             <NA>         moderate
    16                    Cardiac disorders             <NA>           severe
    17                    Cardiac disorders        Dizziness             <NA>
    18                    Cardiac disorders        Dizziness             mild
    19                    Cardiac disorders        Dizziness         moderate
    20                    Cardiac disorders        Dizziness           severe
    21                    Cardiac disorders     Palpitations             <NA>
    22                    Cardiac disorders     Palpitations             mild
    23                    Cardiac disorders     Palpitations         moderate
    24                    Cardiac disorders     Palpitations           severe
    25                    Cardiac disorders       Presyncope             <NA>
    26                    Cardiac disorders       Presyncope             mild
    27                    Cardiac disorders       Presyncope         moderate
    28                    Cardiac disorders       Presyncope           severe
    

    and the same in base R:

    df[with(df, order(AESOC, !is.na(AEPT), AEPT, !is.na(Severity), Severity)), ]
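As for why the grouped attempt in the question falls short: with .by_group = TRUE, arrange() sorts by the grouping variables first, and arrange() always places NA last, so AEPT has already been ordered with its NAs at the end before !is.na(AEPT) is considered. The sketch below contrasts the two approaches on a hypothetical toy data frame (the names toy, grp and sub are made up for illustration and are not part of the original data):

```r
library(dplyr)

# Hypothetical toy data, standing in for AESOC (grp) and AEPT (sub)
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b"),
  sub = c("x", NA, "y", NA, "z", "w")
)

# Grouped attempt: .by_group = TRUE sorts by grp and sub first,
# and NA in sub goes last, so !is.na(sub) cannot move those rows up
grouped <- toy |>
  group_by(grp, sub) |>
  arrange(!is.na(sub), .by_group = TRUE) |>
  ungroup()

# Ungrouped keys: FALSE sorts before TRUE, so the NA rows lead each grp
ungrouped <- toy |>
  arrange(grp, !is.na(sub), sub)

grouped$sub    # "x" "y" NA  "w" "z" NA
ungrouped$sub  # NA  "x" "y" NA  "w" "z"
```

The same idea carries over to Severity: each !is.na() key only has an effect if it comes before the column it describes in the list of sort keys.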