rdataframedplyrslice

Arrange a dataframe conditionally based on a row's name, and then by column values


I'm building a table based on gender data, which has three categorical values and a great deal of missing data. I converted the NAs to "Missing" to create a fourth categorical value. I want to arrange the table so that it sorts the categories from largest to smallest values, with the "Missing" category always on the bottom.

However, since this table will feed from a regularly updated dataset, the order of categories based on their values may change, and I always want whichever non-missing data category that has the largest value to be shown first, and the missing category to always be last. This makes using the slice() function to arrange by index numbers not viable.

The code below produces a dataframe summarizing the categorical values, with "Missing" on top, as it is by default the largest category:

gender<-data.frame(peopledata$`Gender`)%>%
rename(`gender`=`peopledata.Gender`)%>%
mutate(`gender`= replace_na(`gender`,"Missing"))%>%    
table()%>%    
as.data.frame()%>%    
rename(`count`=`Freq`)%>%     
arrange(desc(`count`))

-----------------------------------------------------------------------------------------------

Output:

gender count
Missing 293
Male 28
Female 14
Undisclosed 1

-----------------------------------------------------------------------------------------------

Input data (reproducing the summarized table from the gender data, for simplicity's sake):

peopledata<-data.frame(gender = c("Missing", "Male", "Female", "Undisclosed"), count= c(293, 28, 14, 1))

-----------------------------------------------------------------------------------------------

My question is focused on the arrange(desc(`count`)) line, and how to alter it so that the "Missing" row is always down on the bottom, and to otherwise sort the rest of the rows by the "count" column. Is there a conditional call l I can feed into arrange() to facilitate that?


Solution

  • You could move the Missing row after sorting by count.

    library(dplyr)
    
    
    peopledata |>
      arrange(desc(count)) |>
      slice(order(gender == 'Missing'))
    #>        gender count
    #> 1        Male    28
    #> 2      Female    14
    #> 3 Undisclosed     1
    #> 4     Missing   293
    

    Or even simpler

    peopledata |>
      arrange(gender == 'Missing', desc(count))