rselect

Selecting columns based on specific values in the field


I'm creating a table based on survey data that asks respondents to rank their satisfaction with elements of an event. If a respondent indicated dissatisfaction with an element of the event, they received a free-text child question asking them to elaborate on their dissatisfaction.

What I currently have is all event elements and their rankings formatted as individual columns, along with the general free-text dissatisfaction feedback field, filtered to exclude NA values in the free-text field:

# Current Output 

Response_ID | Reasons_for_Dissatisfaction | Rank_Food             | Rank_Facilities       | Rank_Content
----------------------------------------------------------------------------------------------------------     
1           | <Free-text feedback>        | Somewhat dissatisfied | Very dissatisfied     | Very satisfied 
----------------------------------------------------------------------------------------------------------
2           | <Free-text feedback>        | Very dissatisfied     | Somewhat dissatisfied | Very satisfied

I want my table to only display elements of the event that respondents were either "Somewhat dissatisfied" or "Very dissatisfied" with as columns, and the reason for the rating as an additional column (free-text). It should look something like this:

# Desired Output

Response_ID | Reasons_for_Dissatisfaction | Rank_Food             | Rank_Facilities       
-----------------------------------------------------------------------------------------
1           | <Free-text feedback>        | Somewhat dissatisfied | Very dissatisfied
-----------------------------------------------------------------------------------------
2           | <Free-text feedback>        | Very dissatisfied     | Somewhat dissatisfied

I want the code to update as the dataset updates, so that if a new response comes in and indicates that an additional event element was dissatisfactory, it adds that element and response record to the table. That precludes me from using absolute column references in my code. I cannot figure out how to specifically select columns that only have "Somewhat dissatisfied" or "Very dissatisfied" as values, in addition to the "Response ID" and "Reasons for Dissatisfaction". I tried select(where()), but that threw an error:

# Sample code

satisfactionrank<-data.frame(`Response_ID`=c(1,2),
                                 `Reasons_for_Dissatisfaction`=c("<Free-text feedback>","<Free-text feedback>"),
                                 `Rank_Food`=c("Somewhat dissatisfied","Very dissatisfied"),
                                 `Rank_Facilities`=c("Very dissatisfied","Somewhat dissatisfied"),
                                 `Rank_Content`=c("Very satisfied","Very satisfied"))

eventdissatisfaction<-satisfactionrank%>%
  select(`Response_ID`,
       `Reasons_for_Dissatisfaction`,
       where(contains("Rank") & any(. == "Somewhat dissatisfied") | any(. == "Very dissatisfied")))

# Error

Error in `select()`:
ℹ In argument: `where(...)`.
Caused by error in `where()`:
! Can't convert `fn`, a logical vector, to a function.
Run `rlang::last_trace()` to see where the error occurred.

Where am I going wrong?


Solution

  • You would want to modify your logic in where(...) to check if all of the values in a column is either "Somewhat dissatisfied" or "Very dissatisfied".

    library(dplyr)
    
    satisfactionrank |> 
      select(Response_ID,
             Reasons_for_Dissatisfaction,
             where(~all(.x %in% c("Somewhat dissatisfied", "Very dissatisfied")))
             )
    
    # output
      Response_ID Reasons_for_Dissatisfaction             Rank_Food       Rank_Facilities
    1           1        <Free-text feedback> Somewhat dissatisfied     Very dissatisfied
    2           2        <Free-text feedback>     Very dissatisfied Somewhat dissatisfied