r dataframe anti-join

Returning Unique Responses from different data frames in R


I have three different data frames containing questions from a survey given over three years. Over the years some of the questions have been edited slightly. I would like to create a new data frame which tells me which questions have been changed and gives me the wording of the question from all three years.

The data frames look something like:

df2019 <- data.frame(V1 = c("Q1", "Q2"), V2 = c("How many times have you done this?", "Is this your first time?"))
df2020 <- data.frame(V1 = c("Q1", "Q2"), V2 = c("How many times have you done this? (before this time)", "Is this your first time?"))
df2021 <- data.frame(V1 = c("Q1", "Q2"), V2 = c("How many times have you done this( before this time)?", "Is this your first time?"))

I would like it to return:

data.frame(V1 = "Q1", `2019` = "How many times have you done this?", `2020` = "How many times have you done this? (before this time)", `2021` = "How many times have you done this( before this time)?", check.names = FALSE)

My first thought was to use something like anti_join() but I don't know how that works for three data frames.


Solution

  • Row-bind the data frames into a single frame, adding a year identifier (y); group by question (V1); and keep only the groups that have more than one unique wording (V2):

    library(dplyr)

    dt %>% 
      group_by(V1) %>%                  # one group per question
      filter(length(unique(V2)) > 1)    # keep questions whose wording differs across years
    

    Output:

          y V1    V2                                                   
      <dbl> <chr> <chr>                                                
    1  2019 Q1    How many times have you done this?                   
    2  2020 Q1    How many times have you done this? (before this time)
    3  2021 Q1    How many times have you done this( before this time)?
    

    Input:

    dt = rbind(
      data.frame(y=2019,V1 = c("Q1","Q2"), V2= c("How many times have you done this?", "Is this your first time?")),
      data.frame(y=2020,V1 = c("Q1","Q2"), V2= c("How many times have you done this? (before this time)", "Is this your first time?")),
      data.frame(y=2021,V1 = c("Q1","Q2"), V2= c("How many times have you done this( before this time)?", "Is this your first time?"))
    )
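
The result above is in long format (one row per year), while the question asks for one column per year. A minimal sketch of one way to get there, assuming the three yearly frames are collected in a named list (the name `surveys` and the use of tidyr's `pivot_wider()` are illustrative choices, not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Hypothetical container: one data frame per survey year, named by year
surveys <- list(
  "2019" = data.frame(V1 = c("Q1", "Q2"),
                      V2 = c("How many times have you done this?",
                             "Is this your first time?")),
  "2020" = data.frame(V1 = c("Q1", "Q2"),
                      V2 = c("How many times have you done this? (before this time)",
                             "Is this your first time?")),
  "2021" = data.frame(V1 = c("Q1", "Q2"),
                      V2 = c("How many times have you done this( before this time)?",
                             "Is this your first time?"))
)

changed <- bind_rows(surveys, .id = "y") %>%      # .id stores the list names in column y
  group_by(V1) %>%
  filter(n_distinct(V2) > 1) %>%                  # keep questions whose wording differs
  ungroup() %>%
  pivot_wider(names_from = y, values_from = V2)   # spread to one column per year

changed
```

Here `n_distinct(V2) > 1` plays the same role as `length(unique(V2)) > 1`. Note that this compares wordings exactly, so a question that differs only in spacing or punctuation between years is also reported as changed.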