I have three different data frames containing questions from a survey given over three years. Over the years some of the questions have been edited slightly. I would like to create a new data frame which tells me which questions have been changed and gives me the wording of the question from all three years.
The data frames look something like:
2019 <- data.frame(V1 = c("Q1","Q2), V2= c("How many times have you done this?", "Is this your first time?"))
2020 <- data.frame(V1 = c("Q1","Q2), V2= c("How many times have you done this? (before this time)", "Is this your first time?"))``
2021 <- data.frame(V1 = c("Q1","Q2), V2= c("How many times have you done this( before this time)?", "Is this your first time?"))
I would like it to return:
data.frame(V1 = c("Q1"),2019 = c("How many times have you done this?"), 2020 = c("How many times have you done this? (before this time)"), 2021 = c("How many times have you done this( before this time)?"))
My first thought was to use something like anti_join()
but I don't know how that works for three data frames.
Row bind the data frames into a single frame (adding a year identifier (y
)), group by Question (V1
), and filter the groups where there are more than one unique wording of (V2
):
dt %>%
group_by(V1) %>%
filter(length(unique(V2))>1)
Output:
y V1 V2
<dbl> <chr> <chr>
1 2019 Q1 How many times have you done this?
2 2020 Q1 How many times have you done this? (before this time)
3 2021 Q1 How many times have you done this( before this time)?
Input:
dt = rbind(
data.frame(y=2019,V1 = c("Q1","Q2"), V2= c("How many times have you done this?", "Is this your first time?")),
data.frame(y=2020,V1 = c("Q1","Q2"), V2= c("How many times have you done this? (before this time)", "Is this your first time?")),
data.frame(y=2021,V1 = c("Q1","Q2"), V2= c("How many times have you done this( before this time)?", "Is this your first time?"))
)