I am working in R, and would prefer a dplyr solution if possible.
Sample data:
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c("a", "b", "d", "a"),
  col3 = rep("a", 4L),
  col4 = c("a", "b", "d", "a"),
  col5 = c("a", "a", "c", "d"),
  col6 = rep(c("b", "a"), each = 2L)
)
col1 | col2 | col3 | col4 | col5 | col6 |
---|---|---|---|---|---|
a | a | a | a | a | b |
b | b | a | b | a | b |
c | d | a | d | c | a |
d | a | a | a | d | a |
Question
For each row, I would like to know whether col1, col2 and col3 contain the same values as col4, col5 and col6, ignoring the order within col1-col3 and within col4-col6.
For example, if col1-col3 in row 1 contained a, a, b and col4-col6 contained b, a, a, that would count as a match.
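The core check is a comparison of two vectors as multisets: sort both and test whether the results are identical. A minimal sketch, assuming duplicates must match in count (so a, a, b vs a, b, b is not a match) and that there are no `NA` values; `same_unordered()` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE when x and y hold the same values with the same
# counts, regardless of order. sort() drops NAs by default, so this assumes none.
same_unordered <- function(x, y) {
  identical(sort(x), sort(y))
}

same_unordered(c("a", "a", "b"), c("b", "a", "a"))  # TRUE: same values, different order
same_unordered(c("a", "a", "a"), c("a", "a", "b"))  # FALSE: row 1 of the sample data
```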
Desired result
I have added a note to the "assessment" column to aid understanding.
col1 | col2 | col3 | col4 | col5 | col6 | assessment |
---|---|---|---|---|---|---|
a | a | a | a | a | b | FALSE (because col1-col3 are not the same as col4-col6) |
b | b | a | b | a | b | TRUE (because col1-col3 are the same as col4-col6 if order is ignored) |
c | d | a | d | c | a | TRUE (because col1-col3 are the same as col4-col6 if order is ignored) |
d | a | a | a | d | a | TRUE (because col1-col3 are the same as col4-col6 if order is ignored) |
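Using dplyr, you can do the following:
library(dplyr)
# Row by row, compare the sorted values of col1:col3 with the sorted values of col4:col6
df %>%
  rowwise() %>%
  mutate(result = all(sort(c_across(col1:col3)) == sort(c_across(col4:col6))))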
# A tibble: 4 × 7
# Rowwise:
col1 col2 col3 col4 col5 col6 result
<chr> <chr> <chr> <chr> <chr> <chr> <lgl>
1 a a a a a b FALSE
2 b b a b a b TRUE
3 c d a d c a TRUE
4 d a a a d a TRUE
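If you would rather not leave the result in rowwise mode, or want a reusable helper, here is a base R sketch of the same idea: `apply(..., 1, sort)` sorts every row of each column group, and the two sorted matrices are then compared column by column. It assumes character columns, no `NA`s, and the same number of columns in both groups; `same_unordered_rows()` is a hypothetical name.

```r
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c("a", "b", "d", "a"),
  col3 = rep("a", 4L),
  col4 = c("a", "b", "d", "a"),
  col5 = c("a", "a", "c", "d"),
  col6 = rep(c("b", "a"), each = 2L)
)

# Hypothetical helper: for each row, TRUE when `left` and `right` hold the same
# values once order is ignored. apply(m, 1, sort) returns one sorted row per
# column of the result, so the two matrices line up column by column.
same_unordered_rows <- function(left, right) {
  sorted_left  <- apply(as.matrix(left), 1, sort)
  sorted_right <- apply(as.matrix(right), 1, sort)
  colSums(sorted_left != sorted_right) == 0
}

df$result <- same_unordered_rows(df[c("col1", "col2", "col3")],
                                 df[c("col4", "col5", "col6")])
df$result
# [1] FALSE  TRUE  TRUE  TRUE
```

With dplyr 1.1.0 or later, the same helper should also work inside a pipeline, e.g. `df %>% mutate(result = same_unordered_rows(pick(col1:col3), pick(col4:col6)))`.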