I have this example data.frame:
df1 <- data.frame(v1 = c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
v2 = c('B', 'A', 'D', 'C', 'F', 'E', 'H', 'G'),
value = c(1.12, 1.12, 12.52, 12.52, 3.19, 3.19, 12.52, 12.52))
> df1
v1 v2 value
1 A B 1.12
2 B A 1.12
3 C D 12.52
4 D C 12.52
5 E F 3.19
6 F E 3.19
7 G H 12.52
8 H G 12.52
Combinations such as A and B in row 1 are the same to me as combinations such as B and A in row 2, where the values in column value are also the same. How can I remove the rows that, for my purposes, are duplicates?
Expected result:
df2 <- data.frame(v1 = c('A', 'C', 'E', 'G'),
v2 = c('B', 'D', 'F', 'H'),
value = c(1.12, 12.52, 3.19, 12.52))
> df2
v1 v2 value
1 A B 1.12
2 C D 12.52
3 E F 3.19
4 G H 12.52
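The idea is to treat v1 and v2 as interchangeable: sort each pair so that (A, B) and (B, A) yield the same key, then drop every row whose key has already appeared.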
df1 <- data.frame(v1 = c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
v2 = c('B', 'A', 'D', 'C', 'F', 'E', 'H', 'G'),
value = c(1.12, 1.12, 12.52, 12.52, 3.19, 3.19, 12.52, 12.52))
### with tidyverse:
library(dplyr)
library(purrr)
# Build an order-independent key for each (v1, v2) pair, then keep the first row per key
df2 <- df1 %>%
  mutate(combination = pmap_chr(list(v1, v2),
                                ~ paste(sort(c(..1, ..2)), collapse = ","))) %>%
  filter(!duplicated(combination)) %>%
  select(-combination)
df2
#> v1 v2 value
#> 1 A B 1.12
#> 2 C D 12.52
#> 3 E F 3.19
#> 4 G H 12.52
### Base R:
# Sort v1/v2 within each row, then keep the first row of each sorted pair
df2 <- df1[!duplicated(t(apply(df1[, c("v1", "v2")], 1, sort))), ]
df2
#> v1 v2 value
#> 1 A B 1.12
#> 3 C D 12.52
#> 5 E F 3.19
#> 7 G H 12.52
Created on 2023-12-24 with reprex v2.0.2
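Both versions above sort each pair row by row and ignore value when looking for duplicates. If the data is large, or if two rows should only count as duplicates when value matches as well (as the question describes), a vectorised variant might look like the sketch below. It is only an illustration, not part of the reprex: the names key and df3 are made up here, and including value in the key is an assumption about the intended behaviour.

```r
# Same example data as above
df1 <- data.frame(v1 = c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                  v2 = c('B', 'A', 'D', 'C', 'F', 'E', 'H', 'G'),
                  value = c(1.12, 1.12, 12.52, 12.52, 3.19, 3.19, 12.52, 12.52))

# Order each pair element-wise so that (A, B) and (B, A) give the same key.
# as.character() guards against factor columns (the default before R 4.0).
a <- as.character(df1$v1)
b <- as.character(df1$v2)
key <- data.frame(lo = pmin(a, b),
                  hi = pmax(a, b),
                  value = df1$value)  # assumption: value is part of the key

# Keep the first row of each distinct (lo, hi, value) key
df3 <- df1[!duplicated(key), ]
df3
```

On the example data this keeps rows 1, 3, 5 and 7, the same rows as the base R version. Dropping the value column from key would reproduce the "pairs only" behaviour of the two approaches above.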