I'm trying to use the eulerr package to create a Venn diagram from two lists. One of the lists is a subset of the other. Strangely, eulerr seems to think that there is one value in list b that is unique to that subset, and I can't figure out which value it thinks is unique.
> length(a)
[1] 3278
> length(b)
[1] 1318
When I check the overlap between the two lists, I get the expected results:
> length(which(a %in% b))
[1] 1318
> length(which((b %in% a)))
[1] 1318
> length(which(!(b %in% a)))
[1] 0
> length(which(!(a %in% b)))
[1] 1960
But when I use eulerr to plot a Venn diagram I get:
library(eulerr)
fit <- euler(list("A" = a, "B" = b))
plot(fit, counts = TRUE)
Notably, the number of values that eulerr reports as unique to A is one more than what I get using:
length(which(!(a %in% b)))
Any help understanding this behavior would be greatly appreciated!
I found out what's causing this behaviour, but I can't explain why. There is a duplicated value in both a and b, and it's the same value in each.
> a[duplicated(a)]
[1] "Crybg3"
> b[duplicated(b)]
[1] "Crybg3"
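To check for this kind of thing before fitting, something like the following should work. It is a minimal sketch on made-up vectors (x and y are hypothetical, not the data from the question) and uses only base R:

```r
# Toy vectors (hypothetical, not the data from the question)
x <- c("g1", "g2", "g2", "g3")
y <- c("g2", "g2")

# anyDuplicated() returns 0 when there are no repeats,
# otherwise the index of the first repeated element
first_dup_x <- anyDuplicated(x)

# Values that occur more than once, each listed once
dup_x <- unique(x[duplicated(x)])
dup_y <- unique(y[duplicated(y)])

# Number of entries that de-duplicating would drop
n_extra_x <- length(x) - length(unique(x))
n_extra_y <- length(y) - length(unique(y))
```

Any non-zero n_extra_x or n_extra_y means the vector is not a proper set, which is worth knowing before passing it to a function that expects sets.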
If I drop the duplicated entry from both vectors (keeping one copy of each value), it works:
a1 <- a[!duplicated(a)]
b1 <- b[!duplicated(b)]
fit <- euler(list("A" = a1, "B" = b1))
plot(fit, counts = TRUE)
> fit
    original fitted residuals region_error
A       1960   1960         0            0
B          0      0         0            0
A&B     1317   1317         0            0
diag_error: 0
stress: 0
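The off-by-one is at least consistent with how base R treats duplicates: %in% and length() work per element, so a repeated value is counted each time it appears, while intersect() and setdiff() return each distinct value once. The toy example below (made-up vectors a_toy and b_toy, not the original data) shows the two kinds of counts disagreeing by exactly the number of duplicated entries. Whether eulerr mixes the two kinds of counts internally is only a guess here, not something confirmed from its source.

```r
# Toy data: "g2" appears twice in both vectors (hypothetical values)
a_toy <- c("g1", "g2", "g2", "g3")
b_toy <- c("g2", "g2")

# Element-wise counts: every occurrence is counted
sum(a_toy %in% b_toy)      # 2 (both copies of "g2")
sum(!(a_toy %in% b_toy))   # 2 ("g1" and "g3")
length(a_toy)              # 4

# Set-based counts: each distinct value is counted once
length(intersect(a_toy, b_toy))  # 1
length(setdiff(a_toy, b_toy))    # 2
length(setdiff(b_toy, a_toy))    # 0

# Mixing raw lengths with a de-duplicated intersection
# inflates the "unique" regions by the number of duplicates
length(a_toy) - length(intersect(a_toy, b_toy))  # 3, not 2
length(b_toy) - length(intersect(a_toy, b_toy))  # 1, not 0
```

With the numbers from the question, the same arithmetic gives 3278 - 1317 = 1961 for A and 1318 - 1317 = 1 for B, which matches the one extra value reported on each side.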