
Accurate Venn diagrams using eulerr


I'm trying to use the eulerr package to create a Venn diagram from two lists, a and b, where b is a subset of a. Strangely, eulerr seems to think that there is one value in list b that is unique to that subset. I can't figure out which value it thinks is unique.

https://pastebin.com/J7tPcfAt

> length(a)
[1] 3278

> length(b)
[1] 1318

When I check overlap between the subsets I get the expected results:

> length(which(a %in% b))
[1] 1318

> length(which((b %in% a)))
[1] 1318

> length(which(!(b %in% a)))
[1] 0

> length(which(!(a %in% b)))
[1] 1960

But when I use eulerr to plot a Venn diagram I get:

library(eulerr)
fit <- euler(list("A" = a, "B" = b))
plot(fit, counts = TRUE)

[Image: Venn diagram produced by the code above]

Notably, the number of values that eulerr reports as unique to A is one greater than what I get using

length(which(!(a %in% b)))

Any help understanding this behavior would be greatly appreciated!
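
For reference, the `%in%` checks above count every element separately, so a repeated entry is counted once per occurrence, whereas set functions such as intersect() and setdiff() work on unique values. A minimal sketch on made-up vectors (x and y are hypothetical, not the pastebin data) shows how the two kinds of count can disagree:

```r
# Toy vectors (hypothetical); "g2" appears twice in both
x <- c("g1", "g2", "g2", "g3", "g4")
y <- c("g2", "g2", "g3")

# Element-wise membership counts every occurrence
n_in     <- sum(x %in% y)             # g2, g2, g3
n_not_in <- sum(!(x %in% y))          # g1, g4

# Set operations collapse repeated values first
n_shared <- length(intersect(x, y))   # g2, g3
n_only_x <- length(setdiff(x, y))     # g1, g4

# A gap between the two kinds of count points to duplicated entries
n_in - n_shared
anyDuplicated(x) > 0
```

If sum(a %in% b) and length(intersect(a, b)) differ on the real data, that would suggest repeated entries are worth looking for.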


Solution

  • I found what's causing this behaviour, though I can't explain why it happens: both a and b contain a duplicated value, and it's the same value in each.

    > a[duplicated(a)]
    [1] "Crybg3"
    > b[duplicated(b)]
    [1] "Crybg3"
    

    If I remove the duplicated entries from both vectors, it works.

    a1 <- a[!duplicated(a)]
    b1 <- b[!duplicated(b)]
    
    fit <- euler(list("A" = a1, "B" = b1))
    plot(fit, counts = TRUE)
    
    > fit
        original fitted residuals region_error
    A       1960   1960         0            0
    B          0      0         0            0
    A&B     1317   1317         0            0
    
    diag_error:  0 
    stress:      0 
    

    [Image: Venn diagram produced from the deduplicated vectors]
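
One possible explanation, offered as an assumption and not confirmed against the eulerr source: the numbers would fit if each set's size were counted with duplicates included while the overlap were counted on unique values. The sketch below reproduces that arithmetic in base R on made-up vectors (a_toy and b_toy are hypothetical); it does not call eulerr.

```r
# Hypothetical toy vectors; "g2" is duplicated in both
a_toy <- c("g1", "g2", "g2", "g3", "g4")
b_toy <- c("g2", "g2", "g3")

# ASSUMPTION: set sizes include duplicates, the overlap uses unique values
size_a  <- length(a_toy)
size_b  <- length(b_toy)
size_ab <- length(intersect(a_toy, b_toy))

only_a_raw <- size_a - size_ab   # one more than the values truly unique to a_toy
only_b_raw <- size_b - size_ab   # nonzero although b_toy is a subset of a_toy

# After deduplicating, the same arithmetic agrees with setdiff()
a_u <- unique(a_toy)
b_u <- unique(b_toy)
only_a <- length(a_u) - length(intersect(a_u, b_u))
only_b <- length(b_u) - length(intersect(a_u, b_u))
```

With the lengths from the question, the same arithmetic would give 3278 - 1317 = 1961 for A alone and 1318 - 1317 = 1 for B alone, which matches the off-by-one described there.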