Tags: r, intersection, set-intersection, upsetr

Problem with upset plot intersection numbers


I have four sets A, B, C and D like below:

A <- c("ENSG00000103472", "ENSG00000130600", "ENSG00000177335", "ENSG00000177337", 
"ENSG00000178977", "ENSG00000180139", "ENSG00000180539", "ENSG00000187621", 
"ENSG00000188511", "ENSG00000197099", "ENSG00000203446", "ENSG00000203739", 
"ENSG00000203804", "ENSG00000204261", "ENSG00000204282", "ENSG00000204584", 
"ENSG00000205056", "ENSG00000205837", "ENSG00000206337", "ENSG00000213057")

B <- c("ENSG00000146521", "ENSG00000165511", "ENSG00000174171", "ENSG00000176659", 
"ENSG00000179428", "ENSG00000179840", "ENSG00000180539", "ENSG00000204261", 
"ENSG00000204282", "ENSG00000204949", "ENSG00000206337", "ENSG00000223534", 
"ENSG00000223552", "ENSG00000223725", "ENSG00000226252", "ENSG00000226751", 
"ENSG00000226777", "ENSG00000227066", "ENSG00000227260", "ENSG00000227403")

C <- c("ENSG00000167912", "ENSG00000168405", "ENSG00000172965", "ENSG00000177234", 
"ENSG00000177699", "ENSG00000177822", "ENSG00000179428", "ENSG00000179840", 
"ENSG00000180139", "ENSG00000181800", "ENSG00000181908", "ENSG00000183674", 
"ENSG00000189238", "ENSG00000196668", "ENSG00000196979", "ENSG00000197301", 
"ENSG00000203446", "ENSG00000203999", "ENSG00000204261", "ENSG00000206337")

D <- c("ENSG00000122043", "ENSG00000162888", "ENSG00000167912", "ENSG00000176320", 
"ENSG00000177699", "ENSG00000179253", "ENSG00000179428", "ENSG00000179840", 
"ENSG00000180539", "ENSG00000181800", "ENSG00000185433", "ENSG00000188511", 
"ENSG00000189238", "ENSG00000197301", "ENSG00000205056", "ENSG00000205562", 
"ENSG00000213279", "ENSG00000214922", "ENSG00000215533", "ENSG00000218018")

An UpSet plot gave me the following result:

library(UpSetR)
mine <- list("A" = A,
             "B" = B,
             "C" = C,
             "D" = D)

upset(fromList(mine), keep.order = TRUE)


But I'm only interested in the intersections between specific sets: A & B, A & C, and A & D. So I did it like this:

upset(fromList(mine), intersections = list(list("A"),list("B"),list("C"),
                                           list("D"),list("A", "B"), 
                                           list("A", "C"),
                                           list("A", "D")), keep.order = TRUE)


But A & B have 4 elements in common, A & C have 4, and A & D have 3. Why does the UpSet plot above show the wrong numbers?

How can I make it show the correct number of common elements? I don't want the intersection of all sets.


Solution

  • The numbers are correct! UpSetR just counts intersections differently from what you expect.

    There are different ways to calculate set intersection size:

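    1. "distinct" mode: counts elements that are in all of the selected sets and in none of the other sets
    2. "intersect" mode: counts elements that are in all of the selected sets, whether or not they are also in other sets
    3. "union" mode: counts elements that are in at least one of the selected sets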

    UpSetR uses the "distinct" mode.

    The "intersect" mode is probably what you expect.

    The ComplexHeatmap and ComplexUpset packages allow the user to choose which mode to use.

    I found a good explanation by Jakob Rosenthal here: https://github.com/hms-dbmi/UpSetR/issues/72 (see especially the graphic in that issue).
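
To make the difference between the "distinct" and "intersect" modes concrete, here is a minimal sketch in base R that counts both for the three pairs from the question. The helpers `ensg()`, `intersect_size()` and `distinct_size()` are hypothetical names made up for this illustration and are not part of UpSetR. The last part is optional and assumes ComplexHeatmap is installed; it only sketches one way to draw the plot in "intersect" mode.

```r
# Rebuild the four sets from the question. Only the numeric suffixes are
# listed; ensg() is a hypothetical helper that restores the full Ensembl IDs.
ensg <- function(x) sprintf("ENSG%011d", x)

A <- ensg(c(103472, 130600, 177335, 177337, 178977, 180139, 180539, 187621,
            188511, 197099, 203446, 203739, 203804, 204261, 204282, 204584,
            205056, 205837, 206337, 213057))
B <- ensg(c(146521, 165511, 174171, 176659, 179428, 179840, 180539, 204261,
            204282, 204949, 206337, 223534, 223552, 223725, 226252, 226751,
            226777, 227066, 227260, 227403))
C <- ensg(c(167912, 168405, 172965, 177234, 177699, 177822, 179428, 179840,
            180139, 181800, 181908, 183674, 189238, 196668, 196979, 197301,
            203446, 203999, 204261, 206337))
D <- ensg(c(122043, 162888, 167912, 176320, 177699, 179253, 179428, 179840,
            180539, 181800, 185433, 188511, 189238, 197301, 205056, 205562,
            213279, 214922, 215533, 218018))
mine <- list(A = A, B = B, C = C, D = D)

# "intersect" mode: elements shared by the chosen sets, regardless of
# whether they also occur in any other set.
intersect_size <- function(sets, chosen) {
  length(Reduce(intersect, sets[chosen]))
}

# "distinct" mode: elements shared by the chosen sets AND absent from
# every set that was not chosen.
distinct_size <- function(sets, chosen) {
  shared <- Reduce(intersect, sets[chosen])
  others <- unlist(sets[setdiff(names(sets), chosen)], use.names = FALSE)
  length(setdiff(shared, others))
}

wanted <- list(c("A", "B"), c("A", "C"), c("A", "D"))
sizes <- data.frame(
  pair      = vapply(wanted, paste, character(1), collapse = " & "),
  intersect = vapply(wanted, function(p) intersect_size(mine, p), integer(1)),
  distinct  = vapply(wanted, function(p) distinct_size(mine, p), integer(1))
)
print(sizes)

# Optional: if ComplexHeatmap is available, draw the plot in "intersect" mode.
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  m <- ComplexHeatmap::make_comb_mat(mine, mode = "intersect")
  # Combination names are bit strings in set order A, B, C, D,
  # so "1100" is A & B, "1010" is A & C and "1001" is A & D.
  keep <- c("1000", "0100", "0010", "0001", "1100", "1010", "1001")
  m <- m[ComplexHeatmap::comb_name(m) %in% keep]
  ComplexHeatmap::draw(ComplexHeatmap::UpSet(m))
}
```

On the question's data, the "intersect" column gives 4, 4 and 3 for A & B, A & C and A & D, which are the numbers the question expects. The "distinct" column gives 1, 2 and 2, because most of the shared genes also occur in a third set; those are presumably the numbers UpSetR displays.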