
ggvenn adding non-existent value to Venn


I have generated a Venn diagram using ggvenn and a csv file with three columns.

In my final png (attached), an additional value has been counted between groups Dro and Xvm (I have circled this in green with an arrow and "?"). Dro and Xvm should each have a total of 3 values, and nothing should be shared between just Dro and Xvm. Instead, each has 4 values in total, and a mystery shared value appears in the region between them. I have checked the csv file to make sure there are no hidden spaces etc. Beyond that, I can't work out where this extra shared value is coming from.

Here is my code:

#load the csv
data <- read.table("infile.csv", header = TRUE, sep = ",")

# the data are laid out in columns, with features in a list under each header. ggvenn will identify the shared and unique features.
# convert data to a format suitable for ggvenn
ggvenn_data <- list(
  Dro = data$Dro_v_Con,
  Foc = data$Foc_v_Con,
  Xvm = data$Xvm_v_Con
)

When I check ggvenn_data, there appear to be no added values (n=3 for Xvm and n=3 for Dro):

print(ggvenn_data)

$Dro
 [1] "M830.837T11.111"  "M883.492T15.092"  "M367.151T397.982" ""                
 [5] ""                 ""                 ""                 ""                
 [9] ""                 ""                

$Foc
 [1] "M883.492T15.092"  "M1279.087T13.111" "M830.837T11.111"  "M1116.775T17.12" 
 [5] "M1244.755T16.127" "M1030.792T17.119" "M1334.742T13.111" "M1239.758T16.118"
 [9] "M1472.714T13.108" "M970.807T12.104" 

$Xvm
 [1] "M1094.782T12.105" "M991.807T13.109"  "M830.837T11.111"  ""                
 [5] ""                 ""                 ""                 ""                
 [9] ""                 ""    

I then build my Venn diagram:

ggvenn_plot <- ggvenn(ggvenn_data, fill_color = c("#0073C2FF", "#EFC000FF", "#CD534CFF"),
  stroke_size = 0.5, show_percentage = FALSE, show_elements = FALSE) +
  theme_void()

The final image contains an extra "1" shared between Dro and Xvm. I have tried setting show_elements = T to see whether there is any mystery text in that region, but it is blank. So where is the "1" coming from? I have searched on Google and haven't been able to find this issue described elsewhere.


Edit: Addition of reproducible example of data

dput(ggvenn_data)

list(Dro = c("M830.837T11.111", "M883.492T15.092", "M367.151T397.982", 
"", "", "", "", "", "", ""), Foc = c("M883.492T15.092", "M1279.087T13.111", 
"M830.837T11.111", "M1116.775T17.12", "M1244.755T16.127", "M1030.792T17.119", 
"M1334.742T13.111", "M1239.758T16.118", "M1472.714T13.108", "M970.807T12.104"
), Xvm = c("M1094.782T12.105", "M991.807T13.109", "M830.837T11.111", 
"", "", "", "", "", "", ""))

Solution

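  • The issue is that the empty strings are counted as an element like any other value. Both Dro and Xvm contain "" while Foc does not, so "" shows up as one value shared only by Dro and Xvm, and each of those two sets ends up with 4 unique values instead of 3. The label for that region is itself an empty string, which is presumably why show_elements = T shows nothing there. To fix that, drop the empty strings from your data: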

    library(ggvenn)
    library(ggplot2)  # provides theme_void()
    
    # Keep only the non-empty entries in each set
    ggvenn_data <- ggvenn_data |> 
      lapply(\(x) x[x != ""])
    
    ggvenn(ggvenn_data,
      fill_color = c("#0073C2FF", "#EFC000FF", "#CD534CFF"),
      stroke_size = 0.5, show_percentage = FALSE, show_elements = FALSE
    ) +
      theme_void()
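
If you want to confirm the diagnosis without plotting, a minimal sketch along these lines should work. It uses only base R set operations on the data from the question's dput() output. The names dro_xvm_only, sizes_before, cleaned, sizes_after and dro_xvm_only_after are just illustrative, and the extra is.na() check is an assumption on my part in case a different import step produces NA instead of "".

```r
# Data copied from the dput() output in the question
ggvenn_data <- list(
  Dro = c("M830.837T11.111", "M883.492T15.092", "M367.151T397.982",
          rep("", 7)),
  Foc = c("M883.492T15.092", "M1279.087T13.111", "M830.837T11.111",
          "M1116.775T17.12", "M1244.755T16.127", "M1030.792T17.119",
          "M1334.742T13.111", "M1239.758T16.118", "M1472.714T13.108",
          "M970.807T12.104"),
  Xvm = c("M1094.782T12.105", "M991.807T13.109", "M830.837T11.111",
          rep("", 7))
)

# Elements shared by Dro and Xvm but absent from Foc:
# this is the region of the Venn diagram that shows the unexpected "1"
dro_xvm_only <- setdiff(
  intersect(ggvenn_data$Dro, ggvenn_data$Xvm),
  ggvenn_data$Foc
)
print(dro_xvm_only)

# Number of unique elements per set before cleaning
sizes_before <- lengths(lapply(ggvenn_data, unique))
print(sizes_before)

# Drop empty strings (and NAs, as a precaution) from every set
cleaned <- lapply(ggvenn_data, function(x) x[!is.na(x) & x != ""])

# Set sizes and the Dro/Xvm-only region after cleaning
sizes_after <- lengths(cleaned)
dro_xvm_only_after <- setdiff(
  intersect(cleaned$Dro, cleaned$Xvm),
  cleaned$Foc
)
print(sizes_after)
print(dro_xvm_only_after)
```

Before cleaning, the only element in the Dro/Xvm-only region is the empty string. After cleaning, that region is empty and Dro and Xvm are back to 3 values each.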