
Create a Venn Diagram in R while creating a sum of numbers


I'm trying to create a Venn Diagram in R to show whether certain tests on different machines are performed for all participants. In other words, I'm interested to see if certain tests for participants are performed on all three, two, one or none of the machines.

Here is an example of the data:

dat <- data.frame(id = 1:30,
                  machine1 = sample(0:7, 30, replace = TRUE),
                  machine2 = sample(0:3, 30, replace = TRUE),
                  machine3 = sample(0:6, 30, replace = TRUE))

These machine columns are sums of the original columns for the different tests. I have omitted those, but if it is easier they can be created with, for example, machine1test1 = sample(0:1, 30, replace = TRUE), and so on.

So, if a participant had 2 tests on machine 1 and 3 tests on machine 2 and 0 tests on machine 3, it should add a value of 5 in the Venn diagram for the overlap between machine 1 and machine 2.

I have tried to follow several examples online, but they all seem to take string values as input for a Venn diagram. This would require me to restructure the data, and I was hoping it would be possible without converting to strings. I've tried to follow these examples:

  • https://www.datanovia.com/en/blog/venn-diagram-with-r-or-rstudio-a-million-ways/
  • Making a venn diagram from a count table
  • How to add count values in venn diagram for more than 6 sets?
  • Create a Venn Diagram in R to represent rows with the same value from a dataframe

But none of those seem to fully apply, since they mostly apply to string values. Any help would be much appreciated!


Solution

  • The simplest way I can think of takes advantage of how my nVennR package (the CRAN version is unavailable at this time) labels regions in a Venn diagram. You would need an auxiliary function and some row processing:

    library(nVennR)
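    dat <- data.frame(id = 1:30,
                      machine1 = sample(0:7, 30, replace = TRUE),
                      machine2 = sample(0:3, 30, replace = TRUE),
                      machine3 = sample(0:6, 30, replace = TRUE))
    # Map one row of counts to its 1-based region index: every machine
    # with a count > 0 sets one bit (the last column is the lowest bit).
    toBin <- function(l){
      result <- 0
      bit <- 0
      for (v in rev(l)){
        if (v > 0){
          bpos <- bitwShiftL(1, bit)
          result <- result + bpos
        }
        bit <- bit + 1
      }
      return(result + 1)  # +1 because R lists are indexed from 1
    }

    # One accumulator per region (2^nSets regions, including "no machine")
    nReg <- bitwShiftL(1, ncol(dat) - 1)
    sets <- as.list(rep(0, nReg))
    for (r in rownames(dat)){
      set <- toBin(dat[r, 2:ncol(dat)])
      # Add this participant's total number of tests to their region
      sets[[set]] <- sets[[set]] + sum(dat[r, 2:ncol(dat)])
    }

    # Build the Venn object from the region sizes and draw it
    myV <- createVennObj(nSets = ncol(dat) - 1, sNames = colnames(dat[,2:ncol(dat)]), sSizes = sets)
    myV <- plotVenn(nVennObj = myV)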
    

    And the result would be:

    [Image: the resulting Venn diagram]

    The key is toBin, which converts the values in each row into a number whose binary representation has a 1 where the value is greater than zero and a 0 otherwise. After adding 1 to get a valid list index, that number identifies the Venn region (set in the code) where you want to store the sum of the row's values (sum(dat[r, 2:ncol(dat)])). There is more information about nVennR in its vignette.
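
To make the region numbering concrete, here is a minimal base-R sketch (it does not need nVennR) that computes the same 1-based region index as toBin, but vectorized over all rows, on a small made-up data frame. The names toy, counts, weights, region and totals are illustrative only and are not part of the answer's code or of nVennR; the first row is the example from the question (2 tests on machine 1, 3 on machine 2, 0 on machine 3).

```r
# Hypothetical toy data: three participants, three machines
toy <- data.frame(id = 1:3,
                  machine1 = c(2, 0, 1),
                  machine2 = c(3, 0, 0),
                  machine3 = c(0, 0, 4))

counts <- as.matrix(toy[, -1])

# Bit weights: the first machine column is the highest bit,
# mirroring the rev() call in toBin (here: 4, 2, 1)
weights <- 2^((ncol(counts) - 1):0)

# 1-based region index per participant, same convention as toBin
region <- as.vector(((counts > 0) * 1) %*% weights) + 1

# Sum each participant's total number of tests into their region
tests_per_row <- rowSums(counts)
n_regions <- 2^ncol(counts)
totals <- vapply(seq_len(n_regions),
                 function(k) sum(tests_per_row[region == k]),
                 numeric(1))

print(region)
print(totals)
```

Under these assumptions, the first participant falls in region 7 (binary 110, plus one), which is the overlap of machine 1 and machine 2 only, and contributes 2 + 3 = 5 to it. as.list(totals) has the same shape as the sets list built by the loop in the answer.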