
Visualise success rates for multiple metrics


I have 4 columns specifying whether each metric was fulfilled in each trial or not.

mydata <- data.frame(trial = c(1,2,3,4,5,6,7,...), # eg. up to 27 000
                     metricA = c('success', 'failed', 'failed', 'success',...), 
                     metricB = c('failed', 'success', 'success', 'success',...), 
                     metricC = c('failed', 'failed', 'success', 'failed',...),
                     metricD = c('success', 'success', 'failed', 'success',...)
                     )

Each metric column has the same length as the trial column, so for every trial it is known whether it succeeded or failed on each metric.

Now I would like to visualise how many trials succeeded or failed on each metric and across metrics, e.g. that 10% of the trials that succeeded on metric A failed on metric C, and so on. I want to visualise this with a Venn diagram. This is the code I have produced:
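The kind of cross-metric figure described above (the share of trials that succeeded on metric A but failed on metric C) can also be computed directly, which is handy for checking any plot against. A minimal sketch, assuming the column names and the "success"/"failed" labels shown above; the data frame `toy` is hypothetical toy data:

```r
# Hypothetical toy data with the same layout as mydata
toy <- data.frame(
  trial   = 1:5,
  metricA = c("success", "failed", "failed",  "success", "success"),
  metricC = c("failed",  "failed", "success", "failed",  "success")
)

# Logical index of the trials that succeeded on metric A
succA <- toy$metricA == "success"

# Percentage of those trials that failed on metric C
pct_A_succ_C_fail <- 100 * mean(toy$metricC[succA] == "failed")
pct_A_succ_C_fail  # 66.66667 for this toy data (2 of 3 trials)
```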

library(ggVennDiagram)

mydata <- read.csv("trials-metrics.csv")

mA <- mydata$metricA
mB <- mydata$metricB
mC <- mydata$metricC
mD <- mydata$metricD

x <- list(
  A = mA,
  B = mB,
  C = mC,
  D = mD
)

ggVennDiagram(x, category.names = c("A","B","C","D"))

This produces the following plot:

[plot image not available]

Most likely, this type of Venn diagram only compares shared values between groups. Therefore, I assume I need to produce a unique value for each combination of metric outcomes. How can I implement this? Or am I missing something else?
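The suspicion above is easy to check: a Venn diagram treats each list element as a set, and set operations in R such as `unique()` and `intersect()` only look at distinct values. A minimal sketch with hypothetical toy vectors shows that two vectors of outcome labels reduce to the same two labels, no matter which trials actually succeeded:

```r
# Hypothetical outcome vectors for two metrics, one entry per trial
mA <- c("success", "failed", "failed", "success")
mB <- c("failed", "success", "success", "success")

# As sets, both vectors reduce to the same two labels
sort(unique(mA))  # "failed" "success"
sort(unique(mB))  # "failed" "success"

# Their intersection therefore contains just those two labels,
# independent of which trials succeeded on which metric
common <- intersect(mA, mB)
common
```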

I have found this similar question, where such a Venn diagram was successfully produced from the same kind of binary "True/False" data:

Making a Venn Diagram from a Dataframe

I am very new to R, so the most parsimonious code solution would be greatly appreciated.


Solution

  • As you already guessed, because we are dealing with sets, the elements have to be unique. For example, you can use the trial column and, for each metric, keep only the IDs of the trials that succeeded.

    Using some random example data:

    n <- 1000
    set.seed(123)
    
    mydata <- data.frame(
      trial = seq_len(n), # e.g. up to 27 000
      metricA = sample(c("success", "failed"), n, replace = TRUE),
      metricB = sample(c("success", "failed"), n, replace = TRUE),
      metricC = sample(c("success", "failed"), n, replace = TRUE),
      metricD = sample(c("success", "failed"), n, replace = TRUE)
    )
    
    library(ggVennDiagram)
    
    # One set per metric: the IDs of the trials that succeeded on it
    x <- list(
      A = mydata$trial[mydata$metricA == "success"],
      B = mydata$trial[mydata$metricB == "success"],
      C = mydata$trial[mydata$metricC == "success"],
      D = mydata$trial[mydata$metricD == "success"]
    )
    ggVennDiagram(x, category.names = LETTERS[1:4])
    

    Alternatively, instead of creating the list manually, you can use lapply():

    # \(x) is the anonymous-function shorthand introduced in R 4.1; use function(x) in older versions
    xx <- lapply(mydata[-1], \(x) mydata$trial[x == "success"])
    ggVennDiagram(xx, category.names = LETTERS[seq_along(xx)])
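To cross-check the numbers shown in the regions of the diagram, the trials can also be tabulated by their exact combination of successful metrics. A minimal base-R sketch, using a small hypothetical data set in the same layout; the names `succ`, `combo` and `region_counts` are illustrative. Trials that failed every metric belong to none of the sets, so they do not appear inside the diagram; here they are labelled "none".

```r
# Hypothetical small data set in the same layout as mydata
mydata <- data.frame(
  trial   = 1:6,
  metricA = c("success", "failed",  "success", "failed", "success", "failed"),
  metricB = c("success", "success", "failed",  "failed", "success", "failed"),
  metricC = c("failed",  "failed",  "failed",  "failed", "success", "failed"),
  metricD = c("failed",  "failed",  "failed",  "failed", "failed",  "failed")
)

# Logical matrix: TRUE where a trial succeeded on a metric
succ <- as.matrix(mydata[-1]) == "success"

# Label each trial with the metrics it succeeded on, e.g. "metricA&metricB";
# trials that failed every metric get the label "none"
combo <- apply(succ, 1, function(r) {
  if (any(r)) paste(colnames(succ)[r], collapse = "&") else "none"
})

# Number of trials per exact combination of successes
region_counts <- table(combo)
region_counts
```

Each count other than "none" should correspond to one region of the Venn diagram, i.e. the trials that succeeded on exactly those metrics and failed on the rest.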