I have 4 columns specifying whether a specific metric was fulfilled in each trial or not.
mydata <- data.frame(trial = c(1,2,3,4,5,6,7,...), # e.g. up to 27 000
metricA = c('success', 'failed', 'failed', 'success',...),
metricB = c('failed', 'success', 'success', 'success',...),
metricC = c('failed', 'failed', 'success', 'failed',...),
metricD = c('success', 'success', 'failed', 'success',...)
)
The metric columns are as long as the trial column, so that for each trial it is known whether it failed or succeeded in each metric.
Now I would like to visualise how many trials succeeded or failed for each metric and across metrics, e.g. that 10% of the trials that succeeded in metric A failed in metric C, and so on. I want to visualise this with a Venn diagram. This is the code I have produced:
mydata <- read.csv("trials-metrics.csv")
mA<-mydata$metricA
mB<-mydata$metricB
mC<-mydata$metricC
mD<-mydata$metricD
x <- list(
A = mA,
B = mB,
C = mC,
D = mD
)
ggVennDiagram(x, category.names = c("A","B","C","D"))
This produces the following plot.
Most likely, this type of Venn diagram only compares shared values between groups. Therefore, I assume I need to produce a unique value for each combination of metric outcomes. How can I implement this? Or am I missing something else?
I have found a similar question, "Making a Venn Diagram from a Dataframe", where the same kind of Venn diagram was successfully produced from the same "True/False" type of binary data.
I am incredibly new to R, so the most parsimonious code solution would be greatly appreciated.
As you already guessed, since we are dealing with sets, we have to make the elements unique. For example, you can use the trial column and filter for the successes per metric.
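To illustrate why the raw columns do not work as sets, here is a minimal sketch on a tiny made-up data frame (the name toy and its values are only for illustration, not the asker's data). Each raw column reduces to the two labels, whereas the trial IDs of the successes identify actual trials that can be intersected:

```r
# Tiny made-up example (hypothetical values, not the real data)
toy <- data.frame(
  trial   = 1:4,
  metricA = c("success", "failed", "failed", "success"),
  metricB = c("failed", "success", "success", "success")
)

# As sets, the raw columns contain only the two distinct labels
labels_A <- unique(toy$metricA)
labels_B <- unique(toy$metricB)

# Using the trial IDs of the successes gives sets of actual trials
a <- toy$trial[toy$metricA == "success"]
b <- toy$trial[toy$metricB == "success"]

# Trials that succeeded in both metrics
both <- intersect(a, b)
```

With the raw columns, every metric looks like the same two-element set, so the diagram cannot say anything about individual trials.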
Using some random example data:
n <- 1000
set.seed(123)
mydata <- data.frame(
trial = seq_len(n), # eg. up to 27 000
metricA = sample(c("success", "failed"), n, replace = TRUE),
metricB = sample(c("success", "failed"), n, replace = TRUE),
metricC = sample(c("success", "failed"), n, replace = TRUE),
metricD = sample(c("success", "failed"), n, replace = TRUE)
)
library(ggVennDiagram)
x <- list(
A = mydata$trial[mydata$metricA == "success"],
B = mydata$trial[mydata$metricB == "success"],
C = mydata$trial[mydata$metricC == "success"],
D = mydata$trial[mydata$metricD == "success"]
)
ggVennDiagram(x, category.names = LETTERS[1:4])
Or, instead of creating the list manually, you might consider using lapply:
xx <- lapply(mydata[-1], \(x) mydata$trial[x == "success"])
ggVennDiagram(xx, category.names = LETTERS[seq_along(xx)])
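If you also want the numbers behind the regions, such as the share of trials that succeeded in metric A but failed in metric C, a rough base R sketch could look like the following. It repeats the random example data so that it runs on its own; the object names ok_A, failed_C, share_A_ok_C_failed and all_combos are made up for illustration.

```r
# Re-create the random example data so the snippet is self-contained
n <- 1000
set.seed(123)
mydata <- data.frame(
  trial = seq_len(n),
  metricA = sample(c("success", "failed"), n, replace = TRUE),
  metricB = sample(c("success", "failed"), n, replace = TRUE),
  metricC = sample(c("success", "failed"), n, replace = TRUE),
  metricD = sample(c("success", "failed"), n, replace = TRUE)
)

# Logical vectors marking the trials of interest
ok_A     <- mydata$metricA == "success"
failed_C <- mydata$metricC == "failed"

# Share of the A-successes that failed in metric C
share_A_ok_C_failed <- mean(failed_C[ok_A])

# Number of trials for every combination of outcomes across the four metrics
all_combos <- as.data.frame(table(mydata[-1]))
```

all_combos has one row per combination of outcomes with its count in the Freq column, which can be used to cross-check the counts shown in the diagram.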