Python: Venn diagram from score data


I have the following data:

df =
id testA testB
1  3     NA
1  1     3
2  2     NA
2  NA    1
2  0     0
3  NA    NA
3  1     1

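If the table above is only available as plain text, one way to load it is `pandas.read_csv` with a whitespace separator. This is a sketch: the question does not say where the data comes from. The string `NA` is in pandas' default list of missing-value markers, so those cells become NaN without extra options.

```python
import io

import pandas as pd

# The table from the question, pasted as whitespace-separated text
raw = """id testA testB
1  3     NA
1  1     3
2  2     NA
2  NA    1
2  0     0
3  NA    NA
3  1     1
"""

# sep=r"\s+" splits on runs of whitespace; "NA" is parsed as NaN by default
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
print(df.isna().sum())  # number of missing values per column
```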
I would like to create a Venn diagram of the number of rows in which both testA and testB have a value, testA has a value but testB does not, and testB has a value but testA does not.

The expected outcome would be the following groups:

Both tests: 3
A but not B: 2
B but not A: 1
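These three numbers can be checked directly with boolean masks. The sketch below assumes that "appears" means "is not NA". Note that the row with `0, 0` is counted under "both", so the check has to be on missing values, not on truthiness.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 3],
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# True where the test has a value (0 counts as a value, NaN does not)
has_a = df["testA"].notna()
has_b = df["testB"].notna()

both = int((has_a & has_b).sum())
a_only = int((has_a & ~has_b).sum())
b_only = int((~has_a & has_b).sum())

print(f"Both tests: {both}")
print(f"A but not B: {a_only}")
print(f"B but not A: {b_only}")
```

The three counts can also be passed straight to `matplotlib_venn.venn2` through its `subsets` argument, which accepts region sizes in the order (A only, B only, both), e.g. `venn2(subsets=(a_only, b_only, both))`.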

Solution

  • I am not sure how you get the index of your DataFrame, or whether you have another index, so I used the default integer index. I also assumed NA to be np.nan.

    In any case, you can try something like the following (start from the point where your df already exists). First, I recreate your DataFrame. Then I create two sets, setA and setB, which contain the indices of the rows where the respective column is not NaN. Finally, I create a Venn diagram from these two sets.

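    from matplotlib_venn import venn2
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    
    # Recreate the example data (NA represented as np.nan)
    df = pd.DataFrame()
    df["testA"] = [3, 1, 2, np.nan, 0, np.nan, 1]
    df["testB"] = [np.nan, 3, np.nan, 1, 0, np.nan, 1]
    
    # Row indices where each test has a value
    setA = set([index_ for index_ in df.index if not np.isnan(df["testA"].loc[index_])])
    setB = set([index_ for index_ in df.index if not np.isnan(df["testB"].loc[index_])])
    
    # venn2 sizes each region from the shared and unique elements of the two sets
    venn2([setA, setB], set_labels=("testA", "testB"))
    plt.show()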
    

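    You then get a diagram with 2 in the testA-only region, 1 in the testB-only region and 3 in the overlap, which matches the expected groups.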
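The list comprehensions above loop over the rows one by one. A vectorized alternative, sketched here with the same data, selects the index labels through a `notna()` mask. Printing the set differences and the intersection shows exactly which rows end up in each region of the diagram.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# Index labels of the rows where each column is not NaN (as plain Python ints)
set_a = set(df.index[df["testA"].notna()].tolist())
set_b = set(df.index[df["testB"].notna()].tolist())

# The three regions venn2 draws from these two sets
print("A only:", sorted(set_a - set_b))
print("B only:", sorted(set_b - set_a))
print("Both:  ", sorted(set_a & set_b))
```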