python, pandas, subset, upsetplot

Subsetting columns and counting the 1's (TURF analysis?)


The aim is to count the corresponding 1's in the rows for each subset of two or more columns:

    0   2   4
0   0   1   0
1   1   1   1
2   1   0   0
3   1   1   0
4   1   0   0
... ... ... ...

In the example above there are 4 such subsets ({0, 2}, {0, 4}, {2, 4} and {0, 2, 4}). The idea is then to summarize these counts in a bar plot, with each bar labelled by its subset.
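One way to compute these counts directly is a loop over `itertools.combinations`, assuming "count" means the number of rows that have a 1 in every column of the subset. This is a minimal sketch: `subset_counts` and the `" & "` bar labels are hypothetical names chosen for illustration, and `df` holds only the five rows shown above.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample: the five rows shown in the question (columns 0, 2, 4)
df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)


def subset_counts(frame: pd.DataFrame, min_size: int = 2) -> pd.Series:
    """For every combination of at least `min_size` columns, count the rows
    in which all of those columns equal 1."""
    counts = {}
    for size in range(min_size, frame.shape[1] + 1):
        for cols in combinations(frame.columns, size):
            label = " & ".join(map(str, cols))  # e.g. "0 & 2"
            # a row counts only if every column in the subset holds a 1
            counts[label] = int(frame[list(cols)].eq(1).all(axis=1).sum())
    return pd.Series(counts)


counts = subset_counts(df)
```

`counts.plot.bar()` then draws one bar per subset. These are inclusive counts (a row with 1's in columns 0, 2 and 4 also counts towards {0, 2}); the `value_counts` approach in the answer instead counts exclusive patterns, i.e. rows whose 1's fall in exactly that set of columns.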

The aim is to make an UpSet plot.

[Image: example UpSet plot]


Solution

  • It looks like you're looking for an UpSet plot, which the UpSetPlot package can draw:

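    # pip install upsetplot
    import matplotlib.pyplot as plt
    import upsetplot

    # one count per distinct True/False row pattern, indexed by a boolean
    # MultiIndex whose level names are the column labels
    upsetplot.plot(df.astype(bool).value_counts())
    plt.show()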
    


    To include all combinations, even those that never occur (count 0):

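    from itertools import product

    # reindex over every True/False pattern of the columns so unseen ones get a count of 0
    upsetplot.plot(df.astype(bool).value_counts()
                     .reindex(product([True, False], repeat=df.shape[1]), fill_value=0)
                  )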
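
To see what these snippets pass to `upsetplot.plot`, here is a sketch that builds the same Series on the five sample rows from the question (an assumed `df`; the real data has more rows). `DataFrame.value_counts` gives one count per distinct row pattern, and reindexing with a complete MultiIndex (here built with `pd.MultiIndex.from_product`, which yields the same patterns as the `product(...)` tuples) adds every missing pattern with a count of 0 while keeping the column labels as level names.

```python
import pandas as pd

# Hypothetical sample: the five rows shown in the question
df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)

# One count per distinct True/False row pattern; the index is a boolean
# MultiIndex whose level names are the column labels 0, 2, 4.
observed = df.astype(bool).value_counts()

# Every possible pattern (2 ** number of columns) with the same level names,
# so patterns that never occur appear with a count of 0.
all_patterns = pd.MultiIndex.from_product(
    [[True, False]] * df.shape[1], names=list(df.columns)
)
complete = observed.reindex(all_patterns, fill_value=0)
```

Here `observed` has 4 entries and `complete` has 8, still summing to the 5 rows.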
    


    Older answer

    It looks like you might want something like:

    # count each distinct row pattern (a tuple of values) and draw one bar per pattern
    df.value_counts().plot.bar()
    


    Or, to label each bar with the set of column names that contain a 1:

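    (df.reset_index().melt('index', var_name='cols')  # long format: one row per (row, column) cell
       .query('value == 1')                            # keep only the cells holding a 1
       .groupby('index')['cols'].agg(frozenset)        # per row: the set of columns with a 1
       .value_counts().plot.bar()                      # count each set and plot one bar per set
    )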
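
The chain above labels each row by the frozenset of columns that hold a 1. A sketch of the same idea without reshaping, on the five sample rows (an assumed `df`), builds those frozensets straight from a boolean mask. One difference: rows with no 1 at all are dropped by `.query('value == 1')` in the chain, whereas here they would show up as `frozenset()`.

```python
import pandas as pd

# Hypothetical sample: the five rows shown in the question
df = pd.DataFrame(
    [[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]],
    columns=[0, 2, 4],
)

# For each row, the frozenset of column labels holding a 1
mask = df.eq(1).to_numpy()
patterns = pd.Series(
    [frozenset(df.columns[row_mask]) for row_mask in mask],
    index=df.index,
)

# Number of rows per exact set of 1-columns
counts = patterns.value_counts()
```

`counts.plot.bar()` gives the same kind of bar plot, with the frozensets as tick labels.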
    
