Tags: python, python-3.x, visualization, upsetplot

How to display intersection values instead of distinct values in Upset plot


I tried to create an UpSet plot to display the intersections among different sets.
But my plot is displaying distinct value counts among the sets.
How do I change it to show intersections instead of distinct counts?

This is my code:

from upsetplot import UpSet, from_contents

mammals = ['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose']
herbivores = ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros']
domesticated = ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck']

animals = from_contents({'mammal': mammals, 'herbivore': herbivores, 'domesticated': domesticated})
ax_dict = UpSet(animals, subset_size='count', show_counts=True).plot()

This is my output:

Output

The actual intersection between herbivores and mammals has 5 elements, while my plot shows 2.
Can anyone help me show intersections in UpSet plots?
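
For reference, both numbers can be reproduced with plain Python sets. This is only a small sketch using the lists from the question. It assumes the bar in the plot counts animals that belong to exactly the marked sets and to no others, which matches the value of 2 reported above.

```python
# Plain-Python check of the two numbers, using the lists from the question.
mammals = {'Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose'}
herbivores = {'Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros'}
domesticated = {'Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck'}

# Inclusive intersection: every animal that is both a mammal and a herbivore,
# regardless of whether it is also domesticated.
inclusive = mammals & herbivores

# Exclusive (distinct) subset: mammal and herbivore but NOT domesticated.
# Assumption: this is what the plotted bar counts.
exclusive = (mammals & herbivores) - domesticated

print(len(inclusive), sorted(inclusive))  # 5 ['Cattle', 'Horse', 'Moose', 'Rhinoceros', 'Sheep']
print(len(exclusive), sorted(exclusive))  # 2 ['Moose', 'Rhinoceros']
```

The three animals missing from the exclusive count (Horse, Sheep, Cattle) are also domesticated, so they fall into the bar for all three sets.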


Solution

  • Okay, this question is already a few days old, but I have not seen an answer yet.

    A couple of years ago I faced a similar problem, and I found some old code of mine from then. The idea is to calculate the intersection sizes manually and then create an input object via upsetplot.from_memberships() containing the category combinations and their associated intersection sizes.

    In your case, try something similar to this:

    import upsetplot
    import itertools
    import numpy as np
    
    mammals = ['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose']
    herbivores = ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros']
    domesticated = ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck']
    
    animals_dict = {"mammals": mammals, "herbivores": herbivores, "domesticated": domesticated}
    
    categories = list(animals_dict.keys())
    comb_list_list = []
    comb_intersection_length_list = []
    # identify per category combination the intersection length
    for i in range(len(categories)):
        comb_list = list(itertools.combinations(categories, i+1))
        for elem in comb_list:
            comb_list_list.append(elem)
            # create a list of lists of categories for which to search the intersection length
            cat_lists = [animals_dict[x] for x in elem]
            comb_intersection_length_list.append(len(set(cat_lists[0]).intersection(*cat_lists)))
    
    # remove category combinations with 0 intersections.
    # (filtered in plain Python: np.array() on tuples of different lengths raises an error on recent NumPy versions)
    nonzero = [(comb, n) for comb, n in zip(comb_list_list, comb_intersection_length_list) if n != 0]
    comb_list_list = [comb for comb, _ in nonzero]
    comb_intersection_length_list = np.array([n for _, n in nonzero])
    
    # create a membership data series which indicates the intersection size between the different sets
    mem_series = upsetplot.from_memberships(comb_list_list,
                                            data=comb_intersection_length_list)
    
    upsetplot.plot(mem_series,
                   orientation='horizontal',
                   show_counts=True)
    

    The problem with this approach is that the total set sizes (bottom left) are inflated, because each one is the sum over all intersections rather than the count of distinct values, so they are no longer really useful. For my own purposes this approach was good enough; any further adjustments you will need to make yourself.

    Here is the plot showing intersection sizes:

    Upsetplot showing intersection sizes.
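
The inflation of the totals mentioned in the answer can be checked without plotting anything. The sketch below is an illustration only. It assumes, as the answer describes, that each category total is the sum of the supplied sizes over every combination that contains the category. The names `inclusive_sizes`, `inflated_totals` and `true_sizes` are made up for this example.

```python
from itertools import combinations

animals_dict = {
    "mammals": ['Cat', 'Dog', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Rhinoceros', 'Moose'],
    "herbivores": ['Horse', 'Sheep', 'Cattle', 'Moose', 'Rhinoceros'],
    "domesticated": ['Dog', 'Chicken', 'Horse', 'Sheep', 'Pig', 'Cattle', 'Duck'],
}
sets = {name: set(members) for name, members in animals_dict.items()}

# Inclusive intersection size for every non-empty combination of categories.
inclusive_sizes = {}
for r in range(1, len(sets) + 1):
    for combo in combinations(sets, r):
        inclusive_sizes[combo] = len(set.intersection(*(sets[c] for c in combo)))

# Assumed total per category: the sum over all combinations that include it.
inflated_totals = {
    name: sum(size for combo, size in inclusive_sizes.items() if name in combo)
    for name in sets
}
# Actual number of distinct members per category, for comparison.
true_sizes = {name: len(members) for name, members in sets.items()}

print(inflated_totals)  # {'mammals': 21, 'herbivores': 16, 'domesticated': 18}
print(true_sizes)       # {'mammals': 8, 'herbivores': 5, 'domesticated': 7}
```

For mammals, for example, the inflated total is 8 + 5 + 5 + 3 = 21, because every mammal is counted once in each combination it takes part in.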