I can plot Venn diagrams (using pyvenn), choosing which columns to compare with musiciansdf.iloc[:, 0:3]
or with musiciansdf = musiciansdf.loc[:, ["Played at Woodstock", "Members of The Beatles", "Guitarists"]]
(anywhere from 2 to 6 keys, here 3), as in:
import pandas as pd
from venn import venn
musiciansdf = pd.DataFrame({
    "Members of The Beatles": ["Paul McCartney", "John Lennon", "George Harrison", "Ringo Starr"],
    "Members of The Beats": ["Paul McCartney", "Lennon", "George Harrison", "Starr"],
    "Guitarists": ["John Lennon", "George Harrison", "Jimi Hendrix", "Eric Clapton"],
    "Played at Woodstock": ["Jimi Hendrix", "Carlos Santana", "Keith Moon", "Carlos Santana"],
    "Played at more": ["Jimi Hendrix", "Santana", "Keith Moon", "Santana"],
    "Cheese factory": ["Jimi", "Carlos Santana", "Keith", "Carlos Santana"],
})
musiciansdf = musiciansdf.iloc[:, 0:3]
Then put the data in the right format (dictionary with sets for values) with
vennmus = {}
for k, v in musiciansdf.to_dict('list').items():
    vennmus[k] = set(v)
And plot with
venn(vennmus)
But is there a way to get the values in each region of the Venn diagram, together with the corresponding key combination? Something like a dictionary listing every combination of keys and the values that belong to it. I know I could check which columns are used and write out the set operations manually for any combination, but I'm wondering whether there is a quicker, dynamic way.
For example, if I use musiciansdf.iloc[:, 0:2]
I would want a dict like,
{'Members of The Beatles only': {'John Lennon',
                                 'Ringo Starr'},
 'Members of The Beats only': {'Lennon',
                               'Starr'},
 'Members of The Beatles & Members of The Beats': {'George Harrison',
                                                   'Paul McCartney'}
}
matplotlib-venn could be used instead if it's a better option. I'm looking for a solution that works whether the columns are selected with musiciansdf = musiciansdf.loc[:, ["Played at Woodstock", "Members of The Beatles", "Guitarists"]]
or with musiciansdf = musiciansdf.iloc[:, 0:3],
so the columns may or may not be in their original order.
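If you're tempted to use a pure pandas approach:
d = (
    musiciansdf.iloc[:, 0:2]  # or `.loc`
    # Long format: one (membership, musician) row per cell, duplicates removed
    .stack().droplevel(0).rename_axis("membership")
    .reset_index(name="musician").drop_duplicates()
    # Label each musician with the combination of columns they appear in
    .groupby("musician", as_index=False).agg(
        lambda x: " & ".join(x) if len(x) > 1 else x.iloc[0] + " only")
    # Collect the musicians that share each label
    .groupby("membership")["musician"].agg(set).to_dict()
)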
Output:
print(d)
{'Members of The Beatles & Members of The Beats': {'George Harrison',
                                                   'Paul McCartney'},
 'Members of The Beatles only': {'John Lennon', 'Ringo Starr'},
 'Members of The Beats only': {'Lennon', 'Starr'}}
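If you would rather stay with plain sets, the same dictionary can probably be built directly from the `vennmus` dict from the question. The following is a minimal sketch, not a pyvenn feature: `venn_regions` is a hypothetical helper name, and it assumes the input maps each key to an iterable of hashable values. Each label lists the keys in the order they appear in the input, so a `.loc` selection in a different column order simply yields labels in that order.

```python
def venn_regions(named_sets):
    """Map each Venn region label to the set of values that fall in it.

    `named_sets` is assumed to look like `vennmus`: {key: iterable of values}.
    """
    # Normalise every value collection to a set, keeping the input key order.
    sets = {name: set(values) for name, values in named_sets.items()}
    regions = {}
    for value in set().union(*sets.values()):
        # Keys whose set contains this value, in input order.
        members = [name for name, s in sets.items() if value in s]
        if len(members) > 1:
            label = " & ".join(members)
        else:
            label = f"{members[0]} only"
        regions.setdefault(label, set()).add(value)
    return regions


# Same content as `vennmus` built from musiciansdf.iloc[:, 0:2].
vennmus = {
    "Members of The Beatles": {"Paul McCartney", "John Lennon", "George Harrison", "Ringo Starr"},
    "Members of The Beats": {"Paul McCartney", "Lennon", "George Harrison", "Starr"},
}
regions = venn_regions(vennmus)
print(regions)
```

Regions that contain no values do not appear in the result. If you need them as empty sets, you could pre-populate `regions` with every key combination from `itertools.combinations`.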