I have a dataset similar to this fictional one
data = {'category':['foo','foo','foo','foo','foo','bar','bar','bar','bar','bar'],'part':['101','102','103','104','105','201','202','203','204','205'],'site_a':['yes','yes','no','no','no','no','yes','no','no','no'],'site_b':['no','no','no','no','no','no','yes','no','no','no'],'site_c':['yes','no','no','no','no','no','no','no','no','no']}
I need to group by category and filter which site has all no values for each category value. 'site_b' would be true for category 'foo' and 'site_c' would be true for category 'bar'. I guess if I converted yes to 1 and no to 0 then I could do some kind of aggregation function if I used groupby. I was also reading about multiindexing, but wasnt sure how to iterate through the categories to find which sites had all 'no' values.
This is what my multi index looked like
df.set_index(['category','part'], inplace=True)
site_a site_b site_c
category part
foo 101 yes no yes
102 yes no no
103 no no no
104 no no no
105 no no no
bar 201 no no no
202 yes yes no
203 no no no
204 no no no
205 no no no
This way if a site is not using a part from the category then I would remove that category from each sites list of category/parts.
much appreciation for the guidance on using groupby or multiindex
IIUC,
df.set_index('category')[['site_a', 'site_b', 'site_c']].eq('no').groupby(level=0).all()
Output:
site_a site_b site_c
category
bar False False True
foo False True False
Dataframe restructured:
df.set_index('category')[['site_a', 'site_b', 'site_c']].eq('no').groupby(level=0).all().stack().reset_index()
Output:
category level_1 0
0 bar site_a False
1 bar site_b False
2 bar site_c True
3 foo site_a False
4 foo site_b True
5 foo site_c False