pythonpandasdataframe

find column with all 'no' values for each category group


I have a dataset similar to this fictional one

data = {'category':['foo','foo','foo','foo','foo','bar','bar','bar','bar','bar'],'part':['101','102','103','104','105','201','202','203','204','205'],'site_a':['yes','yes','no','no','no','no','yes','no','no','no'],'site_b':['no','no','no','no','no','no','yes','no','no','no'],'site_c':['yes','no','no','no','no','no','no','no','no','no']}

I need to group by category and filter which site has all no values for each category value. 'site_b' would be true for category 'foo' and 'site_c' would be true for category 'bar'. I guess if I converted yes to 1 and no to 0 then I could do some kind of aggregation function if I used groupby. I was also reading about multiindexing, but wasnt sure how to iterate through the categories to find which sites had all 'no' values.

This is what my multi index looked like

df.set_index(['category','part'], inplace=True)
                      site_a    site_b  site_c
category    part            
foo         101       yes        no      yes
            102       yes        no      no
            103       no         no      no
            104       no         no      no
            105       no         no      no
bar         201       no         no      no
            202       yes        yes     no
            203       no         no      no
            204       no         no      no
            205       no         no      no

This way if a site is not using a part from the category then I would remove that category from each sites list of category/parts.

much appreciation for the guidance on using groupby or multiindex


Solution

  • IIUC,

    df.set_index('category')[['site_a', 'site_b', 'site_c']].eq('no').groupby(level=0).all()
    

    Output:

              site_a  site_b  site_c
    category                        
    bar        False   False    True
    foo        False    True   False
    

    Dataframe restructured:

    df.set_index('category')[['site_a', 'site_b', 'site_c']].eq('no').groupby(level=0).all().stack().reset_index()
    

    Output:

      category level_1      0
    0      bar  site_a  False
    1      bar  site_b  False
    2      bar  site_c   True
    3      foo  site_a  False
    4      foo  site_b   True
    5      foo  site_c  False