pandas function group-by

Add Optional Variables in Groupby Function Pandas


Let's say I have the following data frame, and I want var_a to be optional: if it is not supplied, there should be no default column, and the function should group by vehicle only. How do I do that? I tried to work it out with **kwargs, but I can't get it to work.

import pandas as pd

d = {'col1': ['1', '2', '2', '4', '5'], 'vehicle': ['car', 'car', 'truck', 'truck', 'bike'], 'eng': [2, 2, 6, 6, 0], 'id': ['1', '2', '3', '4', '5']}
df_attempt = pd.DataFrame(data=d)

def new_fun(df, var_a):
    return df.groupby(['vehicle', var_a], as_index=False)['id'].count()

new_fun(df_attempt, 'col1') 

Trying with **kwargs:

def new_fun(df, **kwargs):
    var_a = kwargs.get('var_a', None)
    return df.groupby(['vehicle', var_a], as_index = False)['id'].count()

new_fun(df_attempt, 'col1')  

Solution

  • Be explicit: give var_a a default value and check whether the caller left it at that default:

    def new_fun(df, var_a=None):
        group = 'vehicle' if var_a is None else ['vehicle', var_a]
        return df.groupby(group, as_index = False)['id'].count()
    
    out = new_fun(df_attempt)
    

    Output:

      vehicle  id
    0    bike   1
    1     car   2
    2   truck   2
    

    If None could itself be a valid column name, use a dedicated sentinel as the default instead (pandas does this itself, for example read_csv declares sep=_NoDefault.no_default).

    Here is a more generic example that also accepts a list of columns:

    # Note: pandas._libs is a private module, so this import path may change between pandas versions.
    from pandas._libs.lib import no_default

    def new_fun(df, var_a=no_default):
        group = ['vehicle']
        if var_a is not no_default:
            group.extend(var_a if isinstance(var_a, list) else [var_a])
        return df.groupby(group, as_index = False)['id'].count()
    
    new_fun(df_attempt)
    new_fun(df_attempt, 'col1')
    new_fun(df_attempt, ['col1', 'eng'])
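
If you would rather not import from a private pandas module, the same idea can be sketched with a plain Python sentinel. This is only a minimal illustration: the names `_MISSING` and `count_ids` are hypothetical and not part of pandas, and the data frame is the one from the question.

```python
import pandas as pd

# Hypothetical sentinel: a unique object that no caller can pass by accident,
# so even None stays available as a real column label.
_MISSING = object()

def count_ids(df, var_a=_MISSING):
    """Count 'id' per 'vehicle', optionally also grouping by var_a
    (a single column label or a list of labels)."""
    group = ['vehicle']
    if var_a is not _MISSING:
        # Accept either one label or a list of labels.
        group.extend(var_a if isinstance(var_a, list) else [var_a])
    return df.groupby(group, as_index=False)['id'].count()

d = {'col1': ['1', '2', '2', '4', '5'],
     'vehicle': ['car', 'car', 'truck', 'truck', 'bike'],
     'eng': [2, 2, 6, 6, 0],
     'id': ['1', '2', '3', '4', '5']}
df_attempt = pd.DataFrame(data=d)

by_vehicle = count_ids(df_attempt)               # groups by vehicle only
by_vehicle_col1 = count_ids(df_attempt, 'col1')  # groups by vehicle and col1
```

The identity check (`is not _MISSING`) is what makes this safe: only the module's own sentinel object can trigger the "not supplied" branch.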