[SOLVED] Add Optional Variables in Groupby Function Pandas

Let's say I have the following data frame, and I want var_a to be optional: if it is not supplied, no default column should be used and the function should group by vehicle only. How do I do that? I tried to work it out with **kwargs, but I can't get it to work.

import pandas as pd

d = {'col1': ['1', '2', '2', '4', '5'], 'vehicle': ['car', 'car', 'truck', 'truck', 'bike'], 'eng': [2, 2, 6, 6, 0], 'id': ['1', '2', '3', '4', '5']}
df_attempt = pd.DataFrame(data=d)

def new_fun(df, var_a):
    return df.groupby(['vehicle', var_a], as_index=False)['id'].count()

new_fun(df_attempt, 'col1')

Trying with **kwargs:

def new_fun(df, **kwargs):
    var_a = kwargs.get('var_a', None)
    return df.groupby(['vehicle', var_a], as_index = False)['id'].count()

new_fun(df_attempt, 'col1')  

Solution

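Be explicit: give var_a a default value of None and check whether the caller supplied something else. (The **kwargs attempt fails because 'col1' is passed positionally, while **kwargs only collects keyword arguments; and when var_a is omitted, None still ends up in the list of grouping keys.)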

def new_fun(df, var_a=None):
    group = 'vehicle' if var_a is None else ['vehicle', var_a]
    return df.groupby(group, as_index = False)['id'].count()

out = new_fun(df_attempt)

Output:

  vehicle  id
0    bike   1
1     car   2
2   truck   2

If None could itself be a valid column name, use a dedicated sentinel as the default instead (pandas does this internally; for example, read_csv declares sep=_NoDefault.no_default).
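As a minimal sketch of the sentinel idea without relying on pandas internals, you can create your own unique object and compare against it with `is`. The names `_MISSING` and `count_by_vehicle` are hypothetical and only used for illustration:

```python
import pandas as pd

# Hypothetical module-level sentinel: a unique object that callers
# will not pass by accident, unlike None.
_MISSING = object()

def count_by_vehicle(df, var_a=_MISSING):
    # Always group by 'vehicle'; add var_a only when the caller supplied one.
    group = ['vehicle'] if var_a is _MISSING else ['vehicle', var_a]
    return df.groupby(group, as_index=False)['id'].count()

df_attempt = pd.DataFrame({
    'col1': ['1', '2', '2', '4', '5'],
    'vehicle': ['car', 'car', 'truck', 'truck', 'bike'],
    'eng': [2, 2, 6, 6, 0],
    'id': ['1', '2', '3', '4', '5'],
})

by_vehicle = count_by_vehicle(df_attempt)                   # groups by vehicle only
by_vehicle_and_col1 = count_by_vehicle(df_attempt, 'col1')  # groups by vehicle and col1
```

The identity check (`is _MISSING`) is what separates "argument not supplied" from any real value the caller might pass.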

Here is a more general example that also accepts a list of columns:

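# pandas._libs is a private module, so this import path may change between versions;
# the same sentinel is also exposed publicly as pandas.api.extensions.no_default
from pandas._libs.lib import no_default

def new_fun(df, var_a=no_default):
    group = ['vehicle']
    if var_a is not no_default:
        # accept either a single column name or a list of column names
        group.extend(var_a if isinstance(var_a, list) else [var_a])
    return df.groupby(group, as_index=False)['id'].count()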

new_fun(df_attempt)
new_fun(df_attempt, 'col1')
new_fun(df_attempt, ['col1', 'eng'])
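Since the question started from **kwargs, a related option is to collect the extra grouping columns with *args instead, which removes the need for any default value. This is only a sketch of an alternative, not part of the answer above, and the function name `count_by_vehicle` is hypothetical:

```python
import pandas as pd

def count_by_vehicle(df, *extra_cols):
    # extra_cols is an empty tuple when no extra column names are passed,
    # so the grouping falls back to 'vehicle' alone.
    group = ['vehicle', *extra_cols]
    return df.groupby(group, as_index=False)['id'].count()

df_attempt = pd.DataFrame({
    'col1': ['1', '2', '2', '4', '5'],
    'vehicle': ['car', 'car', 'truck', 'truck', 'bike'],
    'eng': [2, 2, 6, 6, 0],
    'id': ['1', '2', '3', '4', '5'],
})

no_extra = count_by_vehicle(df_attempt)
one_extra = count_by_vehicle(df_attempt, 'col1')
two_extra = count_by_vehicle(df_attempt, 'col1', 'eng')
```

The trade-off is that callers pass column names as separate positional arguments rather than as a list; an existing list can still be unpacked at the call site with `count_by_vehicle(df_attempt, *cols)`.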