Say I have the following data frame and want var_a to be optional: if it is not supplied, there should be no default value, and the function should group by vehicle only. How do I do that? I tried to work it out with **kwargs, but I can't get it to work.
import pandas as pd

d = {'col1': ['1', '2', '2', '4', '5'], 'vehicle': ['car', 'car', 'truck', 'truck', 'bike'], 'eng': [2, 2, 6, 6, 0], 'id': ['1', '2', '3', '4', '5']}
df_attempt = pd.DataFrame(data=d)

def new_fun(df, var_a):
    return df.groupby(['vehicle', var_a], as_index=False)['id'].count()

new_fun(df_attempt, 'col1')
Trying with **kwargs:
def new_fun(df, **kwargs):
    var_a = kwargs.get('var_a', None)
    return df.groupby(['vehicle', var_a], as_index=False)['id'].count()

new_fun(df_attempt, 'col1')
Be explicit: give var_a a default value and test whether that default was passed:
def new_fun(df, var_a=None):
    group = 'vehicle' if var_a is None else ['vehicle', var_a]
    return df.groupby(group, as_index=False)['id'].count()

out = new_fun(df_attempt)
Output:
  vehicle  id
0    bike   1
1     car   2
2   truck   2
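For comparison, here is a rough sketch of how the **kwargs attempt from the question could be made to work. The name new_fun_kwargs is made up for illustration, and the frame is a trimmed copy of the one in the question. The two things to notice are that **kwargs only captures keyword arguments, so passing 'col1' positionally raises a TypeError, and that a missing var_a must be left out of the group list rather than inserted as None.

```python
import pandas as pd

# Trimmed copy of the question's data frame.
df_attempt = pd.DataFrame({
    'col1': ['1', '2', '2', '4', '5'],
    'vehicle': ['car', 'car', 'truck', 'truck', 'bike'],
    'id': ['1', '2', '3', '4', '5'],
})

def new_fun_kwargs(df, **kwargs):
    # Hypothetical name; only keyword arguments end up in kwargs.
    group = ['vehicle']
    if 'var_a' in kwargs:
        # Add the extra column only when the caller actually supplied it.
        group.append(kwargs['var_a'])
    return df.groupby(group, as_index=False)['id'].count()

by_vehicle = new_fun_kwargs(df_attempt)              # groups by vehicle only
by_both = new_fun_kwargs(df_attempt, var_a='col1')   # groups by vehicle and col1

# A positional second argument is not captured by **kwargs.
try:
    new_fun_kwargs(df_attempt, 'col1')
except TypeError as exc:
    positional_error = exc
```

This works, but the explicit var_a=None signature above is easier to read and shows up properly in help() and in editor autocompletion.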
If None is possibly a valid column name, you could use a special default value instead (pandas, for example, uses sep=_NoDefault.no_default in read_csv).
Here is a more generic example that also accepts a list of columns:
from pandas._libs.lib import no_default

def new_fun(df, var_a=no_default):
    group = ['vehicle']
    if var_a is not no_default:
        group.extend(var_a if isinstance(var_a, list) else [var_a])
    return df.groupby(group, as_index=False)['id'].count()

new_fun(df_attempt)
new_fun(df_attempt, 'col1')
new_fun(df_attempt, ['col1', 'eng'])
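If importing from pandas._libs feels fragile (the leading underscore suggests it is not public API), the same pattern can be sketched with a sentinel of your own. The name _MISSING below is an arbitrary choice for this example, not something pandas provides.

```python
import pandas as pd

# Hypothetical module-level sentinel: a unique object no caller can pass by accident.
_MISSING = object()

def new_fun(df, var_a=_MISSING):
    group = ['vehicle']
    if var_a is not _MISSING:
        # Accept either a single column name or a list of column names.
        group.extend(var_a if isinstance(var_a, list) else [var_a])
    return df.groupby(group, as_index=False)['id'].count()

df_attempt = pd.DataFrame({
    'col1': ['1', '2', '2', '4', '5'],
    'vehicle': ['car', 'car', 'truck', 'truck', 'bike'],
    'eng': [2, 2, 6, 6, 0],
    'id': ['1', '2', '3', '4', '5'],
})

only_vehicle = new_fun(df_attempt)
one_extra = new_fun(df_attempt, 'col1')
two_extra = new_fun(df_attempt, ['col1', 'eng'])
```

Because the check uses identity (is not), None stays available as an ordinary argument value, which is the point of using a sentinel here.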