Why don't fillna and other functions work inside a function?
I have a DataFrame with 10 columns. I would like to write a function that takes each column and creates several new columns from it, so that my final DataFrame has 50 columns.
def newVars(df, col='my_var'):
    df[col+'_filled'] = df[col].fillna(0)
    df[col+'_rank'] = df[col].fillna(0).rank()
    df[col+'_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col+'_halved'] = df[col]/2
    return df
new_df = df.apply(newVars, axis=1)
I get the error: AttributeError: 'float' object has no attribute 'fillna'
I expect a DataFrame with 5 times as many columns as my initial one. If I run the line outside the function, it works fine:
df['my_var_filled'] = df['my_var'].fillna(0)
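apply doesn't really make sense in your context. With axis=1, apply calls newVars once per row and passes that row as a Series, so df[col] inside the function is a single scalar (a float here), not a column, and a float has no fillna method.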
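A quick way to confirm what apply(axis=1) hands to the function: each call receives one row, so indexing it by column name gives a scalar without a fillna attribute, while the full column df['my_var'] is a Series that has one. This is a minimal sketch; has_fillna is a hypothetical helper used only for the check, and the exact wording of the error message (for example 'float' versus 'numpy.float64') can depend on the column dtypes.

```python
import pandas as pd

df = pd.DataFrame({'my_var': [1.0, None, 20.0]})

def has_fillna(row):
    # Hypothetical helper: with axis=1, `row` is one row of df as a Series,
    # so row['my_var'] is a single scalar value, not a column
    return hasattr(row['my_var'], 'fillna')

print(df.apply(has_fillna, axis=1).tolist())  # [False, False, False]
print(hasattr(df['my_var'], 'fillna'))        # True: the whole column is a Series

# Reproduces the question's failure: fillna is called on a scalar
try:
    df.apply(lambda row: row['my_var'].fillna(0), axis=1)
except AttributeError as exc:
    print(exc)
```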
Instead, pass the whole DataFrame to the function:
import pandas as pd
df = pd.DataFrame({'my_var': [1,3,20]})
def newVars(df, col='my_var'):
    # df[col] is a whole column (a Series), so fillna and rank are available
    df[col+'_filled'] = df[col].fillna(0)
    df[col+'_rank'] = df[col].fillna(0).rank()
    df[col+'_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col+'_halved'] = df[col]/2
    return df
new_df = newVars(df)
Or use pipe, which calls the function with the DataFrame as its first argument:
import pandas as pd
df = pd.DataFrame({'my_var': [1,3,20]})
def newVars(df, col='my_var'):
    # df[col] is a whole column (a Series), so fillna and rank are available
    df[col+'_filled'] = df[col].fillna(0)
    df[col+'_rank'] = df[col].fillna(0).rank()
    df[col+'_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col+'_halved'] = df[col]/2
    return df
new_df = df.pipe(newVars)
Output:
   my_var  my_var_filled  my_var_rank  my_var_percentile  my_var_halved
0       1              1          1.0           0.333333            0.5
1       3              3          2.0           0.666667            1.5
2      20             20          3.0           1.000000           10.0
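Since DataFrame.pipe forwards extra positional and keyword arguments to the function, the same newVars can be chained over several columns, which is what the question ultimately needs for its 10 columns. A minimal sketch, assuming a hypothetical second column other_var; like the version above, it still modifies df in place.

```python
import pandas as pd

def newVars(df, col='my_var'):
    # Same function as above: adds four derived columns for `col`
    df[col + '_filled'] = df[col].fillna(0)
    df[col + '_rank'] = df[col].fillna(0).rank()
    df[col + '_percentile'] = df[col].fillna(0).rank(pct=True)
    df[col + '_halved'] = df[col] / 2
    return df

# 'other_var' is a hypothetical second column for illustration
df = pd.DataFrame({'my_var': [1, 3, 20], 'other_var': [4.0, None, 2.0]})

# pipe passes col=... through to newVars, so calls can be chained per column
new_df = df.pipe(newVars, col='my_var').pipe(newVars, col='other_var')
print(new_df.shape)  # (3, 10): 2 original columns + 4 new ones per column
```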
Note that in both cases your function modifies df in place and also returns it. I would recommend doing one or the other, not both.
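One way to follow that advice is to leave the input untouched and return a new DataFrame. The sketch below uses DataFrame.assign, which returns a copy with the extra columns; new_vars and all_new_vars are hypothetical names, and looping over every original column is an assumption about how the 10 to 50 column expansion in the question would be wired up.

```python
import pandas as pd

def new_vars(df, col='my_var'):
    # Hypothetical non-mutating variant: assign returns a new DataFrame
    filled = df[col].fillna(0)
    return df.assign(**{
        col + '_filled': filled,
        col + '_rank': filled.rank(),
        col + '_percentile': filled.rank(pct=True),
        col + '_halved': df[col] / 2,
    })

def all_new_vars(df):
    # Hypothetical helper: add the derived columns for every original column
    out = df
    for col in df.columns:  # df itself is never modified, so this list is stable
        out = new_vars(out, col)
    return out

df = pd.DataFrame({'my_var': [1, 3, 20]})
new_df = new_vars(df)
print(list(df.columns))  # ['my_var'] (the original is unchanged)
print(new_df.shape)      # (3, 5)
```

With 10 input columns, all_new_vars would return 10 + 4 * 10 = 50 columns.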