Tags: python, pandas, dataframe, pandas-groupby, split-apply-combine

pandas apply with parameter list


I have a simple DataFrame object:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random_sample((5, 5)))
df["col"] = ["A", "B", "C", "A", "B"]

# simple function: three random values scaled by param
def func_apply(df, param=1):
    return pd.Series(np.random.random(3) * param, name=str(param))

Applying the function gives the expected DataFrame:

df.groupby('col').apply(func_apply)

    1           0         1         2
col                              
A    0.928527  0.383567  0.085651
B    0.567423  0.668644  0.689766
C    0.301774  0.156021  0.222140

Is there a way to pass a parameter list to the groupby to get something like this?

#Pseudocode...
df.groupby('col').apply(func_apply, params=[1,2,10])

    1           0         1         2
par col                              
1    A    0.928527  0.383567  0.085651
1    B    0.567423  0.668644  0.689766
1    C    0.301774  0.156021  0.222140
2    A    0.526494  1.812780  1.515816
2    B    1.180539  0.527171  0.670796
2    C    1.507721  0.156808  1.695386
10   A    7.986563  5.109876  2.330171
10   B    2.096963  6.804624  2.351397
10   C    6.890758  8.079466  1.725226

Thanks a lot for any hint :)


Solution

  • IIUC,

    apply accepts additional parameters and forwards them to your function. On a groupby you pass them straight after the function, either positionally or as keywords (DataFrame.apply takes positional ones as a tuple via args). How you use the passed parameters is up to you, i.e. it depends on how you write the function you hand to apply to produce your desired output.

    Let's take your sample data. I modified your func_apply as follows so that it processes each group sequentially over the additional params and combines the results into the final output:

    def func_apply(df, params=(1,)):
        # One row per parameter, labelled by it; the values are not scaled by par here.
        d = [pd.Series(np.random.random(3), name=str(par), index=['x', 'y', 'z']) for par in params]
        return pd.DataFrame(d)
    

    Now call apply with func_apply and pass [1, 2, 10] to it (I pass params as a keyword):

    df.groupby('col').apply(func_apply, params=[1, 2, 10])
    
    Out[1102]:
                   x         y         z
    col
    A   1   0.074357  0.850912  0.652096
        2   0.307986  0.267658  0.558153
        10  0.351000  0.743816  0.192400
    B   1   0.179359  0.411784  0.535644
        2   0.905294  0.696661  0.794458
        10  0.635706  0.742784  0.963603
    C   1   0.020375  0.693070  0.225971
        2   0.448988  0.288206  0.715875
        10  0.980669  0.474264  0.036715
    

    If you don't pass params, func_apply falls back to its default:

    df.groupby('col').apply(func_apply)
    
    Out[1103]:
                  x         y         z
    col
    A   1  0.499484  0.175008  0.331594
    B   1  0.052399  0.965129  0.649668
    C   1  0.053869  0.297008  0.793262
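
The output above has col as the outer index level and the values are not scaled, whereas the question asked for the parameter on the outside and values multiplied by it. A minimal sketch of one way to get there, assuming a hypothetical func_apply_scaled (not part of the original answer), a seeded generator used only to make the run repeatable, and an explicit selection of the data columns so that "col" acts purely as the group key:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

df = pd.DataFrame(rng.random((5, 5)))
df["col"] = ["A", "B", "C", "A", "B"]
value_cols = [0, 1, 2, 3, 4]  # select the data columns so "col" is only the group key


def func_apply_scaled(group, params=(1,)):
    """Hypothetical variant: one row of three random values per parameter, scaled by it."""
    rows = [
        pd.Series(rng.random(3) * par, index=["x", "y", "z"], name=par)
        for par in params
    ]
    return pd.DataFrame(rows)


grouped = df.groupby("col")[value_cols]

# Keyword and positional forms both forward the list to func_apply_scaled.
by_keyword = grouped.apply(func_apply_scaled, params=[1, 2, 10])
by_position = grouped.apply(func_apply_scaled, [1, 2, 10])

# Name the index levels, then move the parameter level to the outside.
result = by_keyword.rename_axis(["col", "par"]).swaplevel().sort_index()
print(result)
```

Here swaplevel only reorders the two index levels and sort_index then groups the rows by parameter first, so the frame ends up in the (par, col) layout from the question. If you would rather keep func_apply taking a single param, looping over the parameters outside the groupby and combining the pieces with pd.concat is probably an equally reasonable route.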