python, pandas, aggregation, pandas-resample

How to pass arguments to the functions in `pandas.resampler.agg()` when using a dict input?


I am trying to resample a pandas DataFrame and sum some of its columns. Additionally, I want to get None/NaN as the result when a resampling period contains no rows (or only missing values). For aggregation on a single column, I can do the following:

import pandas as pd
df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')], columns=['value'])
df.resample('5min').agg("sum", min_count=1)

According to the pandas docs, the keyword argument `min_count` is passed on to `resample.Resampler.sum`, the method associated with the string "sum", and the result is as desired:

           value
2020-01-01  None
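For reference, `min_count` is the number of non-missing values a bin needs before the sum is computed; with fewer, the result is NaN instead of 0. A small sketch with made-up data (the two timestamps are an assumption, placed 10 minutes apart so that the middle 5-minute bin has no rows):

```python
import numpy as np
import pandas as pd

# Summing an all-NaN Series: 0 by default, NaN once one valid value is required
s = pd.Series([np.nan])
print(s.sum())             # 0.0
print(s.sum(min_count=1))  # nan

# Two made-up readings 10 minutes apart, so the 00:05 bin contains no rows
ts = pd.Series([1.0, 2.0],
               index=pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:10']))
r = ts.resample('5min')
print(r.sum())             # the empty 00:05 bin sums to 0.0
print(r.sum(min_count=1))  # the empty 00:05 bin becomes NaN
```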

However, this does not work if I pass a dictionary as the `agg` input, e.g.

df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')], columns=['value'])
df.resample('5min').agg({'value': 'sum'}, min_count=1)

will output:

           value
2020-01-01     0

I would like to know the right way to pass arguments to the aggregation functions specified inside the dict.


Solution

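  • This is currently not possible: when `agg` receives a dict, extra keyword arguments such as `min_count` are not passed on to the functions inside it. A similar issue has been reported for `agg`.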

    Assuming a DataFrame with multiple columns:

    import pandas as pd
    df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')],
                      columns=['value', 'value2', 'value3'])
    

    If you want to apply the same aggregation to several columns, select them on the resampler before calling `agg`:

    out = df.resample('5min')[['value', 'value2']].agg('sum', min_count=1)
    

    Output:

               value value2
    2020-01-01  None   None
    

    If you need a different aggregation function per column, keep them in a dictionary, aggregate each column separately and combine the results with `concat`:

    funcs = {'value': 'sum', 'value2': 'min'}
    
    r = df.resample('5min')
    out = pd.concat({k: r[k].agg([v], min_count=1)
                     for k, v in funcs.items()}, axis=1)
    

    Output:

               value value2
                 sum    min
    2020-01-01  None    NaN
    

    And if you need different aggregation functions and different kwargs per column:

    funcs = {'value': 'sum', 'value2': 'min'}
    kwargs = {'value2': {'min_count': 1}}
    
    r = df.resample('5min')
    
    out = pd.concat({k: r[k].agg([v], **kwargs.get(k, {}))
                     for k, v in funcs.items()}, axis=1)
    

    Output:

               value value2
                 sum    min
    2020-01-01     0    NaN
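A different workaround, sketched here with made-up numeric data and not part of the answer above: the values of the dict may also be callables, so the keyword argument can be bound inside a lambda instead of being passed to `agg`. This keeps the plain dict interface, at the cost of calling a Python function once per bin, which is typically slower than the string aggregations on large frames.

```python
import numpy as np
import pandas as pd

# Made-up data: two rows 10 minutes apart, so the 00:05 bin is empty
idx = pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:10'])
df = pd.DataFrame({'value': [1.0, 2.0], 'value2': [5.0, 3.0]}, index=idx)

funcs = {
    'value': lambda s: s.sum(min_count=1),  # min_count bound inside the callable
    'value2': 'min',                        # plain string, no extra kwargs needed
}
out = df.resample('5min').agg(funcs)
print(out)
```

Unlike the `concat` approach, the result keeps the original column names without an extra `sum`/`min` level, and the empty 00:05 bin is NaN in both columns.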