I am trying to resample a pandas DataFrame, and I would like to sum some of its columns. Additionally, I want to get None/NaN as the result when there are no rows in a resampling period. For aggregation on a single column, I can do the following:
import pandas as pd
df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')], columns=['value'])
df.resample('5min').agg("sum", min_count=1)
According to the pandas documentation, the keyword argument min_count is passed on to resample.Resampler.sum, the method associated with the string "sum", and the result is as desired:
value
2020-01-01 None
However, this does not work if I pass a dictionary to agg, e.g.
df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')], columns=['value'])
df.resample('5min').agg({'value': 'sum'}, min_count=1)
will output:
value
2020-01-01 0
I would like to know the right way to pass arguments to the aggregation functions specified inside the dict.
This is currently not possible. There is/was a similar issue with agg.
Assuming multiple columns:
df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')],
                  columns=['value', 'value2', 'value3'])
If you want to apply the same aggregation to several columns, just slice before calling resample.agg:
out = df.resample('5min')[['value', 'value2']].agg('sum', min_count=1)
Output:
value value2
2020-01-01 None None
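To make the effect of min_count easier to see, here is a minimal sketch of the same slicing approach on data that actually contains an empty resampling period. The sample timestamps and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows at 00:00, 00:01 and 00:12, so the
# 00:05 bin of a 5-minute resample contains no rows at all.
idx = pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:01', '2020-01-01 00:12'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0],
                   'value2': [4.0, 5.0, 6.0],
                   'value3': [7.0, 8.0, 9.0]}, index=idx)

# Select the columns first, then pass the kwargs together with the string name.
out = df.resample('5min')[['value', 'value2']].agg('sum', min_count=1)

# For comparison: without min_count the empty bin is summed to 0.
plain = df.resample('5min')[['value', 'value2']].agg('sum')
```

Here the 00:05 row of out should be NaN in both columns, while the same row of plain is 0.0.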
If you need different aggregation functions, use a dictionary and concat:
funcs = {'value': 'sum', 'value2': 'min'}
r = df.resample('5min')
out = pd.concat({k: r[k].agg([v], min_count=1)
                 for k, v in funcs.items()}, axis=1)
Output:
value value2
sum min
2020-01-01 None NaN
And if you need different aggregation functions and different kwargs:
funcs = {'value': 'sum', 'value2': 'min'}
kwargs = {'value2': {'min_count': 1}}
r = df.resample('5min')
out = pd.concat({k: r[k].agg([v], **kwargs.get(k, {}))
                 for k, v in funcs.items()}, axis=1)
Output:
value value2
sum min
2020-01-01 0 NaN
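The per-column pattern above can also be wrapped in a small helper. This is only a sketch, not a pandas API: the name resample_agg_per_column and the sample data are hypothetical. It is also a slight variation on the code above: it looks up each Resampler method by name with getattr instead of going through agg([v], ...), so the result has flat column names rather than a two-level header.

```python
import pandas as pd

def resample_agg_per_column(df, rule, funcs, kwargs=None):
    """Resample df and apply one named aggregation per column.

    funcs maps column name -> Resampler method name (e.g. 'sum', 'min').
    kwargs maps column name -> dict of keyword arguments for that method.
    (Hypothetical helper, shown for illustration only.)
    """
    kwargs = kwargs or {}
    r = df.resample(rule)
    # Call e.g. r['value'].sum(min_count=1) for each requested column.
    return pd.concat({col: getattr(r[col], name)(**kwargs.get(col, {}))
                      for col, name in funcs.items()}, axis=1)

# Made-up data with an empty 00:05 bin.
idx = pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:01', '2020-01-01 00:12'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0], 'value2': [4.0, 5.0, 6.0]}, index=idx)

out = resample_agg_per_column(df, '5min',
                              funcs={'value': 'sum', 'value2': 'sum'},
                              kwargs={'value': {'min_count': 1}})
```

With this data, the empty 00:05 period should come out as NaN for value (because of min_count=1) and as 0.0 for value2, which received no extra arguments.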