python pandas dask dask-dataframe

Dask dataframe aggregation without groupby (ddf.agg(['min','max']))?


Pandas defines DataFrame.agg, but Dask only defines agg on groupby objects (dask_dataframe.groupby.agg).
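For context, this is the pandas behaviour I'd like to reproduce on a Dask DataFrame (the column names are just a toy example):

    import pandas as pd

    df = pd.DataFrame({'x': range(10), 'y': range(10, 20)})
    # one call, several aggregations, no groupby needed;
    # returns a DataFrame indexed by 'min'/'max' with one column per input column
    print(df.agg(['min', 'max']))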

Is there a way to have multiple aggregations over a column in dask without groupby?

I know describe() provides column statistics, which solves one specific problem, but I'm looking for a general solution.

My first try was to create a dummy column with a single constant value, group by it, and call agg(['min', 'max']); see the sketch below. That worked, but the resulting Dask DataFrame is a single row with MultiIndex columns, which Dask can't transpose or stack (unimplemented, unless I'm doing it wrong). I would like to keep everything in Dask, even though the result table here is small enough to handle in pandas alone and quite trivial to process, because I'm thinking about the general situation where exporting the result to pandas and re-importing it is unfeasible.
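For reference, a minimal sketch of that dummy-column workaround (the _dummy key and toy columns are just for illustration):

    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({'x': range(100), 'y': range(100, 200)})
    ddf = dd.from_pandas(pdf, npartitions=4)

    # constant key so every row falls into one group
    grouped = ddf.assign(_dummy=0).groupby('_dummy').agg(['min', 'max'])

    # a single row with MultiIndex columns such as ('x', 'min'), ('x', 'max'), ...
    print(grouped.compute())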


Solution

  • dask.dataframe.Series.reduction might do the trick, see the docs.

    IIUC, the key is to construct the relevant functions: chunk, aggregate, and (optionally) combine, as sketched after this answer.

    Update: there is also dask.dataframe.reduction, see docs.
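
    A minimal sketch of the Series.reduction approach, assuming a toy column x: chunk runs on each partition, and aggregate combines the per-partition results.

        import pandas as pd
        import dask.dataframe as dd

        pdf = pd.DataFrame({'x': range(100)})
        ddf = dd.from_pandas(pdf, npartitions=4)

        def chunk(s):
            # runs on each partition (a pandas Series): local min and max
            return pd.Series({'min': s.min(), 'max': s.max()})

        def aggregate(parts):
            # parts is the concatenation of the per-partition results,
            # so 'min'/'max' appear once per partition in its index
            return pd.Series({'min': parts[parts.index == 'min'].min(),
                              'max': parts[parts.index == 'max'].max()})

        result = ddf['x'].reduction(chunk, aggregate=aggregate,
                                    meta=pd.Series(dtype='int64', name='x'))
        print(result.compute())   # Series indexed by 'min' and 'max'

    The same pattern works with dask.dataframe.DataFrame.reduction, where chunk receives a whole pandas DataFrame per partition instead of a Series.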