I have collected the frequencies with which combinations of certain categorical parameters occur in a DataFrame. Applying a groupby operation produced another DataFrame with a MultiIndex. Now I would like to visualize the frequencies in a mosaic plot. This is what I have tried:
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

rows = [
    {"Mode": "ID", "SortBy": "Start", "SortDir": "ASC", "count": 10},
    {"Mode": "ID", "SortBy": "Start", "SortDir": "DESC", "count": 100},
    {"Mode": "FULL", "SortBy": "End", "SortDir": "DESC", "count": 1000},
]
df = pd.DataFrame(rows)
hdf = df.groupby(["Mode", "SortBy", "SortDir"]).sum("count")
mosaic(hdf, index=hdf.index)
But it fails with the following error:
KeyError: "None of [MultiIndex([('FULL', 'End', 'DESC'),\n ( 'ID', 'Start', 'ASC'),\n ( 'ID', 'Start', 'DESC')],\n names=['Mode', 'SortBy', 'SortDir'])] are in the [columns]"
I was able to produce a diagram using
mosaic(hdf, ["count"])
But this is obviously not what I want: it shows three same-sized rectangles labelled 10, 100, 1000, whereas I was expecting the categories Mode, SortBy, SortDir arranged around the axes and the rectangles reflecting the proportions of counts.
After looking at several other examples, it dawned on me that the problem is the combinations of levels that are missing from the MultiIndex. When I reindex the hierarchical data on the Cartesian product of the categories and then replace the resulting NaNs with a small numerical value (zero is not accepted either), it works:
hdf=df.groupby(["Mode", "SortBy", "SortDir"])["count"].sum()
mosaic(hdf.reindex(pd.MultiIndex.from_product(hdf.index.levels)).fillna(0.01))
(Note that groupby is used a little differently here than in my question: selecting ["count"] before summing makes hdf a Series rather than a DataFrame.)
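To make the reindexing step easier to inspect, here is a minimal sketch that uses only pandas and stops before the plotting call. The names counts, full_index and filled are my own, chosen for illustration; the data is the same as in the question.

```python
import pandas as pd

rows = [
    {"Mode": "ID", "SortBy": "Start", "SortDir": "ASC", "count": 10},
    {"Mode": "ID", "SortBy": "Start", "SortDir": "DESC", "count": 100},
    {"Mode": "FULL", "SortBy": "End", "SortDir": "DESC", "count": 1000},
]
df = pd.DataFrame(rows)

# Series of counts indexed by a (Mode, SortBy, SortDir) MultiIndex;
# only the three observed combinations are present.
counts = df.groupby(["Mode", "SortBy", "SortDir"])["count"].sum()

# Cartesian product of the observed level values: 2 * 2 * 2 combinations.
full_index = pd.MultiIndex.from_product(
    counts.index.levels, names=counts.index.names
)

# Reindexing introduces NaN for the missing combinations, which are then
# replaced by a small placeholder value (0.01, as in the workaround above).
filled = counts.reindex(full_index).fillna(0.01)

print(len(counts), len(full_index))
print(filled)
```

The resulting filled Series is what gets passed to mosaic in the workaround. The placeholder 0.01 is arbitrary; it only needs to be small relative to the real counts so that the extra tiles stay negligible.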
It seems like a bug to me that mosaic cannot handle this internally, but maybe I'm missing an option?