python pandas dataframe mosaic-plot

statsmodels.graphics.mosaicplot of a MultiIndex DataFrame


I have collected the frequencies with which combinations of certain categorical parameters occur in a DataFrame. Applying a groupby operation produced another DataFrame with a MultiIndex. Now I would like to visualize the frequencies in a mosaic plot. This is what I have tried:

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

rows=[
    {"Mode":"ID", "SortBy":"Start", "SortDir":"ASC", "count":10},
    {"Mode":"ID", "SortBy":"Start", "SortDir":"DESC", "count":100},
    {"Mode":"FULL", "SortBy":"End", "SortDir":"DESC", "count":1000}
]
df=pd.DataFrame(rows)
hdf=df.groupby(["Mode", "SortBy", "SortDir"]).sum("count")
mosaic(hdf, index=hdf.index)

But it fails with the following error:

KeyError: "None of [MultiIndex([('FULL',   'End', 'DESC'),\n            (  'ID', 'Start',  'ASC'),\n            (  'ID', 'Start', 'DESC')],\n           names=['Mode', 'SortBy', 'SortDir'])] are in the [columns]"

I was able to produce a diagram using

mosaic(hdf, ["count"])

But this is obviously not what I want: it shows three same-sized rectangles labelled 10, 100, 1000, whereas I was expecting the categories Mode, SortBy, SortDir arranged around the axes and the rectangles reflecting the proportions of counts.


Solution

  • After looking at several other examples it dawned on me that the problem is the combinations of levels that are missing from the MultiIndex. When I reindex the hierarchical data on the Cartesian product of the categories and then replace the resulting NaNs with a small numerical value (zero wasn't accepted either), it works:

    hdf=df.groupby(["Mode", "SortBy", "SortDir"])["count"].sum()
    mosaic(hdf.reindex(pd.MultiIndex.from_product(hdf.index.levels)).fillna(0.01))
    

    (Note that groupby is used a little differently here to construct hdf than in my question: it selects the "count" column, so hdf is a Series rather than a DataFrame.) It seems like a bug to me that mosaic cannot handle this internally, but maybe I'm missing an option?
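
For reference, here is the reindexing step spelled out as a self-contained sketch, using the sample rows from the question. The names counts, full_index, filled and plot_mosaic are only illustrative, and 0.01 is the same arbitrary placeholder as above, not a value required by statsmodels.

```python
import pandas as pd

rows = [
    {"Mode": "ID", "SortBy": "Start", "SortDir": "ASC", "count": 10},
    {"Mode": "ID", "SortBy": "Start", "SortDir": "DESC", "count": 100},
    {"Mode": "FULL", "SortBy": "End", "SortDir": "DESC", "count": 1000},
]
df = pd.DataFrame(rows)

# Series of counts indexed by a three-level MultiIndex (3 observed combinations)
counts = df.groupby(["Mode", "SortBy", "SortDir"])["count"].sum()

# Cartesian product of all observed level values: 2 * 2 * 2 = 8 combinations
full_index = pd.MultiIndex.from_product(
    counts.index.levels, names=counts.index.names
)

# Combinations that never occurred become NaN, then a small placeholder count
filled = counts.reindex(full_index).fillna(0.01)


def plot_mosaic(series):
    """Hypothetical helper: draw the mosaic plot (needs statsmodels and matplotlib)."""
    from statsmodels.graphics.mosaicplot import mosaic

    return mosaic(series)
```

Calling plot_mosaic(filled) then corresponds to the mosaic call above. The five combinations that never occurred get the placeholder count 0.01, so they remain in the plot as very thin tiles.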