python matplotlib mosaic-plot

How to adjust color and text in a statsmodels mosaic plot?


I got the following code from the last plot in the Mosaic doc page:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
data = pd.DataFrame({'gender': gender, 'pet': pet})
mosaic(data, ['pet', 'gender'],  gap=0.06, title='DataFrame')
plt.show()

However, I'd like the color to be the same across each horizontal band, i.e. the female tiles in the cat and dog categories should share one color, and likewise for the male tiles. I'd also like to increase the figsize and show the percentage (proportion) inside each tile.

I experimented with the parameters but could not find a way to do it.


Solution

  • The figsize can be set the standard matplotlib way: create the axes with fig, ax = plt.subplots(figsize=...) and pass ax to the mosaic() function.

    The color can be changed via the properties= parameter. This is a function that takes a tile key as input (e.g. ('cat', 'female')) and returns a dictionary of Rectangle properties such as facecolor, alpha, hatch, linestyle, ... . The example below colors all cats turquoise and all dogs sienna (brown). To distinguish male from female, the hatching or alpha could be set differently.

    The title can be passed via the title= parameter. The example uses an f-string with the percentage of cats.

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic
    
    gender = ['male', 'male', 'male', 'female', 'female', 'female']
    pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
    third_col = [2, 3, 4, 5, 6, 7]
    data = pd.DataFrame({'gender': gender, 'pet': pet, 'third': third_col})
    
    # Title text: percentage of rows where the pet is a cat
    percent_cats = f"cats: {100 * len(data[data['pet'] == 'cat']) / len(data):.1f} %"
    # key is a tuple such as ('cat', 'female'); 'color' sets both face and edge color
    props = lambda key: {'color': 'turquoise' if 'cat' in key else 'sienna'}
    fig, ax = plt.subplots(figsize=(12, 4))
    mosaic(data, ['pet', 'gender'], gap=0.06, title=percent_cats, properties=props, ax=ax)
    plt.show()
    

    [Figure: resulting plot]
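
The hatching idea mentioned above could look roughly like the sketch below. It is not part of the original example: the names tile_style and draw_hatched_mosaic, the '//' hatch pattern and the black edge color are assumptions chosen for illustration. facecolor, edgecolor and hatch are ordinary matplotlib Rectangle properties, so they should pass through properties= like color does.

```python
def tile_style(key):
    """Return Rectangle properties for one tile.

    key is the tuple mosaic() passes in, e.g. ('cat', 'female').
    """
    return {
        # Color by pet, as in the example above
        'facecolor': 'turquoise' if 'cat' in key else 'sienna',
        'edgecolor': 'black',
        # Hatch the female tiles so gender stays distinguishable
        'hatch': '//' if 'female' in key else None,
    }


def draw_hatched_mosaic():
    """Draw the mosaic with tile_style and return (fig, ax)."""
    # Imported here so tile_style can be reused without the plotting stack
    import matplotlib.pyplot as plt
    import pandas as pd
    from statsmodels.graphics.mosaicplot import mosaic

    gender = ['male', 'male', 'male', 'female', 'female', 'female']
    pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
    data = pd.DataFrame({'gender': gender, 'pet': pet})

    fig, ax = plt.subplots(figsize=(12, 4))
    mosaic(data, ['pet', 'gender'], gap=0.06, properties=tile_style, ax=ax)
    return fig, ax
```

Calling draw_hatched_mosaic() followed by plt.show() displays the figure. Swapping 'hatch' for an 'alpha' entry (e.g. 0.5 for one gender) would work the same way.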

    Here is another example, with separate colors for male and female. The order of the columns is changed to ['gender', 'pet'], and horizontal=False makes the first split (by gender) run along the vertical axis, so each gender forms one horizontal band.

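    # Share of the 'third' column total that belongs to the female rows
    percent_3rd_col = 100 * data[data['gender'] == 'female']['third'].sum() / data['third'].sum()
    title = f"percent female: {percent_3rd_col:.1f} %"
    props = lambda key: {'color': 'fuchsia' if 'female' in key else 'deepskyblue'}
    fig, ax = plt.subplots(figsize=(12, 4))  # new figure; the previous one was already shown
    mosaic(data, ['gender', 'pet'], horizontal=False, gap=0.06, title=title, properties=props, ax=ax)
    plt.show()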
    

    [Figure: female tiles colored equally]
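
To put the percentage inside each tile rather than in the title, mosaic() also has a labelizer= parameter: a function that receives the tile key and returns the text drawn in that tile. The sketch below is an illustration under assumptions, not a definitive recipe: tile_shares, make_labelizer and draw_labeled_mosaic are hypothetical helper names, and the percentage shown is each tile's share of all rows.

```python
from collections import Counter


def tile_shares(first, second):
    """Map each (first, second) category pair to its share of all rows."""
    counts = Counter(zip(first, second))
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def make_labelizer(shares):
    """Build a labelizer that prints the categories and the tile's percentage."""
    def labelizer(key):
        # Pairs that never occur in the data fall back to 0 %
        return f"{key[0]}, {key[1]}\n{100 * shares.get(key, 0):.1f} %"
    return labelizer


def draw_labeled_mosaic():
    """Draw the gender/pet mosaic with percentage labels and return (fig, ax)."""
    # Imported here so the helpers above work without the plotting stack
    import matplotlib.pyplot as plt
    import pandas as pd
    from statsmodels.graphics.mosaicplot import mosaic

    gender = ['male', 'male', 'male', 'female', 'female', 'female']
    pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
    data = pd.DataFrame({'gender': gender, 'pet': pet})

    # Key order must match the column order given to mosaic(): (gender, pet)
    shares = tile_shares(gender, pet)
    props = lambda key: {'color': 'fuchsia' if 'female' in key else 'deepskyblue'}

    fig, ax = plt.subplots(figsize=(12, 4))
    mosaic(data, ['gender', 'pet'], horizontal=False, gap=0.06,
           properties=props, labelizer=make_labelizer(shares), ax=ax)
    return fig, ax
```

Calling draw_labeled_mosaic() followed by plt.show() displays the figure. If the column order passed to mosaic() changes, the arguments to tile_shares need to be swapped accordingly so the keys still match.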