I got the following code from the last plot in the Mosaic doc page:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
data = pd.DataFrame({'gender': gender, 'pet': pet})
mosaic(data, ['pet', 'gender'], gap=0.06, title='DataFrame')
plt.show()
However, I'd like the color of the plot to be the same across the horizontal, i.e. the female tiles in the cat and dog categories should share one color, and likewise for the male tiles. I'd also like to increase the figsize
and be able to put the percentage in the tile.
I experimented with the parameters but could not find a way to do it.
The figsize can be set the standard matplotlib way: create the axes with fig, ax = plt.subplots(figsize=...) and pass ax to the mosaic() function.
The color can be changed via the properties= parameter. This is a function that gets a key as input (e.g. ('cat', 'female')) and outputs a dictionary with Rectangle properties such as facecolor, alpha, hatch, linestyle, and so on. The example below colors all cats green-blueish and all dogs brown. To distinguish male from female, the hatching or alpha could be set differently.
The title can be passed via the title= parameter. The example uses an f-string with the percentage of cats.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
third_col = [2, 3, 4, 5, 6, 7]
data = pd.DataFrame({'gender': gender, 'pet': pet, 'third': third_col})
percent_cats = f"cats: {100 * len(data[data['pet'] == 'cat']) / len(data):.1f} %"
props = lambda key: {'color': 'turquoise' if 'cat' in key else 'sienna'}
fig, ax = plt.subplots(figsize=(12, 4))
mosaic(data, ['pet', 'gender'], gap=0.06, title=percent_cats, properties=props, ax=ax)
plt.show()
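The hatching idea mentioned above could be sketched roughly as follows. The helper names pet_gender_props and draw_hatched_mosaic are made up for illustration. The sketch sets facecolor and edgecolor separately instead of color, on the assumption that color would paint the hatch lines in the fill color and hide them.

```python
def pet_gender_props(key):
    """Return Rectangle properties for a tile key such as ('cat', 'female')."""
    # Assumption: the pet picks the fill color, the gender picks the hatch.
    return {
        'facecolor': 'turquoise' if 'cat' in key else 'sienna',
        'edgecolor': 'white',  # hatch lines are drawn in the edge color
        'hatch': '//' if 'female' in key else None,
    }


def draw_hatched_mosaic():
    # Imports are kept local so the helper above can be tried on its own.
    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic

    gender = ['male', 'male', 'male', 'female', 'female', 'female']
    pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
    data = pd.DataFrame({'gender': gender, 'pet': pet})

    fig, ax = plt.subplots(figsize=(12, 4))
    mosaic(data, ['pet', 'gender'], gap=0.06, properties=pet_gender_props, ax=ax)
    plt.show()
```

Calling draw_hatched_mosaic() should show the cat tiles in turquoise and the dog tiles in sienna, with the female tiles hatched.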
Here is another example, with separate colors for male and female, a changed order of the columns, and horizontal=False so that the first split (by gender) stacks the two genders as horizontal bands.
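# Share of the 'third' column that belongs to the female rows
percent_3rd_col = 100 * data[data['gender'] == 'female']['third'].sum() / data['third'].sum()
title = f"percent female: {percent_3rd_col:.1f} %"
props = lambda key: {'color': 'fuchsia' if 'female' in key else 'deepskyblue'}
# New figure, since the axes from the first example have already been shown
fig, ax = plt.subplots(figsize=(12, 4))
mosaic(data, ['gender', 'pet'], horizontal=False, gap=0.06, title=title, properties=props, ax=ax)
plt.show()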
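If the percentage is wanted inside each tile rather than in the title, mosaic() also accepts a labelizer= function that maps a key to the text drawn in that tile. Here is a minimal sketch, assuming the share of rows per tile is the number to show. The helpers tile_shares, make_labelizer and draw_labeled_mosaic are hypothetical names, not part of statsmodels.

```python
from collections import Counter

gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']


def tile_shares(first, second):
    """Map each (first, second) pair to its share of all rows."""
    counts = Counter(zip(first, second))
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def make_labelizer(shares):
    """Build a labelizer that appends the tile's share as a percentage."""
    def labelizer(key):
        # Combinations that never occur in the data are labeled 0.0%.
        return "\n".join(key) + f"\n{shares.get(key, 0):.1%}"
    return labelizer


def draw_labeled_mosaic():
    # Imports are kept local so the helpers above can be tried on their own.
    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic

    data = pd.DataFrame({'gender': gender, 'pet': pet})
    # Key order must match the column order passed to mosaic().
    shares = tile_shares(gender, pet)
    props = lambda key: {'color': 'fuchsia' if 'female' in key else 'deepskyblue'}

    fig, ax = plt.subplots(figsize=(12, 4))
    mosaic(data, ['gender', 'pet'], horizontal=False, gap=0.06,
           properties=props, labelizer=make_labelizer(shares), ax=ax)
    plt.show()
```

Calling draw_labeled_mosaic() should draw the same layout as the second example, with each tile labeled by its gender, pet and share of the rows.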