python pandas matplotlib stacked-area-chart

How to plot a stacked area plot


I have a DataFrame (df) with two columns: 'Foundation Type', which takes one of four values (Shafts, Piles, Combination, Spread), and 'Vs30', which holds the value of the Vs30 parameter. Each row represents a bridge, with its foundation type and Vs30 value.

First, I create a new column 'binVs30' in df by binning each 'Vs30' value into one of five ranges ([0-200], [200-400], ..., [800-1000]).

df['binVs30'] = pd.cut(df.Vs30, bins=np.arange(0, 1100, 200))

Then I create a stacked area plot with the following code:

color_table = pd.crosstab(df['binVs30'], df['Foundation Type'], dropna=False)
ax = color_table.plot(kind='area', figsize=(8, 8), stacked=True, rot=0)
display(ax)
plt.xlabel('')
plt.ylabel('Frequency', fontsize=12)
plt.legend(title='Foundation Type', loc='upper right')
plt.title('Column Database', fontsize='20')
plt.show()

Resulting Picture

The resulting picture shows some extra bins on the x-axis that shouldn't be there, so I had to fix the xticks manually by adding the following code:

locs, labels = plt.xticks()
plt.xticks(locs, ['','0-200','','200-400','','400-600','','600-800','','800-1000'], fontsize=10, rotation=45)

xticks-fixed Picture

Is there a reason why Python creates those extra bins that shouldn't exist? Is this a bug? If I switch to a stacked bar plot, the problem vanishes. Is there a way to fix it without hard-coding the tick labels?

I also have two other questions. First, how do I add an edge color to an area plot? Something like:

color_table.plot(kind='area', figsize=(8, 8), stacked=True, edgecolor='black', legend=None, rot=0)

The edgecolor='black' argument doesn't work in a stacked area plot.

Second, is there a way to create bins for 'Vs30' like ([0-200], [200-400], ..., [>800])? The way I create the 'binVs30' column doesn't allow a bin for '>800'.


Solution

  • There are a couple of questions here. First, to include an open-ended bin in pd.cut(), you can use np.inf as the last bin edge to capture everything from 800 upward and give that bin a custom label. Second, since you're already using matplotlib, I'd recommend calling its stackplot function directly rather than going through pandas. Then you can pass the edgecolor argument without any issues.

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    
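    # Random stand-in data: 1000 bridges with a foundation type and a raw Vs30 value
    df = pd.DataFrame(data={
        "foundation" : np.random.choice(list("ABCD"), 1000),
        "Vs30"       : np.random.randint(0, 1200, 1000)
    })
    
    # np.inf as the last edge makes the final bin open-ended (800 and above)
    bins = [0, 200, 400, 600, 800, np.inf]
    labels = ["0-199", "200-399", "400-599", "600-799", "800+"]
    
    # right=False makes each bin left-closed, e.g. [0, 200), to match the labels
    df["bins"] = pd.cut(
        df["Vs30"], bins=bins, labels=labels,
        right=False, include_lowest=True)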
    
    stack_data = pd.crosstab(df['bins'], df['foundation'], dropna=False)
    
    stack_array = stack_data.values.T.tolist()
    
    pal = sns.color_palette("Set1")
    plt.figure(figsize=(8,4))
    
    plt.stackplot(
        labels, stack_array, labels=list("ABCD"),
        colors=pal, alpha=0.4, edgecolor="black")
    
    plt.legend(loc='upper left')
    plt.show()
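
To see how the open-ended binning treats values on the bin edges, here is a minimal sketch using a few hand-picked Vs30 values (made up for illustration, not taken from the question's data). Note that with right=False each bin is left-closed, so 200 falls into "200-399", whereas the question's original pd.cut call uses right-closed bins and would put 200 into (0, 200].

```python
import numpy as np
import pandas as pd

bins = [0, 200, 400, 600, 800, np.inf]
labels = ["0-199", "200-399", "400-599", "600-799", "800+"]

# Hand-picked values that sit on or near the bin edges (illustrative only).
vs30 = pd.Series([0, 199, 200, 799, 800, 5000])

# right=False gives left-closed bins: [0, 200), [200, 400), ..., [800, inf)
binned = pd.cut(vs30, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"Vs30": vs30, "bin": binned}))
```

Anything at or above 800 ends up in the "800+" bin, however large it is.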
    

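On the extra x-axis bins: as far as I can tell this is not a bug in Python. For a non-numeric index, pandas appears to draw line and area plots against the integer positions 0, 1, 2, ..., and matplotlib's automatic tick locator can also place ticks between those positions (0.5, 1.5, ...). A bar plot gets exactly one tick per bar, which would explain why the problem disappears there. If you prefer to keep the pandas area plot, a minimal sketch of a fix is to set one tick per crosstab row and build the labels from the index instead of typing them by hand. The data below is randomly generated as a stand-in for the bridge table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data standing in for the bridge table in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Foundation Type": rng.choice(["Shafts", "Piles", "Combination", "Spread"], 200),
    "Vs30": rng.integers(1, 1000, 200),
})
df["binVs30"] = pd.cut(df["Vs30"], bins=np.arange(0, 1100, 200))

color_table = pd.crosstab(df["binVs30"], df["Foundation Type"], dropna=False)
ax = color_table.plot(kind="area", stacked=True, figsize=(8, 8))

# One tick per row of the crosstab, labelled from the interval bounds.
positions = np.arange(len(color_table))
tick_labels = [f"{int(iv.left)}-{int(iv.right)}" for iv in color_table.index]
ax.set_xticks(positions)
ax.set_xticklabels(tick_labels, rotation=45, fontsize=10)

ax.set_ylabel("Frequency", fontsize=12)
ax.legend(title="Foundation Type", loc="upper right")
plt.savefig("stacked_area.png")
```

Because the labels are derived from color_table.index, they stay correct if you later change the bin edges.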