Python stacked area chart


I am trying to create a stacked area chart showing how the number of courses in each area evolves over time. My data frame (index = Year) looks like this:

                    Area  Courses
Year                             
1900         Agriculture      0.0
1900        Architecture     32.0
1900           Astronomy     10.0
1900             Biology     20.0
1900           Chemistry     25.0
1900   Civil Engineering     21.0
1900           Education     14.0
1900  Engineering Design     10.0
1900             English     30.0
1900           Geography      1.0

The last year in the data is 2011.

I tried several approaches, such as df.plot.area() and df.plot.area(x='Years'). Then I thought it would help to have the areas as columns, so I tried

df.pivot_table(index = 'Year', columns = 'Area', values = 'Courses', aggfunc = 'sum')

but instead of getting the sum of courses per year, I got:

Area  Aeronautical Engineering  ...  Visual Design
Year                            ...               
1900                       NaN  ...            NaN
1901                       NaN  ...            NaN

Thanks for your help. It's my first post. Sorry if I missed something.

Update. Here is my code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(filepath, encoding= 'unicode_escape')
df = df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name = 'Courses').reset_index()
plt.stackplot(df['Year'], df['Courses'], labels = df['GenArea'])
plt.legend(loc='upper left')
plt.show()

And here is the link for the dataset: https://data.world/makeovermonday/2020w12


Solution

  • With the extra information you gave, I put this together. Hope you like it!

    import pandas as pd
    import matplotlib.pyplot as plt
    
    plt.close('all')
    
    # Load the data and total the courses taught per (Year, GenArea) pair.
    df = pd.read_csv('https://query.data.world/s/djx5mi7dociacx7smdk45pfmwp3vjo',
                     encoding='unicode_escape')
    df = (df.groupby(['Year', 'GenArea'])['Taught'].sum()
            .to_frame(name='Courses').reset_index())
    
    # Unique years and areas, in order of first appearance.
    years = df['Year'].drop_duplicates().tolist()
    areas = df['GenArea'].drop_duplicates().tolist()
    
    # Wide table: one row per year, one column per area, 0 where there is no data.
    df1 = pd.DataFrame(0.0, index=years, columns=areas)
    for row in df.itertuples(index=False):
        # Place each value by its own year and area label, so zero or missing
        # entries cannot shift later values into the wrong row.
        df1.at[row.Year, row.GenArea] = row.Courses
    
    # Reverse the column order and the legend, then draw the stacked areas.
    df1 = df1[df1.columns[::-1]]
    df1.plot.area(legend='reverse')
    plt.show()
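
A shorter route to the same wide table is the pivot_table call from the question with fill_value=0 added, so that year/area combinations that never occur in the data become explicit zeros instead of NaN. This is only a minimal sketch: long_df is a small made-up sample in the same Year / GenArea / Courses layout as the grouped frame, and to_wide is a hypothetical helper name, not something from the original post.

```python
import pandas as pd

# Made-up sample in the same long format as the grouped frame
# (one row per Year / GenArea pair); not taken from the real dataset.
long_df = pd.DataFrame({
    "Year":    [1900, 1900, 1901, 1901, 1902],
    "GenArea": ["Biology", "English", "Biology", "Chemistry", "English"],
    "Courses": [20.0, 30.0, 22.0, 5.0, 31.0],
})


def to_wide(frame):
    """Return one row per year and one column per area, with 0 for gaps."""
    return frame.pivot_table(index="Year", columns="GenArea",
                             values="Courses", aggfunc="sum", fill_value=0)


wide = to_wide(long_df)
print(wide)

# Each column becomes one layer of the stacked area chart.
ax = wide.plot.area()
ax.set_ylabel("Courses")
```

With the real data, to_wide(df) would take the place of the manual loops; call plt.show() (from matplotlib.pyplot) afterwards to display the figure.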
    

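
The plt.stackplot attempt in the question passes one long column as y, whereas stackplot expects one series per layer, all sharing the same x values. A minimal sketch of that call on a wide table follows; the sample data and the names long_df, wide and polys are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample in long format (one row per Year / GenArea pair).
long_df = pd.DataFrame({
    "Year":    [1900, 1900, 1901, 1901, 1902],
    "GenArea": ["Biology", "English", "Biology", "Chemistry", "English"],
    "Courses": [20.0, 30.0, 22.0, 5.0, 31.0],
})

# One row per year, one column per area; missing pairs become 0.
wide = long_df.pivot_table(index="Year", columns="GenArea",
                           values="Courses", aggfunc="sum", fill_value=0)

fig, ax = plt.subplots()
# stackplot takes x once and a 2D array with one row per layer,
# so the wide table is transposed to shape (areas, years).
polys = ax.stackplot(wide.index, wide.T.to_numpy(),
                     labels=list(wide.columns))
legend = ax.legend(loc="upper left")
ax.set_xlabel("Year")
ax.set_ylabel("Courses")
```

Call plt.show() afterwards to display the chart. The labels now line up with the layers because there is exactly one label per column of the wide table.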