python pandas seaborn bar-chart grouped-bar-chart

How to bar plot the top n categories for each year


I am trying to plot a bar chart that shows only the top 10 areas in the Auckland district by money spent on gambling. I have written code to filter for the top 10 areas and to draw the bar plot in Seaborn.

The issue is that the x-axis is crowded with labels for every area in the Auckland district from the dataframe. I only want the labels for the top 10 areas to show up. I would appreciate any help.

This is a snapshot of the dataframe I am using:

Date,AU2017_code,crime,n,Pop,AU_GMP_PER_CAPITA,Dep_Index,AU2017_name,TA2018_name,TALB
2018-02-01,500100.0,Abduction,0.0,401.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-03-01,500100.0,Abduction,0.0,402.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-04-01,500100.0,Abduction,0.0,408.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-05-01,500100.0,Abduction,0.0,409.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-06-01,500100.0,Abduction,0.0,410.0,28.890063,10.0,Awanui,Far North District,Far North District

The complete dataframe is available as a .csv file here: https://github.com/yyshastri/NZ-Police-Community-Dataset.git

The code for the creation of the bar plot is as follows:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


# Extract the year from the Date column and create a new 'Year' column
merged_data['Year'] = merged_data.index.year

# Filter data for areas that come under Auckland in the TA2018_name column
auckland_data = merged_data[merged_data['TA2018_name'] == 'Auckland']

# Calculate the average AU_GMP_PER_CAPITA for each area within Auckland
avg_gmp_per_area = auckland_data.groupby('AU2017_name')['AU_GMP_PER_CAPITA'].mean()

# Select the top 10 areas by AU_GMP_PER_CAPITA within Auckland
top_10_areas = avg_gmp_per_area.nlargest(10).index

# Further filter the auckland_data to include only the top 10 areas
filtered_data = auckland_data[auckland_data['AU2017_name'].isin(top_10_areas)]

# Use seaborn to create the barplot
sns.barplot(x='AU2017_name', y='AU_GMP_PER_CAPITA', hue='Year', data=filtered_data)

plt.title('The top 10 areas for gambling spend in Auckland')
plt.xticks(rotation=60)
plt.legend(title='Year', loc='upper right')
plt.figure(figsize = (20, 10))
plt.show()

The chart this code generates has a garbled x-axis, because every area name in the Auckland district appears in the labels.


Solution
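
One possible cause of the crowded axis, which the question does not confirm, is that 'AU2017_name' is stored as a pandas categorical. Filtering rows keeps every category, and seaborn draws a tick for each category even when it has no rows. The sketch below uses made-up data and assumes that dtype; it only shows that the unused categories survive the filter and how to drop them.

```python
import pandas as pd

# Made-up data; storing 'AU2017_name' as a categorical is an assumption
# about the asker's dataframe, not something the question states
toy = pd.DataFrame({
    'AU2017_name': pd.Categorical(['A', 'B', 'C', 'D']),
    'AU_GMP_PER_CAPITA': [10.0, 30.0, 20.0, 5.0],
})

# keep only two areas, as the question does with its top 10
filtered = toy[toy['AU2017_name'].isin(['B', 'C'])].copy()

# the row filter does not remove the categories that no longer occur
categories_before = list(filtered['AU2017_name'].cat.categories)

# drop the categories that have no rows left
filtered['AU2017_name'] = filtered['AU2017_name'].cat.remove_unused_categories()
categories_after = list(filtered['AU2017_name'].cat.categories)
```

Separately, plt.figure(figsize = (20, 10)) in the question is called after sns.barplot, so it opens a new empty figure instead of resizing the chart. Calling it before the plot is what sets the size.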

    import pandas as pd
    
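    # read the data from github
    df = pd.read_csv('https://raw.githubusercontent.com/yyshastri/NZ-Police-Community-Dataset/main/Merged_Community_Police_Data.xls')
    
    # the pivot table below needs a 'Year' column; derive it from 'Date'
    # (assumes the file has the 'Date' column shown in the question)
    df['Year'] = pd.to_datetime(df['Date']).dt.year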
    
    # select Auckland data
    auckland_data = df[df['TA2018_name'] == 'Auckland'].copy()
    
    # reshape the data with pivot table and aggregate the mean
    dfp = auckland_data.pivot_table(index='Year', columns='AU2017_name', values='AU_GMP_PER_CAPITA', aggfunc='mean')
    
    # for each year find the top 10 cities, and concat them into a single dataframe
    top10 = pd.concat([data.sort_values(ascending=False).iloc[:10].to_frame() for _, data in dfp.iterrows()], axis=1)
    
    # since the city names are long, use a horizontal bar (barh), otherwise use kind='bar'
    ax = top10.plot(kind='barh', figsize=(5, 8), width=0.8,
                    xlabel='Mean GMP PER CAPITA', ylabel='City', title='Yearly Top 10 Cities')
    


    ax = top10.plot(kind='bar', figsize=(20, 6), width=0.8, rot=0,
                    ylabel='Mean GMP PER CAPITA', xlabel='City', title='Yearly Top 10 Cities')
    



    import seaborn as sns
    
    # reshape the dataframe to long form
    top10m = top10.melt(var_name='Year', value_name='Mean GMP PER CAPITA', ignore_index=False).reset_index(names=['City'])
    
    # plot
    g = sns.catplot(data=top10m, kind='bar', x='City', y='Mean GMP PER CAPITA', hue='Year', height=5, aspect=4, palette='tab10', legend='full')
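
The per-year top n can also be computed directly in long form, skipping the pivot and melt steps. This is a minimal sketch on made-up data, not the method used above: `toy`, `n`, `means` and `top_n_long` are illustrative names, and only the column names are taken from the question.

```python
import pandas as pd

# Made-up data; column names mirror the question's dataframe
toy = pd.DataFrame({
    'Year': [2018] * 5 + [2019] * 4,
    'AU2017_name': ['A', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'AU_GMP_PER_CAPITA': [8.0, 12.0, 30.0, 20.0, 5.0, 25.0, 8.0, 12.0, 40.0],
})

n = 2  # use 10 for the real data

# mean per (Year, area)
means = toy.groupby(['Year', 'AU2017_name'], as_index=False)['AU_GMP_PER_CAPITA'].mean()

# sort each year by descending mean, then keep the first n rows of every year
top_n_long = (
    means.sort_values(['Year', 'AU_GMP_PER_CAPITA'], ascending=[True, False])
         .groupby('Year')
         .head(n)
         .reset_index(drop=True)
)
```

The result has one row per year and area, so it can be passed to sns.catplot with x='AU2017_name', y='AU_GMP_PER_CAPITA' and hue='Year'.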
    



    Data Views

    df

       AU2017_code      crime  n  Pop  AU_GMP_PER_CAPITA  Dep_Index AU2017_name         TA2018_name                TALB  Year
    0       500100  Abduction  0  401          28.890063       10.0      Awanui  Far North District  Far North District  2018
    1       500100  Abduction  0  402          28.890063       10.0      Awanui  Far North District  Far North District  2018
    2       500100  Abduction  0  408          28.890063       10.0      Awanui  Far North District  Far North District  2018
    3       500100  Abduction  0  409          28.890063       10.0      Awanui  Far North District  Far North District  2018
    4       500100  Abduction  0  410          28.890063       10.0      Awanui  Far North District  Far North District  2018
    

    dfp.iloc[:, :10]

    AU2017_name  Abbotts Park  Aiguilles Island    Akarana     Albany  Algies Bay     Ambury      Aorere   Arahanga  Arch Hill    Ardmore
    Year                                                                                                                                 
    2018            41.995023               0.0  48.619904  34.953781    8.989871  57.940325  111.343778  78.498990  58.685772  40.572675
    2019            40.569120               0.0  47.898409  34.046811    9.073010  57.053751  112.236632  78.707498  57.905275  38.060297
    2020            27.936208               0.0  35.284514  25.236172    6.720755  42.324155   84.505122  57.954157  41.092557  26.683718
    

    top10

                            2018        2019        2020
    AU2017_name                                         
    Matheson Bay      214.762992  224.552738  172.133803
    Point Wells       181.298995  188.588469  143.436274
    Leigh             168.446421  172.428979  129.395604
    Papakura North    128.974569  124.977594   90.942141
    Fairburn          128.231566  127.925022   91.885721
    Otahuhu West      127.002810  125.271241   90.084230
    Otahuhu North     123.810519  123.690082   87.164136
    Dingwall          118.963782         NaN   83.436386
    Papatoetoe North  118.210508  113.328798         NaN
    Puhinui South     116.787094  113.630079   85.114301
    Papakura Central         NaN  113.442014         NaN
    Aorere                   NaN         NaN   84.505122
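
The NaN values in top10 mark years in which an area was not in that year's top 10, which is why the table has 12 rows rather than 10. The sketch below rebuilds a few rows of that table by hand (the frame `top10_sample` is illustrative) to show how to count the years each area qualified and how to keep only areas that qualified every year.

```python
import numpy as np
import pandas as pd

# Four rows copied from the top10 view; NaN means "not in the top 10 that year"
top10_sample = pd.DataFrame(
    {
        2018: [214.762992, 118.963782, 118.210508, np.nan],
        2019: [224.552738, np.nan, 113.328798, 113.442014],
        2020: [172.133803, 83.436386, np.nan, np.nan],
    },
    index=pd.Index(
        ['Matheson Bay', 'Dingwall', 'Papatoetoe North', 'Papakura Central'],
        name='AU2017_name',
    ),
)

# number of years in which each area made the top 10
years_in_top10 = top10_sample.notna().sum(axis=1)

# areas that made the top 10 in every year
always_top10 = top10_sample.dropna()
```

In the bar plots, a NaN simply leaves a gap where that year's bar would be.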
    

    top10m.head()

                 City  Year  Mean GMP PER CAPITA
    0    Matheson Bay  2018           214.762992
    1     Point Wells  2018           181.298995
    2           Leigh  2018           168.446421
    3  Papakura North  2018           128.974569
    4        Fairburn  2018           128.231566