I am trying to plot a bar graph which highlights only the top 10 areas in Auckland district by the money spent on gambling. I have written the code to filter for the top 10 areas and also plot a bar plot in Seaborn.
The issue is that the x-axis is crowded with labels of every area in Auckland district from the dataframe. I only want the labels for the top 10 areas to show up. Will appreciate any help from the kind folks out here.
This is a snapshot of the dataframe I am using:
Date,AU2017_code,crime,n,Pop,AU_GMP_PER_CAPITA,Dep_Index,AU2017_name,TA2018_name,TALB
2018-02-01,500100.0,Abduction,0.0,401.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-03-01,500100.0,Abduction,0.0,402.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-04-01,500100.0,Abduction,0.0,408.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-05-01,500100.0,Abduction,0.0,409.0,28.890063,10.0,Awanui,Far North District,Far North District
2018-06-01,500100.0,Abduction,0.0,410.0,28.890063,10.0,Awanui,Far North District,Far North District
The complete dataframe is available as a .csv file here: https://github.com/yyshastri/NZ-Police-Community-Dataset.git
The code for the creation of the bar plot is as follows:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Extract the year from the Date column and create a new 'Year' column
merged_data['Year'] = merged_data.index.year
# Filter data for areas that come under Auckland in the TA2018_name column
auckland_data = merged_data[merged_data['TA2018_name'] == 'Auckland']
# Calculate the average AU_GMP_PER_CAPITA for each area within Auckland
avg_gmp_per_area = auckland_data.groupby('AU2017_name')['AU_GMP_PER_CAPITA'].mean()
# Select the top 10 areas by AU_GMP_PER_CAPITA within Auckland
top_10_areas = avg_gmp_per_area.nlargest(10).index
# Further filter the auckland_data to include only the top 10 areas
filtered_data = auckland_data[auckland_data['AU2017_name'].isin(top_10_areas)]
# Create the figure first so that figsize applies to the barplot
plt.figure(figsize=(20, 10))
# Use seaborn to create the barplot
sns.barplot(x='AU2017_name', y='AU_GMP_PER_CAPITA', hue='Year', data=filtered_data)
plt.title('The top 10 areas for gambling spend in Auckland')
plt.xticks(rotation=60)
plt.legend(title='Year', loc='upper right')
plt.show()
The chart this code generates has a garbled x-axis, because every area name in Auckland district shows up in the labels, not just the top 10.
Plot directly with pandas.DataFrame.plot, which uses matplotlib as the default plotting backend, and avoid the extra import and dataframe reshaping.
.pivot_table is used to reshape the dataframe and aggregate multiple values with 'mean'.
kind='barh' (horizontal bars) looks cleaner than kind='bar'.
Versions: python 3.12.0, pandas 2.1.1, matplotlib 3.8.0, seaborn 0.13.0
import pandas as pd
# read the data from github
df = pd.read_csv('https://raw.githubusercontent.com/yyshastri/NZ-Police-Community-Dataset/main/Merged_Community_Police_Data.xls')
# select Auckland data
auckland_data = df[df['TA2018_name'] == 'Auckland'].copy()
# reshape the data with pivot table and aggregate the mean
dfp = auckland_data.pivot_table(index='Year', columns='AU2017_name', values='AU_GMP_PER_CAPITA', aggfunc='mean')
# for each year find the top 10 cities, and concat them into a single dataframe
top10 = pd.concat([data.sort_values(ascending=False).iloc[:10].to_frame() for _, data in dfp.iterrows()], axis=1)
# since the city names are long, use a horizontal bar (barh), otherwise use kind='bar'
ax = top10.plot(kind='barh', figsize=(5, 8), width=0.8,
                xlabel='Mean GMP PER CAPITA', ylabel='City', title='Yearly Top 10 Cities')
# alternative: the same data as vertical bars
ax = top10.plot(kind='bar', figsize=(20, 6), width=0.8, rot=0,
                ylabel='Mean GMP PER CAPITA', xlabel='City', title='Yearly Top 10 Cities')
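The per-year top-N step can produce more than N rows, with NaN where an area made the cut in one year but not another. Here is a minimal sketch of that behaviour on a made-up dataframe; the area names, values, and the choice of top 2 instead of top 10 are assumptions for illustration only, not taken from the real dataset.

```python
import pandas as pd

# toy long-form data; names and values are invented for illustration
toy = pd.DataFrame({
    'Year': [2018, 2018, 2018, 2019, 2019, 2019],
    'Area': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Spend': [30.0, 20.0, 10.0, 5.0, 25.0, 15.0],
})

# one row per year, one column per area, aggregated with the mean
wide = toy.pivot_table(index='Year', columns='Area', values='Spend', aggfunc='mean')

# take the top 2 areas for each year, then join the per-year results on the area name
top2 = pd.concat(
    [row.sort_values(ascending=False).iloc[:2].to_frame() for _, row in wide.iterrows()],
    axis=1,
)
print(top2)
```

In this toy case, A is in the top 2 only for 2018 and C only for 2019, so the combined frame has three rows and a NaN in each of those two positions.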
seaborn requires converting top10 from wide to long form with pandas.DataFrame.melt.
sns.catplot with kind='bar' is used, but the axes-level function sns.barplot will also work.
import seaborn as sns
# reshape the dataframe to long form
top10m = top10.melt(var_name='Year', value_name='Mean GMP PER CAPITA', ignore_index=False).reset_index(names=['City'])
# plot
g = sns.catplot(data=top10m, kind='bar', x='City', y='Mean GMP PER CAPITA', hue='Year', height=5, aspect=4, palette='tab10', legend='full')
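For the axes-level route, a rough sketch with sns.barplot might look like the following. The toy frame, its column names, and the figure size are assumptions for illustration; it is shaped like top10 (areas on the index, years as columns) but is not the real data. The figure and axes are created before plotting so that figsize applies to the plot, and Year is cast to string so seaborn treats it as a categorical hue.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# toy wide-form frame shaped like top10: index = area, columns = year (values invented)
wide = pd.DataFrame(
    {2018: [30.0, 20.0, None], 2019: [None, 25.0, 15.0]},
    index=pd.Index(['A', 'B', 'C'], name='Area'),
)

# wide -> long: one row per (area, year) pair
long = wide.melt(var_name='Year', value_name='Spend', ignore_index=False).reset_index()

# treat the year as a category so each year gets a discrete colour
long['Year'] = long['Year'].astype(str)

# create the figure and axes first, then draw onto them
fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=long, x='Area', y='Spend', hue='Year', ax=ax)
ax.set_title('Toy top areas by year')
ax.tick_params(axis='x', rotation=60)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure. Only the areas present in the long-form frame appear on the x-axis.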
df
   AU2017_code      crime  n  Pop  AU_GMP_PER_CAPITA  Dep_Index  AU2017_name         TA2018_name                TALB  Year
0       500100  Abduction  0  401          28.890063       10.0       Awanui  Far North District  Far North District  2018
1       500100  Abduction  0  402          28.890063       10.0       Awanui  Far North District  Far North District  2018
2       500100  Abduction  0  408          28.890063       10.0       Awanui  Far North District  Far North District  2018
3       500100  Abduction  0  409          28.890063       10.0       Awanui  Far North District  Far North District  2018
4       500100  Abduction  0  410          28.890063       10.0       Awanui  Far North District  Far North District  2018
auckland_data looks the same as df, except it's a subset.
dfp.iloc[:, :10]
AU2017_name  Abbotts Park  Aiguilles Island    Akarana     Albany  Algies Bay     Ambury      Aorere   Arahanga  Arch Hill    Ardmore
Year
2018            41.995023               0.0  48.619904  34.953781    8.989871  57.940325  111.343778  78.498990  58.685772  40.572675
2019            40.569120               0.0  47.898409  34.046811    9.073010  57.053751  112.236632  78.707498  57.905275  38.060297
2020            27.936208               0.0  35.284514  25.236172    6.720755  42.324155   84.505122  57.954157  41.092557  26.683718
top10
                        2018        2019        2020
AU2017_name
Matheson Bay      214.762992  224.552738  172.133803
Point Wells       181.298995  188.588469  143.436274
Leigh             168.446421  172.428979  129.395604
Papakura North    128.974569  124.977594   90.942141
Fairburn          128.231566  127.925022   91.885721
Otahuhu West      127.002810  125.271241   90.084230
Otahuhu North     123.810519  123.690082   87.164136
Dingwall          118.963782         NaN   83.436386
Papatoetoe North  118.210508  113.328798         NaN
Puhinui South     116.787094  113.630079   85.114301
Papakura Central         NaN  113.442014         NaN
Aorere                   NaN         NaN   84.505122
top10m.head()
             City  Year  Mean GMP PER CAPITA
0    Matheson Bay  2018           214.762992
1     Point Wells  2018           181.298995
2           Leigh  2018           168.446421
3  Papakura North  2018           128.974569
4        Fairburn  2018           128.231566