I'm working on a data visualization project where I want to create a bar chart similar to the one shown in this image. The image is from a story available here.
I've written Python code using pandas, seaborn, and matplotlib to visualize the data. Here's my code snippet:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
box = pd.read_csv("box_office_18_23.csv")
# Data Cleaning
box["overall_gross"] = box["overall_gross"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype(int)
# Data Analysis
sns.barplot(x='year', y='overall_gross', data=box)
plt.show()
I have uploaded my Jupyter Notebook and the relevant dataset (CSV file) to this Google Drive link.
While my code runs without errors, the resulting bar chart doesn't match the desired visualization. I'm looking for guidance on how to modify my code to achieve a similar year-wise bar chart as shown in the reference image.
Also, if other libraries or tools can do the job, please let me know.
Thank you for your help!
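The following is what you can create with Plotly. First, to prepare the data, a date is built from the year and week number so that the x-axis is a time series. Next, the weekly totals and a list of the top-release titles for each week are computed and merged to form the graph data. The graph customization covers a highlighted bar, annotations for the top weeks, a source note, the y-axis tick format, the legend position, and the title and background styling.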
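As a quick check of how the date construction works in isolation, here is a minimal sketch with made-up values (the `demo` frame and the `week_start` column are hypothetical, not the real CSV). It assumes `week_no` holds ISO week numbers, zero-pads them, and uses `%u` (ISO weekday, 1 = Monday) so that each week maps to its Monday. The standard library's `date.fromisocalendar` is used only to cross-check the result.

```python
import datetime

import pandas as pd

# Hypothetical sample rows; the real data comes from box_office_18_23.csv.
demo = pd.DataFrame({"year": [2023, 2023], "week_no": [1, 14]})

# Build "ISO year-ISO week-weekday" strings such as "2023-14-1" (1 = Monday).
iso_strings = (
    demo["year"].astype(str)
    + "-"
    + demo["week_no"].astype(str).str.zfill(2)
    + "-1"
)
demo["week_start"] = pd.to_datetime(iso_strings, format="%G-%V-%u")

# Cross-check against the standard library (Python 3.8+).
expected = [
    datetime.date.fromisocalendar(int(y), int(w), 1)
    for y, w in zip(demo["year"], demo["week_no"])
]
print(demo["week_start"].dt.date.tolist() == expected)
```

If the week numbers in the CSV follow a different convention (for example, weeks counted from January 1), the format string would need to change accordingly.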
import pandas as pd
box = pd.read_csv('./data/box_office_18_23.csv')
box["overall_gross"] = box["overall_gross"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype(int)
box['yyyy-mm-dd'] = pd.to_datetime(box['year'].astype(str) + '-' + box['week_no'].astype(str) + "-1", format='%G-%V-%w')
box_yearWeek = box[['yyyy-mm-dd','overall_gross']].groupby(['yyyy-mm-dd']).sum()
box_topRelease = box[['yyyy-mm-dd','top_release']].groupby(['yyyy-mm-dd'])['top_release'].apply(list)
box_merge = box_yearWeek.merge(box_topRelease, left_index=True, right_index=True)
box_merge.reset_index(inplace=True)
annotations = box_merge.sort_values('overall_gross', ascending=False)
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Bar(
    x=box_merge['yyyy-mm-dd'],
    y=box_merge['overall_gross'],
    marker=dict(color='navy'),
    name='overall_gross',
    showlegend=True
))
# Color the bar for "The Super Mario Bros. Movie" red
colors = ['navy'] * len(box_merge)
colors[273] = 'red'  # 273 is the row index of that week in box_merge
fig.update_traces(marker_color=colors)
# Annotations: the six weeks with the highest overall_gross
for row in annotations[['overall_gross', 'top_release']][:6].itertuples():
    fig.add_annotation(
        x=box_merge.loc[row[0], 'yyyy-mm-dd'],
        y=row[1],
        text=row[2][0],
        showarrow=True,
        arrowhead=1,
    )
# Annotation: "Source"
fig.add_annotation(xref='paper',
                   x=-0.03,
                   yref='paper',
                   y=-0.08,
                   text='Source:xxxx',
                   showarrow=False)
# Custom y-axis tick format
fig.update_yaxes(tickformat='$.2s')
# Move the legend
fig.update_layout(legend=dict(
    orientation="h",
    yanchor="bottom",
    y=0.9,
    xanchor="right",
    x=0.9
))
# Title font family and size, background color, margins, etc.
fig.update_layout(template='plotly_white',
                  title_text='Creating a Year-wise Bar Chart <br>Visualization from CSV Data',
                  title_font=dict(family='Rockwell', color='lightseagreen', size=48),
                  width=800,
                  height=800,
                  plot_bgcolor='rgba(224,255,255,0.5)',
                  paper_bgcolor='rgba(224,255,255,0.5)',
                  margin=dict(t=150, b=60, l=0, r=0)
                  )
fig.show()
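The hard-coded index 273 will break if the data changes. One possible alternative, shown here as a sketch on a small made-up frame (`box_merge_demo`, its values, and the titles "Film A" and "Film B" are hypothetical), is to look the bar up by title and highlight the matching week with the highest gross:

```python
import pandas as pd

# Hypothetical stand-in for box_merge: one row per week, with a list of top releases.
box_merge_demo = pd.DataFrame({
    "yyyy-mm-dd": pd.to_datetime(["2023-03-27", "2023-04-03", "2023-04-10"]),
    "overall_gross": [100, 300, 200],
    "top_release": [["Film A"], ["The Super Mario Bros. Movie"], ["Film B"]],
})

target = "The Super Mario Bros. Movie"

# True for weeks whose top-release list contains the target title.
is_target = box_merge_demo["top_release"].apply(lambda titles: target in titles)

# Start with every bar navy, then recolor the best-grossing matching week.
colors = ["navy"] * len(box_merge_demo)
if is_target.any():
    peak_label = box_merge_demo.loc[is_target, "overall_gross"].idxmax()
    colors[box_merge_demo.index.get_loc(peak_label)] = "red"

print(colors)
```

The resulting list can be passed to `fig.update_traces(marker_color=colors)` exactly as in the code above.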