python, pandas, visualization, data-analysis, graph-visualization

Creating a Year-wise Bar Chart Visualization from CSV Data


Problem:

I'm working on a data visualization project where I want to create a bar chart similar to the one shown in this reference image. The image is from a story available here.

My Effort:

I've written Python code using pandas, seaborn, and matplotlib to visualize the data. Here's my code snippet:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

box = pd.read_csv("box_office_18_23.csv")

# Data Cleaning
box["overall_gross"] = box["overall_gross"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype(int)

# Data Analysis
sns.barplot(x='year', y='overall_gross', data=box)
plt.show()

Output:

Here's the output of my code: Output Image

Link to Code and Dataset:

I have uploaded my Jupyter Notebook and the relevant dataset (CSV file) to this Google Drive link.

Issue:

While my code runs without errors, the resulting bar chart doesn't match the desired visualization. I'm looking for guidance on how to modify my code to produce a year-wise bar chart like the one shown in the reference image.

Also, if other libraries or tools can do the job, let me know.

Thank you for your help!


Solution

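  • The chart below is built with Plotly. To prepare the data, a date is derived from the year and week number so that the x-axis is a time series. Weekly totals and a list of each week's movie titles are then computed and merged to form the graph data. Graph customization: the highlighted bar color, annotations for the six highest-grossing weeks, a source note, the y-axis tick format, the legend position, and the title font, background color, and margins.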

    import pandas as pd
    
    box = pd.read_csv('./data/box_office_18_23.csv')
    # Strip "$" and "," so the gross can be cast to an integer
    box["overall_gross"] = box["overall_gross"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype(int)
    # Monday of each ISO week (%G = ISO year, %V = ISO week, %w = weekday), used as the x-axis date
    box['yyyy-mm-dd'] = pd.to_datetime(box['year'].astype(str) + '-' + box['week_no'].astype(str) + "-1", format='%G-%V-%w')
    # Weekly gross totals
    box_yearWeek = box[['yyyy-mm-dd','overall_gross']].groupby(['yyyy-mm-dd']).sum()
    # List of top releases for each week
    box_topRelease = box[['yyyy-mm-dd','top_release']].groupby(['yyyy-mm-dd'])['top_release'].apply(list)
    box_merge = box_yearWeek.merge(box_topRelease, left_index=True, right_index=True)
    box_merge = box_merge.reset_index()
    # Weeks sorted by gross, highest first, used for the annotations below
    annotations = box_merge.sort_values('overall_gross', ascending=False)
    
    import plotly.graph_objects as go
    
    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=box_merge['yyyy-mm-dd'],
        y=box_merge['overall_gross'],
        marker=dict(color='navy'),
        name='overall_gross',
        showlegend=True
    ))
    
    # Color the bar for "The Super Mario Bros. Movie" red
    colors = ['navy']*len(box_merge)
    colors[273] = 'red'  # 273 is the row index of that week in box_merge
    fig.update_traces(marker_color=colors)
    
    # Annotation: Overall_gross TOP 6 
    for row in annotations[['overall_gross','top_release']][:6].itertuples():
        fig.add_annotation(
            x=box_merge.loc[row[0], 'yyyy-mm-dd'],
            y=row[1],
            text=row[2][0],
            showarrow=True,
            arrowhead=1,
        )
    # Annotation: "Source"
    fig.add_annotation(xref='paper',
                       x=-0.03,
                       yref='paper',
                       y=-0.08,
                       text='Source:xxxx',
                       showarrow=False)
    
    # Custom y-axis tick format
    fig.update_yaxes(tickformat='$.2s')
    
    # Move the legend
    fig.update_layout(legend=dict(
        orientation="h",
        yanchor="bottom",
        y=0.9,
        xanchor="right",
        x=0.9
    ))
    # Title font family and size, background color, margins, etc.
    fig.update_layout(template='plotly_white',
                      title_text='Creating a Year-wise Bar Chart <br>Visualization from CSV Data',
                      title_font=dict(family='Rockwell', color='lightseagreen',size=48),
                      width=800,
                      height=800,
                      plot_bgcolor='rgba(224,255,255,0.5)',
                      paper_bgcolor='rgba(224,255,255,0.5)',
                      margin=dict(t=150,b=60,l=0,r=0)
                     )
    fig.show()
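
The date step in the code above can be checked in isolation. The following is a minimal sketch on made-up rows (the `sample` frame and the `week_start` column are hypothetical names, not part of the dataset), and it assumes `week_no` holds ISO week numbers, which the `%G-%V` format requires:

```python
import pandas as pd

# Hypothetical sample rows; the column names mirror the ones used above.
sample = pd.DataFrame({
    "year": [2023, 2023, 2022],
    "week_no": [14, 15, 52],
})

# Build "YYYY-WW-1" strings: ISO year, ISO week, weekday 1.
iso_strings = sample["year"].astype(str) + "-" + sample["week_no"].astype(str) + "-1"

# %G = ISO year, %V = ISO week number, %w = weekday (0 is Sunday, so 1 is Monday).
sample["week_start"] = pd.to_datetime(iso_strings, format="%G-%V-%w")

print(sample)
```

Each resulting date is the Monday of the given ISO week, so the bars line up on a regular weekly time axis. If the week numbers in the CSV follow a different convention, the dates would be shifted and the format string would need adjusting.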
    

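The hard-coded index 273 only works for this exact dataset. As a rough alternative, the bar to highlight could be looked up by title. This is a sketch on a small stand-in frame: `weekly` and `highlight_colors` are hypothetical names, and it assumes `top_release` holds a list of titles per week, as in `box_merge` above.

```python
import pandas as pd

# Hypothetical stand-in for box_merge: one row per week, top_release is a list.
weekly = pd.DataFrame({
    "yyyy-mm-dd": pd.to_datetime(["2023-03-27", "2023-04-03", "2023-04-10"]),
    "overall_gross": [100, 300, 200],
    "top_release": [["Movie A"], ["The Super Mario Bros. Movie"], ["Movie B"]],
})

def highlight_colors(frame, title, base="navy", accent="red"):
    """Return one color per row, using `accent` for the first week that lists `title`."""
    colors = [base] * len(frame)
    matches = [i for i, names in enumerate(frame["top_release"]) if title in names]
    if matches:
        colors[matches[0]] = accent
    return colors

colors = highlight_colors(weekly, "The Super Mario Bros. Movie")
print(colors)  # ['navy', 'red', 'navy']
```

The returned list can be passed to `fig.update_traces(marker_color=colors)` in place of the manually edited list. Only the first matching week is highlighted here; if a title leads several weeks and all of them should be red, color every index in `matches` instead.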