python pandas plotly

Gaps and inconsistent ordering in plotly express bar chart


I have a dataframe with three columns: Date, Name and Number. It has 5 dates (this may change depending on when the data extract is run) and 10 names per date. The same name can appear under multiple dates, or under only one. Numbers can be positive or negative. The data is sorted by Date (ascending=True) and then by Number (ascending=False).

I am trying to plot a bar chart with plotly express that has Number on the Y axis and Date on the X axis, with bars coloured by Name. Bars should be ordered from largest to smallest Number within each date.

When using this code, the ordering is correct for the first date, but after that there are gaps between some bars and the ordering is wrong; for example, positive bars are plotted after negative ones.

fig = px.bar(df, x="Date", y="Number", color="Name", barmode="group")

I have tried using fig.update_layout(yaxis={'categoryorder': 'total ascending'}) but this doesn't seem to do anything.

Please can someone help me format this chart so that there are no gaps and the ordering is correct for all dates.

On further investigation, it appears the ordering is set by the first value on the x axis (e.g. Day1) and is then kept the same for every other value. So if a name is in Day1 but not in Day2, there will be an empty space in Day2. If a name doesn't appear in Day1 but is in Day2, that bar will appear at the end, even if it represents a larger Number than the previous bar.

Essentially, I need to force Plotly Express to order the bars for each X value independently of each other.

The code below recreates my issue. It uses only 2 dates rather than 5, but it still demonstrates the problem.

import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "Name": ["Joe", "Tom", "Tim", "Alex", "Ben", "Steve", "Nick", "Alan", "Jack", "George", "Joe", "Tom", "Tim", "Leo", "Alex", "Ben", "Nick", "Alan", "Jack", "George"],
    "Date": (["01-01-2024"] * 10) + (["01-02-2024"] * 10),
    "Number": [0.5, 0.4, 0.3, 0.2, 0.1, -0.1, -0.2, -0.3, -0.4, -0.5, 0.5, 0.4, 0.3, 0.2, 0.1, -0.1, -0.2, -0.3, -0.4, -0.5]    
})

df["Date"] = pd.to_datetime(df["Date"])
df.sort_values(by=["Date", "Number"], ascending=[True, False], inplace=True)

print(df)

fig=px.bar(df, x="Date", y="Number", color="Name", barmode="group")

fig.show()
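The slot behaviour described above can be checked without drawing the chart. The sketch below is only a rough illustration and rests on an assumption taken from the observed output rather than from the Plotly documentation: each Name gets one fixed slot in every date group, in order of first appearance in the sorted data. The names slot_order and layouts are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Joe", "Tom", "Tim", "Alex", "Ben", "Steve", "Nick", "Alan", "Jack", "George",
             "Joe", "Tom", "Tim", "Leo", "Alex", "Ben", "Nick", "Alan", "Jack", "George"],
    "Date": (["01-01-2024"] * 10) + (["01-02-2024"] * 10),
    "Number": [0.5, 0.4, 0.3, 0.2, 0.1, -0.1, -0.2, -0.3, -0.4, -0.5,
               0.5, 0.4, 0.3, 0.2, 0.1, -0.1, -0.2, -0.3, -0.4, -0.5],
})
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(by=["Date", "Number"], ascending=[True, False])

# Assumption: one slot per Name, ordered by first appearance in the data.
slot_order = list(df["Name"].unique())

# For each date, list what would occupy each slot; absent names leave a gap.
layouts = {}
for date, day in df.groupby("Date"):
    present = set(day["Name"])
    layouts[str(date.date())] = [
        name if name in present else "(gap)" for name in slot_order
    ]

for date, layout in layouts.items():
    print(date, layout)
```

Under that assumption, the second date has an empty slot where Steve would be, and Leo sits in the last slot even though his Number is larger than several bars before him.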

UPDATE

After implementing the code suggested by r-beginners, the ordering and gap issues are now solved, but the change has introduced other formatting problems, such as overlapping bars and large gaps.

My input data is below:

Date Name Number
2024-02-19 B 80.0
2024-02-19 C 70.0
2024-02-19 A 40.0
2024-02-19 D 30.0
2024-02-19 E 10.0
2024-02-19 G -20.0
2024-02-19 F -40.0
2024-02-19 J -50.0
2024-02-19 I -60.0
2024-02-19 H -90.0
2024-02-20 A 140.0
2024-02-20 C 90.0
2024-02-20 B 80.0
2024-02-20 E 40.0
2024-02-20 K 10.0
2024-02-20 F -10.0
2024-02-20 G -30.0
2024-02-20 I -40.0
2024-02-20 H -90.0
2024-02-20 J -140.0
2024-02-21 C 100.0
2024-02-21 B 90.0
2024-02-21 A 80.0
2024-02-21 D 30.0
2024-02-21 E 20.0
2024-02-21 F -20.0
2024-02-21 G -40.0
2024-02-21 H -100.0
2024-02-21 I -130.0
2024-02-21 J -150.0
2024-02-22 A 30.0
2024-02-22 E 30.0
2024-02-22 B 20.0
2024-02-22 C 10.0
2024-02-22 D 10.0
2024-02-22 F -20.0
2024-02-22 G -50.0
2024-02-22 H -70.0
2024-02-22 I -70.0
2024-02-22 J -110.0
2024-02-23 B 170.0
2024-02-23 C 90.0
2024-02-23 E 50.0
2024-02-23 A 10.0
2024-02-23 D 10.0
2024-02-23 F 50.0
2024-02-23 G -10.0
2024-02-23 H -80.0
2024-02-23 I -80.0
2024-02-23 J -150.0

The code used is:

import plotly.graph_objects as go

# color_dict maps each Name to a colour and is defined beforehand.
fig = go.Figure()

# One trace per (Date, Name) row, so each date keeps its own bar order.
for d in df['Date'].unique():
    dff = df.query('Date == @d')
    for n in dff['Name'].unique():
        dfn = dff.query('Name == @n')
        fig.add_trace(go.Bar(
            x=dfn['Date'],
            y=dfn['Number'],
            marker=dict(color=color_dict[n]),
            name=n,
            width=60*60*1000  # bar width in milliseconds (1 hour)
        ))

# Show each name in the legend only once.
names = set()
fig.for_each_trace(
    lambda trace:
        trace.update(showlegend=False)
        if (trace.name in names) else names.add(trace.name))

unique_dates = df["Date"].unique()
print(unique_dates)
min_x = unique_dates[0]
print(type(min_x))
max_x = unique_dates[-1]  # last date
print(type(max_x))

fig.update_layout(xaxis_range=[min_x, max_x])
fig.update_layout(height=500, width=800, barmode='group')

fig.show()

and the output looks like this:

output chart


Solution

  • I think plotly.express displays the chart this way because a categorical variable is used for the color coding. Instead, I built the chart with graph objects, adding one trace for each row of the data frame filtered by date. The color for each name comes from a dictionary that pairs the names with the values of a discrete color scale, and it is applied as the marker color. I also loop over the traces to hide duplicate legend entries. Finally, the x-axis range is limited to the dates in the data and the figure size is fixed.

    update:

    The gaps have grown as the number of days has increased. There may be better methods, but I came up with an approach that treats the x-axis as a time series in chronological order. Since the bars for one day overlap, each bar is given a different time on that day as its x value. I spaced the bars two hours apart, so gaps appear between them in the bar chart. If you want to eliminate those gaps, use consecutive time values instead.

    import numpy as np
    import pandas as pd
    import plotly.graph_objects as go
    import plotly.express as px
    
    # Map each name to a colour from a qualitative palette
    colors = px.colors.qualitative.Set3
    names = df['Name'].unique()
    color_dict = {k:v for k,v in zip(names, colors)}
    print(color_dict)
    
    # Hour offsets (2, 4, ..., 20) used to spread one day's bars along the x-axis
    time_list = np.arange(2,22,2)
    
    fig = go.Figure()
    
    for d in df['Date'].unique():
        dff = df.query('Date == @d')
        for n,h in zip(dff['Name'].unique(), time_list):
            dfn = dff.query('Name == @n')
            fig.add_trace(go.Bar(
                # shift each bar to its own time slot within the day
                x=[pd.Timestamp(dfn['Date'].values[0]) + pd.Timedelta(hours=int(h))],
                y=dfn['Number'],
                marker=dict(color=color_dict[n]),
                name=n,
                width=60*60*1000  # bar width in milliseconds (1 hour)
            ))
    
    # Show each name in the legend only once
    names = set()
    fig.for_each_trace(
        lambda trace:
            trace.update(showlegend=False)
            if (trace.name in names) else names.add(trace.name))
    
    unique_dates = df["Date"].unique()
    min_x = pd.Timestamp(unique_dates[0])
    # extend one day past the last date so its shifted bars stay in view
    max_x = pd.Timestamp(unique_dates[-1]) + pd.Timedelta(days=1)
    
    fig.update_layout(xaxis_range=[min_x, max_x])
    fig.update_layout(height=500, width=800, barmode='group')
    
    fig.show()
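One way to get the per-day time slots without a hand-written list of hours is to rank the rows within each date. The sketch below uses only pandas and is not part of the original answer: the helper name add_bar_positions and the columns slot and x_pos are made up for illustration, and the demo rows are a small subset of the question's data.

```python
import pandas as pd

def add_bar_positions(df, step_hours=1):
    """Return a sorted copy of df with a per-date slot and a shifted x position.

    Rows are ordered by Date, then by Number from largest to smallest, and each
    row in a date is moved step_hours further along the x-axis than the last.
    """
    out = df.sort_values(["Date", "Number"], ascending=[True, False]).copy()
    # 0, 1, 2, ... within each date, following the sorted order
    out["slot"] = out.groupby("Date").cumcount()
    out["x_pos"] = out["Date"] + pd.to_timedelta(out["slot"] * step_hours, unit="h")
    return out

demo = pd.DataFrame({
    "Date": pd.to_datetime(["2024-02-19"] * 3 + ["2024-02-20"] * 3),
    "Name": ["B", "C", "A", "A", "K", "C"],
    "Number": [80.0, 70.0, 40.0, 140.0, 10.0, 90.0],
})

positioned = add_bar_positions(demo)
print(positioned[["Date", "Name", "Number", "slot", "x_pos"]])
```

The x_pos values could then be passed as x to go.Bar. With step_hours=1 and the one-hour width used above, neighbouring bars should sit edge to edge, although this has not been checked against a rendered chart here.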
    
