python pandas seaborn data-analysis

Seaborn line graph with binned values


I have the following code and graph:

import pandas as pd
import seaborn as sns

bins = [0, 5, 15, 25, 50, 75, 100, 125, 150, 175, 200, 250, 300, 400, 500, 600, 700, 850, 1000, 5000, 100000]
df['articles_binned'] = pd.cut(df['ARTICLES'], bins)

# Cancellation percentage for every (bin, SUB_TYPE) pair
export_df = df.groupby(['articles_binned', 'SUB_TYPE'])['CANCELLATION_FLAG'].mean().mul(100).reset_index()
export_df = export_df.rename(columns={'CANCELLATION_FLAG': 'Cancellation_Percentage'})

# Plot the bar chart
ax = sns.barplot(x='articles_binned',
                 y='Cancellation_Percentage',
                 hue='SUB_TYPE',
                 data=export_df)

[Image: the resulting bar chart]

Now I would like to see this information in a line graph instead, with a different color for each SUB_TYPE. However, Seaborn doesn't accept the bins, even though they are categorical. How should I go about this?


Solution

  • The example below plots the data as a line plot, with a colour per SUB_TYPE.

    In this example, the articles_binned column holds plain strings such as '(0, 5]' rather than the pd.Interval categories produced by pd.cut. If it doesn't work with your data, consider adding some sample data to your question to help with debugging.

    Update: OP confirms that converting the articles_binned column to strings resolved the problem.

    [Images: the bar plot and the line plot produced by the reproducible example]

        SUB_TYPE  articles_binned  Cancellation_Percentage
    0   Type A    (0, 5]           14
    1   Type A    (5, 15]          21
    ...
    14  Type D    (15, 25]         14
    15  Type D    (25, 50]         13
    

    Reproducible example

    import pandas as pd
    
    #
    # Data for testing
    #
    data = {
        'SUB_TYPE': [
            'Type A', 'Type A', 'Type A', 'Type A',
            'Type B', 'Type B', 'Type B', 'Type B',
            'Type C', 'Type C', 'Type C', 'Type C',
            'Type D', 'Type D', 'Type D', 'Type D'
        ],
        'articles_binned': [
            '(0, 5]', '(5, 15]', '(15, 25]', '(25, 50]',
            '(0, 5]', '(5, 15]', '(15, 25]', '(25, 50]',
            '(0, 5]', '(5, 15]', '(15, 25]', '(25, 50]',
            '(0, 5]', '(5, 15]', '(15, 25]', '(25, 50]'
        ],
        'Cancellation_Percentage': [
            14, 21, 14, 13,
            16, 25, 18, 17,
            21, 21, 19, 12,
            15, 16, 14, 13
        ]
    }
    
    df = pd.DataFrame(data)
    print(df)  # display(df) also works inside Jupyter/IPython
    
    #
    # Plot
    #
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Bar plot
    ax = sns.barplot(
        df, x='articles_binned', y='Cancellation_Percentage',
        hue='SUB_TYPE', legend=False
    )
    
    sns.despine(ax.figure)
    ax.figure.set_size_inches(5, 2.5)
    plt.show()
    
    # Line plot
    ax = sns.lineplot(
        df, x='articles_binned', y='Cancellation_Percentage',
        hue='SUB_TYPE',
        marker='s', linewidth=1,
    )
    ax.figure.set_size_inches(5, 3)
    sns.despine(ax.figure)
    plt.show()
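An alternative that the answer does not use is sns.pointplot, seaborn's categorical point plot. It connects the per-category estimates with a line for each hue level and, since it goes through the same categorical machinery as barplot, it should accept the pd.cut intervals without converting them to strings. A minimal sketch with hypothetical toy data, assuming seaborn 0.12 or later (for the errorbar parameter):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data with the question's column names (illustration only)
raw = pd.DataFrame({
    'ARTICLES':          [3, 10, 20, 40, 2, 12, 18, 30, 4, 8, 22, 45],
    'SUB_TYPE':          ['Type A'] * 4 + ['Type B'] * 4 + ['Type C'] * 4,
    'CANCELLATION_FLAG': [1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0],
})
raw['articles_binned'] = pd.cut(raw['ARTICLES'], [0, 5, 15, 25, 50])

# Cancellation percentage per (bin, SUB_TYPE); the bins stay as Interval categories
export_df = (
    raw.groupby(['articles_binned', 'SUB_TYPE'], observed=True)['CANCELLATION_FLAG']
       .mean().mul(100).reset_index(name='Cancellation_Percentage')
)

# pointplot treats x as categorical, so no string conversion is needed here;
# errorbar=None because each (bin, SUB_TYPE) pair has a single value
ax = sns.pointplot(data=export_df, x='articles_binned', y='Cancellation_Percentage',
                   hue='SUB_TYPE', errorbar=None)
plt.show()
```

The trade-off is that pointplot always spaces the bins evenly on a categorical axis, which is also what the string-based lineplot does, so both give the same layout; pointplot simply skips the conversion step.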