python matplotlib plot logistic-regression

What is this kind of plot called? And how do I plot it with matplotlib?


I came across a paper that presented the plot below. Could someone tell me what this kind of plot is called, and how I can plot a similar chart with Python, specifically matplotlib? I also need to present predictions from a logistic regression, hence the question.

Thanks!

[image: the plot from the paper]


Solution

  • At First Glance...

    This is almost certainly an error bar plot with grouped categories. Since you asked how to plot such a graph, let's cover some basics first.


    Error Bars Primer

    Definition

    An error bar is a graphical element used to illustrate variability and uncertainty in data. It lets you visualize the precision of data points, and it can represent standard deviation, standard error, confidence intervals, or range (Cumming, Fidler & Vaux, 2007). It is drawn over the original graph as a line extending from the center of each plotted data point, optionally ending in short perpendicular tips (caps).
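    As a rough illustration of the quantities listed above, the sketch below computes a standard deviation, a standard error, and an approximate 95% confidence half-width for one data point. The `samples` values are made up, and the 1.96 multiplier assumes a normal approximation, which is crude for so few observations.

```python
import numpy as np

# Hypothetical repeated measurements behind a single plotted point
samples = np.array([9.8, 10.4, 10.1, 9.6, 10.6, 10.1])

mean = samples.mean()                # the plotted data point
sd = samples.std(ddof=1)             # sample standard deviation
se = sd / np.sqrt(len(samples))      # standard error of the mean
ci95 = 1.96 * se                     # approximate 95% CI half-width (normal approximation)

print(f"mean={mean:.2f}  sd={sd:.2f}  se={se:.2f}  ci95=+/-{ci95:.2f}")
```

    Any one of `sd`, `se`, or `ci95` could serve as the error bar length, which is why a plot should state which one it shows.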

    Caps add a touch of visual polish to your plot (subjective opinion) and help you quickly see where each error bar ends. The sample you've provided, however (if it is indeed an error bar plot), does not use caps. Leaving them out reduces clutter, which can be useful in plots with many closely spaced elements.

    A relatively short error bar signifies a concentrated distribution of values, meaning that the plotted average is more reliable. By contrast, a relatively long error bar suggests a wide distribution and that the average value is less reliable.

    Anatomy of an Error Bar (source)


    Error bars can be symmetrical (the same length above and below the data point) or asymmetrical (different lengths above and below).
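    In Matplotlib (covered in more detail further down), the difference comes down to the shape of the `yerr` argument: a 1-D sequence gives symmetric bars, while a `(2, N)` array gives the lengths below and above each point separately. A minimal sketch with made-up numbers:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# Symmetric: one length per point, applied both above and below
yerr_symmetric = np.array([2, 3, 1])

# Asymmetric: shape (2, N); row 0 = lengths below, row 1 = lengths above
yerr_asymmetric = np.array([[1.0, 2.0, 0.5],
                            [3.0, 4.0, 2.0]])

fig, (ax_sym, ax_asym) = plt.subplots(1, 2, sharey=True)
ax_sym.errorbar(x, y, yerr=yerr_symmetric, fmt="o", capsize=5)
ax_sym.set_title("Symmetric")
ax_asym.errorbar(x, y, yerr=yerr_asymmetric, fmt="o", capsize=5)
ax_asym.set_title("Asymmetric")

plt.show()
```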

    Error bars can be applied to scatterplots, dot plots, bar charts, or line graphs to provide an additional layer of detail that expands on the information presented by the initial data (The Data Visualization Catalogue article).

    The error value in such a graph is the amount by which a data point may deviate from the expected value. It can be specified as a fixed value or as a percentage of the data point (the latter, I believe, is what your source image presents).
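    The two ways of specifying the error value can be sketched as follows. The values and the 10% figure are arbitrary choices for illustration; either array could then be passed as the `yerr` argument of `errorbar()`.

```python
import numpy as np

y = np.array([10.0, 20.0, 30.0])

# Fixed error: the same absolute amount for every data point
yerr_fixed = np.full_like(y, 2.0)

# Percentage error: 10% of each data point's own value
yerr_percent = 0.10 * y

print("fixed:  ", yerr_fixed)
print("percent:", yerr_percent)
```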

    Interpretation

    Regarding your source material, which I've also studied briefly: my interpretation of the cropped portion in your question is that the authors are testing a probabilistic model of how likely vaccinated patients are to be hospitalized after a severe Omicron variant infection, against the outcomes actually recorded. The error bars are used there as a measure of the spread of the model's results. In the ML sense, I believe it's a way of asking:

    How well does the model fit the training data?


    Error Bars in Matplotlib

    A Simple Error Bar

    The official documentation has a rich collection of useful error bar examples, but let's build up from the basics all the way to plotting one that closely matches your sample.

    You can draw a simple error bar using matplotlib.pyplot.errorbar() as follows:

    import matplotlib.pyplot as plt
    
    # Sample data
    x = [1, 2, 3]
    y = [10, 20, 30]
    yerr = [2, 3, 1]  # Error values for y
    
    # Create a plot with error bars
    plt.errorbar(x, y, yerr=yerr, fmt="o", capsize=5, label="Data with error bars")
    
    # Label axes
    plt.xlabel("X-axis")
    plt.ylabel("Y-axis")
    
    # Show legend
    plt.legend()
    
    # Show plot
    plt.show()
    
    

    Output:

    Error Bar Basic Example

    The key takeaway is that we have our error values readily available and simply pass them to the yerr parameter of errorbar(). Everything else is pretty much trivial.

    To get a capless error bar plot, set the capsize parameter of errorbar() to 0. You can also choose the marker drawn at each data point by passing the marker keyword argument to errorbar() with a value such as 'o', 's', '^', 'D', or 'P'. We'll do this in a more sophisticated way when we attempt to plot a graph that closely matches your sample.
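    A minimal sketch of both options together, reusing the sample data from the simple example. One detail that is easy to miss: when the marker is given through the `marker` keyword rather than `fmt`, the data points are joined by a line by default, so `linestyle="none"` is added here to keep them separate.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [10, 20, 30]
yerr = [2, 3, 1]

fig, ax = plt.subplots()

# capsize=0 removes the caps; marker="s" draws a square at each data point
container = ax.errorbar(x, y, yerr=yerr, marker="s", linestyle="none",
                        capsize=0, color="blue")

plt.show()
```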

    So, "How can I plot a similar chart in Python with Matplotlib?"

    The code for generating the plot in your sample would look something like this. It is fairly self-explanatory. Since I don't have the actual data, the values below are rough guesses at the plotted points (ultimately bogus), and I've offset the error bars within each category so that they don't overlap.

    import matplotlib.pyplot as plt
    import numpy as np
    
    fig, ax = plt.subplots()
    
    # Categories/x-axis data and their positions
    categories = ["Primary + booster < 1yr", "Primary + booster >= 1yr", "At most primary"]
    positions = np.arange(len(categories))
    
    # Set the y-axis label and ticks
    ax.set_ylabel("Probability hospital (%)")
    ax.set_yticks(np.arange(0, 101, 25))
    
    # Set the x-axis with categorical labels
    ax.set_xticks(positions)
    ax.set_xticklabels(categories)
    
    # Define the colors and markers
    colors = ['black', 'blue', 'green', 'yellow', 'red']
    markers = ['o', 's', '^', 'D', 'P']
    
    # Define the offsets for spacing the error bars
    offsets = np.linspace(-0.1, 0.1, len(colors))
    
    # Guesstimations of vertical lines with different ranges for each category
    y_ranges = [
        [(45, 50), (50, 55), (52, 58), (48, 53), (47, 56)],
        [(55, 60), (60, 65), (62, 68), (58, 63), (57, 66)],
        [(75, 80), (80, 85), (82, 88), (78, 83), (77, 86)]
    ]
    
    # Plot the error bars for each category
    for i, ranges in enumerate(y_ranges):
        for j, (color, marker, offset, (ymin, ymax)) in enumerate(zip(colors, markers, offsets, ranges)):
            x_position = positions[i] + offset  # Adjust x position with offset
            y = (ymin + ymax) / 2  # Center point for marker
            yerr = (ymax - ymin) / 2  # Half the range, used as the symmetric error value
            ax.errorbar(x_position, y, yerr=yerr, fmt=marker, color=color, capsize=0, label=f'{categories[i]} - Line {j+1}')
    
    # Add internal text-label in the top-left corner
    ax.text(0.05, 0.95, "At least 60 yrs", transform=ax.transAxes,
            fontsize=12, verticalalignment='top', bbox=dict(facecolor='white', edgecolor='none'))
    
    # Border around the plot
    for spine in ax.spines.values():
        spine.set_edgecolor('black')
    
    # Show the plot
    plt.show()
    
    

    Output:

    Standard Error Bar


    Last Considerations: What if it's not an Error Bar (however unlikely)?

    Provided you have your data points, you can still manually plot something identical using Matplotlib's ordinary line and scatter functions (vlines() for the vertical lines and scatter() for the markers).

    import matplotlib.pyplot as plt
    import numpy as np
    
    fig, ax = plt.subplots()
    
    # Categories/x-axis data and their positions
    categories = ["Primary + booster < 1yr", "Primary + booster >= 1yr", "At most primary"]
    positions = np.arange(len(categories))
    
    # Set the y-axis label and ticks
    ax.set_ylabel("Probability hospital (%)")
    ax.set_yticks(np.arange(0, 101, 25))
    
    # Set the x-axis with categorical labels
    ax.set_xticks(positions)
    ax.set_xticklabels(categories)
    
    # Colors and markers
    colors = ['black', 'blue', 'green', 'yellow', 'red']
    markers = ['o', 's', '^', 'D', 'P']
    
    # Offsets for spacing the vertical lines
    offsets = np.linspace(-0.1, 0.1, len(colors))
    
    # Guesstimations of vertical lines with different ranges for each category
    y_ranges = [
        [(45, 50), (50, 55), (52, 58), (48, 53), (47, 56)],
        [(55, 60), (60, 65), (62, 68), (58, 63), (57, 66)],
        [(75, 80), (80, 85), (82, 88), (78, 83), (77, 86)]
    ]
    
    # Draw each range as a vertical line with a marker at its midpoint
    for i, ranges in enumerate(y_ranges):
        for j, (color, marker, offset, (ymin, ymax)) in enumerate(zip(colors, markers, offsets, ranges)):
            x_position = positions[i] + offset  # Adjust x position with offset
            y = (ymin + ymax) / 2  # Center point for marker
            ax.vlines(x_position, ymin, ymax, color=color, label=f'{categories[i]} - Line {j+1}')
            ax.scatter(x_position, y, color=color, marker=marker)
    
    # internal text-label in the top-left corner
    ax.text(0.05, 0.95, "At least 60 yrs", transform=ax.transAxes,
            fontsize=12, verticalalignment='top', bbox=dict(facecolor='white', edgecolor='none'))
    
    # Border to the plot
    for spine in ax.spines.values():
        spine.set_edgecolor('black')
    
    # Voila!
    plt.show()
    
    

    Output:

    Manual/Line Graph Style Error Bars
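
    Since the question also mentions presenting predictions from a logistic regression, here is one possible way to turn such predictions into error bars. This is only a sketch under stated assumptions: the `log_odds` and `std_err` values are invented stand-ins for whatever your fitted model reports on the log-odds scale, and the interval uses a normal approximation (1.96 standard errors). Because the interval is computed on the log-odds scale and then transformed, the resulting bars are asymmetric on the probability scale.

```python
import matplotlib.pyplot as plt
import numpy as np


def inverse_logit(z):
    """Map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


categories = ["Primary + booster < 1yr", "Primary + booster >= 1yr", "At most primary"]
positions = np.arange(len(categories))

# Hypothetical predicted log-odds and standard errors (NOT from the paper)
log_odds = np.array([-0.2, 0.3, 1.4])
std_err = np.array([0.15, 0.20, 0.25])

# Approximate 95% interval on the log-odds scale, converted to percentages
prob = 100 * inverse_logit(log_odds)
lower = 100 * inverse_logit(log_odds - 1.96 * std_err)
upper = 100 * inverse_logit(log_odds + 1.96 * std_err)

# errorbar() expects distances from the point: row 0 below, row 1 above
yerr = np.vstack([prob - lower, upper - prob])

fig, ax = plt.subplots()
ax.errorbar(positions, prob, yerr=yerr, fmt="o", color="black", capsize=0)
ax.set_xticks(positions)
ax.set_xticklabels(categories)
ax.set_ylabel("Probability hospital (%)")
ax.set_ylim(0, 100)

plt.show()
```

    To show several models or subgroups per category, the same offsets trick used in the plots above applies: add a small horizontal offset to `positions` for each series.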