python matplotlib plot cdf

How to plot a CDF and CCDF with lists of two variables


I want to plot a CCDF graph of the updated values with respect to the years, as shown in the picture.

The dataset looks like this:

Year    Updated values
(2000 - 1)
(2001 - 159)
(2002 - 140)
(2003 - 160)
(2004 - 300)
(2005 - 378)
(2006 - 419)
(2007 - 401)
(2008 - 509)
(2009 - 610)
(2010 - 789)
(2011 - 856)
(2012 - 720)
(2013 - 860)
(2014 - 901)
(2015 - 1150)
(2016 - 1130)
(2017 - 1387)
(2018 - 1578)
(2019 - 2480)
(2020 - 3120)
(2021 - 5690)

I have seen a lot of answers, but I couldn't find much about plotting a CCDF graph from two variables. I want to calculate the CCDF of the update frequencies by year and show the year labels on the x-axis of the plot. Thank you.


Solution

  • You can calculate the cdf as the cumulative sum of the data, divided by the total so that the values are normalized between 0 and 1. The ccdf is then just 1 - cdf. You can display them e.g. as curves, or as a stacked bar plot:

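    import matplotlib.pyplot as plt
    import numpy as np
    
    years = np.arange(2000, 2022)
    values = np.array(
        [1, 159, 140, 160, 300, 378, 419, 401, 509, 610, 789, 856, 720, 860, 901, 1150, 1130, 1387, 1578, 2480, 3120, 5690])
    # running total divided by the grand total: rises to 1 in the last year
    cdf = values.cumsum() / values.sum()
    # complementary cdf: the share of the total that comes after each year
    ccdf = 1 - cdf
    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 4))
    # left: both curves against the years
    ax1.plot(years, cdf, label='cdf')
    ax1.plot(years, ccdf, label='ccdf')
    ax1.legend()
    
    # right: stacked bars, the ccdf sits on top of the cdf so each bar sums to 1
    ax2.bar(years, cdf, label='cdf')
    ax2.bar(years, ccdf, bottom=cdf, label='ccdf')
    ax2.margins(x=0.01)
    # one tick per year, labeled with the last two digits (00, 01, ..., 21)
    ax2.set_xticks(years)
    ax2.set_xticklabels([f'{y % 100:02d}' for y in years])
    ax2.legend()
    
    plt.tight_layout()
    plt.show()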
    

    [Figure: cdf and ccdf from data]
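Since the question asks specifically for the CCDF with the year labels on the x-axis, the same calculation can be sketched as a small helper plus a single-axis plot. This is only a minimal sketch: the function names `cdf_ccdf` and `plot_ccdf` are made up for illustration, and the marker style, figure size, and rotated four-digit labels are assumptions rather than anything from the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np


def cdf_ccdf(values):
    """Return (cdf, ccdf) for per-year counts (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # cumulative share of the total up to and including each entry
    cdf = values.cumsum() / values.sum()
    # complementary share: what remains after each entry
    return cdf, 1 - cdf


def plot_ccdf(years, ccdf):
    """Plot only the ccdf, with one full year label per tick (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(years, ccdf, marker='o', label='ccdf')
    ax.set_xticks(years)
    # assumption: full four-digit labels, rotated so they do not overlap
    ax.set_xticklabels([str(y) for y in years], rotation=90)
    ax.set_xlabel('Year')
    ax.set_ylabel('ccdf')
    ax.legend()
    fig.tight_layout()
    return fig


years = np.arange(2000, 2022)
values = np.array(
    [1, 159, 140, 160, 300, 378, 419, 401, 509, 610, 789, 856, 720, 860,
     901, 1150, 1130, 1387, 1578, 2480, 3120, 5690])
cdf, ccdf = cdf_ccdf(values)

# To display the figure:
# plot_ccdf(years, ccdf)
# plt.show()
```

With this definition the ccdf for the last year is 0, because nothing remains after it. If the ccdf should instead include the current year's value, that is a different convention and the calculation would need to be adapted.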