python pandas matplotlib pdfpages

How to highlight NaN values in a dataframe (matplotlib) when writing it to a PDF file (PdfPages)?


I'm trying to do two things:

  1. Highlight NaN values in the dataframe in red.
  2. Add the dataframe to a PDF file.

I can display the dataframe in the PDF pages successfully, but the NaN values are not shown in red inside the PDF.

I have tried the following code:

    df.style.highlight_null('red') 

    with PdfPages('stale_curve_report.pdf') as pdf:
      fig, ax = plt.subplots()
      ax.axis('off')
      ax.table(cellText=df.values, colLabels=df.columns, rowLabels=df.index, loc='center',colWidths=[0.12] * 15)
      pdf.savefig(fig)
      plt.close(fig)

I have also tried a few other things using seaborn:

    sns.heatmap(df.isna(), cmap=['red', 'white', 'white'])

I think I need an option in ax.table to highlight the NaN cells.


Solution

  • This can be done by passing a list of colours to the cellColours parameter of ax.table. To do this, create a boolean dataframe color = df.isna(), map its True and False values to the colours you need, and convert the result to a nested list. Example:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.backends.backend_pdf import PdfPages
    
    # Sample data with a few NaN values (np.NaN was removed in NumPy 2.0; use np.nan)
    df = pd.DataFrame(np.random.random((10, 3)), columns=("col 1", "col 2", "col 3"))
    df.at[1, 'col 2'] = np.nan
    df.at[8, 'col 1'] = np.nan
    df.loc[2:4, ['col 3']] = np.nan
    
    # Boolean mask of NaN cells, mapped to one colour per cell
    color = df.isna()
    list_color = np.where(color, 'red', 'white').tolist()
    
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.axis('tight')
    ax.axis('off')
    
    the_table = ax.table(cellText=df.values, colLabels=df.columns, loc='center', cellColours=list_color)
    
    # The context manager closes the PDF file even if savefig raises
    with PdfPages("foo.pdf") as pp:
        pp.savefig(fig)
    plt.close(fig)
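
The same idea can be wrapped in a small helper that also keeps the row labels used in the question. This is only a sketch: the function names nan_cell_colours and save_table_pdf are hypothetical, and rounding to three decimals is an assumption made for readability. It also shows why df.style.highlight_null has no effect here: that call returns a Styler, which is only used when pandas renders the frame itself (for example to HTML), whereas ax.table receives plain cell values and needs its colours passed explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def nan_cell_colours(df, nan_colour="red", default_colour="white"):
    """Return a nested list with one colour per cell, marking NaN cells."""
    mask = df.isna().to_numpy()
    return np.where(mask, nan_colour, default_colour).tolist()


def save_table_pdf(df, path, nan_colour="red"):
    """Draw df as a matplotlib table with NaN cells highlighted, then save it as a PDF."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.axis("off")
    ax.table(
        cellText=df.round(3).astype(str).to_numpy(),  # rounding is an assumption
        colLabels=[str(c) for c in df.columns],
        rowLabels=[str(i) for i in df.index],
        cellColours=nan_cell_colours(df, nan_colour),
        loc="center",
    )
    # The context manager closes the PDF file even if savefig raises
    with PdfPages(path) as pdf:
        pdf.savefig(fig)
    plt.close(fig)
```

Calling save_table_pdf(df, 'stale_curve_report.pdf') with the dataframe from the question should then write a single page with the NaN cells filled in red.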
    
