pythonexceldataframeopenpyxlxlsxwriter

Loop for filtering NaN's in dictionaries


I have opened my dictionary and I would like to write a loop so to obtain as an output only those rows for which the favorite color column equals to NaN.

My code so far:

# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook

# Defining the file path
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the file into a dictionary of Dataframes
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)

My_dict output:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color   
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 2  Christina   Rodwell  Christina Rodwell  100.0     3.0           black   
 3      Ziggy  Stardust     Ziggy Stardust   40.0     7.0             red    ,
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  \
 0   Lucy  Diamonds  Lucy Diamonds   22.0     9.0           brown   
 1  Grace     Kelly    Grace Kelly   50.0     7.0           white   
 2    Uma   Thurman    Uma Thurman  105.0     7.0          purple   
 3   Lola      King      Lola King    NaN     NaN             NaN     ,
 'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  \
 0  Eleanor     Rigby  Eleanor  Rigby  104.0     6.0            blue   
 1  Barbara       Ann    Barbara  Ann  168.0     8.0            pink   
 2    Polly   Cracker  Polly  Cracker  450.0     7.0           black   
 3   Little       Joe     Little  Joe    NaN     NaN             NaN    }

My desired output:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color  
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  \ 
 3   Lola      King      Lola King    NaN     NaN             NaN   
  'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  \
 3   Little       Joe     Little  Joe    NaN     NaN             NaN   

And, finally I would like to write the desired output to a new excel file (in separate sheets). Please, enlighten me. I am new to python.


Solution

  • I would do it this way :

    with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
        for sn, df in my_dict.items():
            (df.loc[df["favourite color"].isnull()] # we use boolean indexing
                 .to_excel(writer, sheet_name=sn, index=False)) # with startrow, starcol ?
        #this is optional    
        for ws in writer.sheets:
            writer.sheets[ws].autofit() # xlsxwriter 3.0.6+
    

    Output (only Sheet_1):

    enter image description here

    Update :

    If you want to update my_dict first, you can use this :

    for sn, df in my_dict.items():
        my_dict[sn] = df.loc[df["favourite color"].isnull()]
    

    Output :

    print(my_dict)
    
    {'Sheet_1':       Name   Surname     Concatenation  ID_  Grade_  favourite color
     0  Delilah  Gonzalez  Delilah Gonzalez  NaN     NaN              NaN,
     'Sheet_2':    Name Surname Concatenation  ID_  Grade_  favourite color
     0  Lola    King     Lola King  NaN     NaN              NaN,
     'Sheet_3':      Name Surname Concatenation  ID_  Grade_  favourite color
     0  Little     Joe    Little Joe  NaN     NaN              NaN}
    

    Then (if needed) you can loop through each filtered df to store it in a spreadsheet :

    with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
        for sn, df in my_dict.items():
            df.to_excel(writer, sheet_name=sn, index=False)