python excel dataframe openpyxl xlsxwriter

Loop for filtering NaN's in dictionaries

I have opened my dictionary and I would like to write a loop so to obtain as an output only those rows for which the favorite color column equals to NaN.

My code so far:

# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook

# Defining the file path
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the file into a dictionary of Dataframes
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)

My_dict output:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color   
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 2  Christina   Rodwell  Christina Rodwell  100.0     3.0           black   
 3      Ziggy  Stardust     Ziggy Stardust   40.0     7.0             red    ,
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  \
 0   Lucy  Diamonds  Lucy Diamonds   22.0     9.0           brown   
 1  Grace     Kelly    Grace Kelly   50.0     7.0           white   
 2    Uma   Thurman    Uma Thurman  105.0     7.0          purple   
 3   Lola      King      Lola King    NaN     NaN             NaN     ,
 'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  \
 0  Eleanor     Rigby  Eleanor  Rigby  104.0     6.0            blue   
 1  Barbara       Ann    Barbara  Ann  168.0     8.0            pink   
 2    Polly   Cracker  Polly  Cracker  450.0     7.0           black   
 3   Little       Joe     Little  Joe    NaN     NaN             NaN    }

My desired output:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color  
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  \ 
 3   Lola      King      Lola King    NaN     NaN             NaN   
  'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  \
 3   Little       Joe     Little  Joe    NaN     NaN             NaN

And, finally I would like to write the desired output to a new excel file (in separate sheets). Please, enlighten me. I am new to python.

Solution

I would do it this way :

with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
    for sn, df in my_dict.items():
        (df.loc[df["favourite color"].isnull()] # we use boolean indexing
             .to_excel(writer, sheet_name=sn, index=False)) # with startrow, starcol ?
    #this is optional    
    for ws in writer.sheets:
        writer.sheets[ws].autofit() # xlsxwriter 3.0.6+

Output (only Sheet_1):

Update :

If you want to update my_dict first, you can use this :

for sn, df in my_dict.items():
    my_dict[sn] = df.loc[df["favourite color"].isnull()]

Output :

print(my_dict)

{'Sheet_1':       Name   Surname     Concatenation  ID_  Grade_  favourite color
 0  Delilah  Gonzalez  Delilah Gonzalez  NaN     NaN              NaN,
 'Sheet_2':    Name Surname Concatenation  ID_  Grade_  favourite color
 0  Lola    King     Lola King  NaN     NaN              NaN,
 'Sheet_3':      Name Surname Concatenation  ID_  Grade_  favourite color
 0  Little     Joe    Little Joe  NaN     NaN              NaN}

Then (if needed) you can loop through each filtered df to store it in a spreadsheet :

with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
    for sn, df in my_dict.items():
        df.to_excel(writer, sheet_name=sn, index=False)