pandasfor-loopif-statementopenpyxlordereddictionary

How to Iterate Over a Dictionary of DataFrames Based on String Values


I have an Excel file with three sheets, each sharing the same structure but containing different data.

My original OrderedDict:

{'Sheet_1':     ID      Name  Surname  Grade favourite color    favourite sport  Dummy
 0  104  Eleanor     Rigby      6            blue  American football    NaN
 1  168  Barbara       Ann      8            pink             Hockey    NaN
 2  450    Polly   Cracker      7           black      Skateboarding    NaN
 3   90   Little      Josy     10          orange            Cycling    NaN,
 'Sheet_2':     ID       Name   Surname  Grade favourite color favourite sport  Dummy
 0  106       Lucy       Sky      8          yellow          Tennis    NaN
 1  128    Delilah     Perez      5     light green      Basketball    NaN
 2  100  Christina   Rodwell      3           black       Badminton    NaN
 3   40      Ziggy  Stardust      7             red          Squash    NaN,
 'Sheet_3':     ID   Name   Surname  Grade favourite color favourite sport  Dummy
 0   22   Lucy  Diamonds      9           brown            Judo    NaN
 1   50  Grace     Kelly      7           white       Taekwondo    NaN
 2  105    Uma   Thurman      7          purple      videogames    NaN
 3   29   Lola   McQueen      3             red            Surf    NaN}

My piece of code (not writing 'failure' for 'Miquel Angelo'):

# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
import openpyxl
from openpyxl import Workbook, load_workbook


# Defining the file path
file_path = r'C:/Users/machukovich/Desktop/stack_2.xlsx'

# Load workbook as openpyxl
reference_workbook = load_workbook(file_path)

# We will mantain the workbook open
wb = reference_workbook.active

# Getting the sheetnames as a list using the sheetnames attribute

sheet_names = reference_workbook.sheetnames
print(sheet_names)

names_list = []
for sheet in reference_workbook.worksheets:
    student_name = sheet['B2'].value
    names_list.append(student_name)
    print(f"Student Name: {student_name}")
    
    
print(f"List of Names: {names_list}")

# Loading the file into a dictionary of Dataframes
dict_of_df = pd.read_excel(file_path, sheet_name=None, skiprows=2)

# Writing the loop itself (it fills all the 'Dummy' columns with zeros in all sheets):
for sheet_name, df in dict_of_df.items():
    if student_name =='Roberto' or  student_name =='Leonardo':
        df['Dummy'] = df['Dummy'].fillna('success')
    else:
        df['Dummy'] = df['Dummy'].fillna('failure')

Snapshots:

enter image description here enter image description here enter image description here


Solution

  • I have erased the part in which the list for B2 values was generated. For more clarity. Here is the solution I was looking for:

    # Importing modules
    import pandas as pd
    from openpyxl import load_workbook
    
    # Defining the file path
    file_path = r'C:/Users/machukovich/Desktop/stack_2.xlsx'
    
    # Load workbook as openpyxl
    reference_workbook = load_workbook(file_path)
    
    # Loading the file into a dictionary of DataFrames
    dict_of_df = pd.read_excel(file_path, sheet_name=None, skiprows=2)
    
    # Iterating through each sheet and filling the 'Dummy' column based on student names
    for sheet_name, df in dict_of_df.items():
        student_name = reference_workbook[sheet_name]['B2'].value
        df['Dummy'] = 'success' if student_name in ['Roberto', 'Leonardo'] else 'failure'
    

    Thank you all who have tried to help me.