How to create a new OrderedDict with repeated values from an existing OrderedDict? [Excel]


I would like to get the repeated rows themselves, taken from the original OrderedDict, as output, but my attempt using pandas.DataFrame.duplicated merely returns a boolean Series with the repeated values marked as True.

{'Sheet_1':     ID      Name  Surname  Grade
 0  104  Eleanor     Rigby      6
 1  104  Eleanor     Rigby      6
 2  168  Barbara       Ann      8
 3  450    Polly   Cracker      7
 4   90   Little       Joe     10
 5   90   Little       Joe     10,
 'Sheet_2':     ID       Name   Surname  Grade
 0  106       Lucy       Sky      8
 1  128    Delilah  Gonzalez      5
 2  100  Christina   Rodwell      3
 3  100  Christina   Rodwell      3
 4   40      Ziggy  Stardust      7,
 'Sheet_3':     ID   Name   Surname  Grade
 0   22   Lucy  Diamonds      9
 1   50  Grace     Kelly      7
 2   50  Grace     Kelly      7
 3  105    Uma   Thurman      7
 4  105    Uma   Thurman      7
 5   29   Lola      King      3}

Desired output:

{'Sheet_1':     ID      Name  Surname  Grade
 1  104  Eleanor     Rigby      6
 5   90   Little       Joe     10,
 'Sheet_2':     ID       Name   Surname  Grade
 3  100  Christina   Rodwell      3,
 'Sheet_3':     ID   Name   Surname  Grade
 2   50  Grace     Kelly      7
 4  105    Uma   Thurman      7}

# Importing modules

import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook

# Defining the file path

path_excel_file = r'C:\Users\machukovich\Desktop\stack.xlsx'

# Loading the files into a dictionary of Dataframes

dfs = pd.read_excel(path_excel_file, sheet_name=None, skiprows=2)

# Looping through the different sheets so to

for sheet_name, df in dfs.items():
    duplicated_values_df = df.duplicated(subset='ID', keep='last')
    
### At this point I only get a boolean Series for a single sheet, whereas I would like the loop to cover every sheet of the Excel file.

# Then, I would create a new excel file with the duplicated_values_df data

Path_new_file = r'C:\Users\machukovich\Desktop\new_file.xlsx'

# Create a Pandas Excel writer using XlsxWriter as the engine.

with pd.ExcelWriter(Path_new_file, engine='xlsxwriter') as writer:
    for sheet_name, df in duplicated_values_df.items():
        df.to_excel(writer, sheet_name=sheet_name, startrow=2, index=False)
        
        

I looked through past answers but struggled to find a clear solution. I look forward to your help.


Solution

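  • Assuming dic is your input dictionary, you can combine duplicated and drop_duplicates. duplicated flags every repeat of an earlier row, and drop_duplicates then keeps a single copy of each flagged row: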

    out = {k: d[d.duplicated()].drop_duplicates()
           for k, d in dic.items() }
    

    Output:

    {'Sheet_1':     ID     Name Surname  Grade
     1  104  Eleanor   Rigby      6
     5   90   Little     Joe     10,
     'Sheet_2':     ID       Name  Surname  Grade
     3  100  Christina  Rodwell      3,
     'Sheet_3':     ID   Name  Surname  Grade
     2   50  Grace    Kelly      7
     4  105    Uma  Thurman      7}
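
    To make the approach easier to try without the Excel file, here is a minimal, self-contained sketch. It rebuilds Sheet_1 and Sheet_2 from the question by hand; the helper name repeated_rows and the variable by_id are hypothetical names introduced only for this illustration. The subset="ID" variant is an assumption based on the question's own attempt, which compared rows on the ID column only.

```python
import pandas as pd

# Sample data typed in from the question's Sheet_1 and Sheet_2
dic = {
    "Sheet_1": pd.DataFrame({
        "ID": [104, 104, 168, 450, 90, 90],
        "Name": ["Eleanor", "Eleanor", "Barbara", "Polly", "Little", "Little"],
        "Surname": ["Rigby", "Rigby", "Ann", "Cracker", "Joe", "Joe"],
        "Grade": [6, 6, 8, 7, 10, 10],
    }),
    "Sheet_2": pd.DataFrame({
        "ID": [106, 128, 100, 100, 40],
        "Name": ["Lucy", "Delilah", "Christina", "Christina", "Ziggy"],
        "Surname": ["Sky", "Gonzalez", "Rodwell", "Rodwell", "Stardust"],
        "Grade": [8, 5, 3, 3, 7],
    }),
}


def repeated_rows(df, subset=None):
    """Return one copy of every row that occurs more than once (hypothetical helper)."""
    # duplicated() defaults to keep='first': the first occurrence is False,
    # every later occurrence is True.
    mask = df.duplicated(subset=subset)
    # drop_duplicates() collapses rows that were repeated three or more times.
    return df[mask].drop_duplicates(subset=subset)


# Compare whole rows, as in the answer above
out = {name: repeated_rows(df) for name, df in dic.items()}

# Compare on the ID column only, as in the question's attempt (assumption)
by_id = {name: repeated_rows(df, subset="ID") for name, df in dic.items()}

for name, df in out.items():
    print(name)
    print(df)
```

    Because out is again a dictionary of DataFrames keyed by sheet name, it can be iterated with .items() in the question's pd.ExcelWriter loop in place of duplicated_values_df.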