Tags: pandas, dataframe, numpy, for-loop, ordereddictionary

How to write a loop that repeats the rows of each DataFrame in an OrderedDict?


I have opened an Excel file consisting of three sheets as an OrderedDict of DataFrames.

  1. I would like every row of each original DataFrame to be repeated three times.
  2. I would like to do this with numpy.
  3. Could you also present another solution using pandas?

My original ordered dictionary has the following shape:

    {'Sheet_1':     ID      Name  Surname  Grade
     0  104  Eleanor     Rigby      6
     1  168  Barbara       Ann      8
     2  450    Polly   Cracker      7
     3   90   Little       Joe     10,
     'Sheet_2':     ID       Name   Surname  Grade
     0  106       Lucy       Sky      8
     1  128    Delilah  Gonzalez      5
     2  100  Christina   Rodwell      3
     3   40      Ziggy  Stardust      7,
     'Sheet_3':     ID   Name   Surname  Grade
     0   22   Lucy  Diamonds      9
     1   50  Grace     Kelly      7
     2  105    Uma   Thurman      7
     3   29   Lola      King      3}

My desired ordered dictionary has the following shape:

    {'Sheet_1':      ID      Name  Surname  Grade
     0   104  Eleanor     Rigby      6
     1   104  Eleanor     Rigby      6
     2   104  Eleanor     Rigby      6
     3   168  Barbara       Ann      8
     4   168  Barbara       Ann      8
     5   168  Barbara       Ann      8
     6   450    Polly   Cracker      7
     7   450    Polly   Cracker      7
     8   450    Polly   Cracker      7
     9    90   Little       Joe     10
     10   90   Little       Joe     10
     11   90   Little       Joe     10,
     'Sheet_2':      ID       Name   Surname  Grade
     0   106       Lucy       Sky      8
     1   106       Lucy       Sky      8
     2   106       Lucy       Sky      8
     3   128    Delilah  Gonzalez      5
     4   128    Delilah  Gonzalez      5
     5   128    Delilah  Gonzalez      5
     6   100  Christina   Rodwell      3
     7   100  Christina   Rodwell      3
     8   100  Christina   Rodwell      3
     9    40      Ziggy  Stardust      7
     10   40      Ziggy  Stardust      7
     11   40      Ziggy  Stardust      7,
     'Sheet_3':      ID   Name   Surname  Grade
     0    22   Lucy  Diamonds      9
     1    22   Lucy  Diamonds      9
     2    22   Lucy  Diamonds      9
     3    50  Grace     Kelly      7
     4    50  Grace     Kelly      7
     5    50  Grace     Kelly      7
     6   105    Uma   Thurman      7
     7   105    Uma   Thurman      7
     8   105    Uma   Thurman      7
     9    29   Lola      King      3
     10   29   Lola      King      3
     11   29   Lola      King      3}

The code I have tried with so far:

    # Importing modules
    import openpyxl as op
    import pandas as pd
    import numpy as np
    import xlsxwriter
    from openpyxl import Workbook, load_workbook

    # Defining the two file paths
    path_excel_file = r'C:\Users\machukovich\Desktop\stack.xlsx'

    # Loading the files into a dictionary of Dataframes
    dfs = pd.read_excel(path_excel_file, sheet_name=None, skiprows=2)

    # Trying to repeat each row in every dataframe three times
    for sheet_name, df in dfs.items():
        df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns = df.columns))

    # Adding up the list as a new column (opinion) in each sheet.
    mylist = ['good song','average song', 'bad song']
    for sheet_name, df in dfs.items():
        df = dfs['opinion'] = np.resize(mylist, len(dfs))

    # Creating a new column for the concatenation
    for sheet_name, df in dfs.items():
        df = dfs.insert(5, 'concatenation', dfs['Name'].map(str)  + dfs['Surname'].map(str) + dfs['opinion'].map(str))

    # We try to create a new excel file with the manipulated data

    Path_new_file = r'C:\Users\machukovich\Desktop\new_file.xlsx'

    # Create a Pandas Excel writer using XlsxWriter as the engine.
    with pd.ExcelWriter(Path_new_file, engine='xlsxwriter') as writer:
        for sheet_name, df in dfs.items():
            df.to_excel(writer, sheet_name=sheet_name, startrow=2, index=False)

I am not obtaining my desired output. Instead, I get an Excel file in which each sheet equals a single column from one of my three sheets.

Edit: I believe something is wrong with the line of code where I repeat each row three times. Any help is appreciated.


Solution

  • Numpy Solution

    You are using np.repeat correctly. The problem is that in

    for sheet_name, df in dfs.items():
        df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns = df.columns)
    

    assigning to df inside the loop only rebinds the loop variable to a new DataFrame; the entry stored in dfs still refers to the original one. The solution is to assign the result back into dfs under its key:

    for sheet_name, df in dfs.items():
        dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns = df.columns)
    

    Edit: it also looks like you have an extra ) at the end of that line inside the loop, which is a syntax error.
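
    To make the difference between the two loops concrete, here is a minimal sketch. The two small frames are hypothetical stand-ins for the dictionary that pd.read_excel(..., sheet_name=None) returns; the names rows_after_rebinding and rows_after_assignment are only there for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dict returned by pd.read_excel(..., sheet_name=None)
dfs = {
    "Sheet_1": pd.DataFrame({"ID": [104, 168], "Name": ["Eleanor", "Barbara"]}),
    "Sheet_2": pd.DataFrame({"ID": [106, 128], "Name": ["Lucy", "Delilah"]}),
}

# Rebinding the loop variable: the frames stored in dfs are left untouched
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_rebinding = {name: len(frame) for name, frame in dfs.items()}

# Assigning back into the dict under the same key: dfs is updated
for sheet_name, df in dfs.items():
    dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_assignment = {name: len(frame) for name, frame in dfs.items()}

print(rows_after_rebinding)   # {'Sheet_1': 2, 'Sheet_2': 2}
print(rows_after_assignment)  # {'Sheet_1': 6, 'Sheet_2': 6}
```

    One caveat with this approach: df.values for columns of mixed types is a single object array, so numeric columns such as ID and Grade typically come back with dtype object. Calling .infer_objects() on the rebuilt frame is one way to recover numeric dtypes.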

  • Pandas Solution

    You can do this with pandas using pd.concat inside the same loop, giving it a list that contains the dataframe three times:

    dfs[sheet_name] = pd.concat([df, df, df])
    

    or

    dfs[sheet_name] = pd.concat([df for _ in range(3)])
    

    If you try either of these you'll notice that the index values are also duplicated (numpy doesn't track those like pandas does), and the rows aren't in your desired order, because we just joined three copies of the dataframe end-to-end. We can solve this with a classic pandas method chain: sort by the index so that the duplicates of each row sit next to each other, then reset the index:

    dfs[sheet_name] = pd.concat([df for _ in range(3)]).sort_index().reset_index(drop = True)
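
    As a runnable sketch of the chain above inside the loop, the example below uses a hypothetical one-sheet dictionary. It also shows an alternative that is not part of the original answer: repeating the index labels with Index.repeat and selecting those rows with .loc. That alternative assumes the index labels are unique, which holds for the default index that pd.read_excel produces.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by pd.read_excel(..., sheet_name=None)
dfs = {
    "Sheet_1": pd.DataFrame({"ID": [104, 168], "Name": ["Eleanor", "Barbara"]}),
}

repeated = {}
for sheet_name, df in dfs.items():
    # Option A: concat three times, sort by index, then renumber the rows
    chained = pd.concat([df for _ in range(3)]).sort_index().reset_index(drop=True)
    # Option B (assumes unique index labels): repeat each label, select, renumber
    by_index = df.loc[df.index.repeat(3)].reset_index(drop=True)
    repeated[sheet_name] = chained

print(repeated["Sheet_1"]["ID"].tolist())  # [104, 104, 104, 168, 168, 168]
print(chained.equals(by_index))            # True
```

    Both options keep the original column dtypes, since neither goes through df.values.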