I have opened an Excel file consisting of three sheets as an OrderedDict of dataframes. I would like each row of every dataframe to repeat three times. Can this be done with numpy? With pandas?
My original ordered dictionary has the following shape:
{'Sheet_1': ID Name Surname Grade
0 104 Eleanor Rigby 6
1 168 Barbara Ann 8
2 450 Polly Cracker 7
3 90 Little Joe 10,
'Sheet_2': ID Name Surname Grade
0 106 Lucy Sky 8
1 128 Delilah Gonzalez 5
2 100 Christina Rodwell 3
3 40 Ziggy Stardust 7,
'Sheet_3': ID Name Surname Grade
0 22 Lucy Diamonds 9
1 50 Grace Kelly 7
2 105 Uma Thurman 7
3 29 Lola King 3}
My desired ordered dictionary has the following shape:
{'Sheet_1': ID Name Surname Grade
0 104 Eleanor Rigby 6
1 104 Eleanor Rigby 6
2 104 Eleanor Rigby 6
3 168 Barbara Ann 8
4 168 Barbara Ann 8
5 168 Barbara Ann 8
6 450 Polly Cracker 7
7 450 Polly Cracker 7
8 450 Polly Cracker 7
9 90 Little Joe 10
10 90 Little Joe 10
11 90 Little Joe 10 ,
'Sheet_2': ID Name Surname Grade
0 106 Lucy Sky 8
1 106 Lucy Sky 8
2 106 Lucy Sky 8
3 128 Delilah Gonzalez 5
4 128 Delilah Gonzalez 5
5 128 Delilah Gonzalez 5
6 100 Christina Rodwell 3
7 100 Christina Rodwell 3
8 100 Christina Rodwell 3
9 40 Ziggy Stardust 7
10 40 Ziggy Stardust 7
11 40 Ziggy Stardust 7 ,
'Sheet_3': ID Name Surname Grade
0 22 Lucy Diamonds 9
1 22 Lucy Diamonds 9
2 22 Lucy Diamonds 9
3 50 Grace Kelly 7
4 50 Grace Kelly 7
5 50 Grace Kelly 7
6 105 Uma Thurman 7
7 105 Uma Thurman 7
8 105 Uma Thurman 7
9 29 Lola King 3
10 29 Lola King 3
11 29 Lola King 3 }
The code I have tried with so far:
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the two file paths
path_excel_file = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the files into a dictionary of Dataframes
dfs = pd.read_excel(path_excel_file, sheet_name=None, skiprows=2)
# Trying to repeat each row in every dataframe three times
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns = df.columns))
# Adding up the list as a new column (opinion) in each sheet.
mylist = ['good song','average song', 'bad song']
for sheet_name, df in dfs.items():
    df = dfs['opinion'] = np.resize(mylist, len(dfs))
# Creating a new column for the concatenation
for sheet_name, df in dfs.items():
    df = dfs.insert(5, 'concatenation', dfs['Name'].map(str) + dfs['Surname'].map(str) + dfs['opinion'].map(str))
# We try to create a new excel file with the manipulated data
Path_new_file = r'C:\Users\machukovich\Desktop\new_file.xlsx'
# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter(Path_new_file, engine='xlsxwriter') as writer:
    for sheet_name, df in dfs.items():
        df.to_excel(writer, sheet_name=sheet_name, startrow=2, index=False)
# I am not obtaining my desired output but an excel file on which each sheet is equal to one single column of one sheet out of my three excel sheets.
Edit: I am not obtaining my desired output. I believe there must be something wrong with the line of code where I repeat each row three times. Any help is appreciated.
Numpy Solution
You appear to be using np.repeat correctly in your solution. The problem is that in
    for sheet_name, df in dfs.items():
        df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns = df.columns)
overwriting df inside the loop doesn't modify dfs: dfs.items() hands you each key and value from a "view" of dfs, and assigning to df only rebinds the loop variable to a new dataframe, while the dictionary entry still points to the original one. The solution is to set the values of dfs directly:
    for sheet_name, df in dfs.items():
        dfs[sheet_name] = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns = df.columns)
Edit: it also looks like you have an extra ) at the end of that line inside the loop.
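To make the difference concrete, here is a minimal sketch that runs without the Excel file. The small dfs dictionary is a hypothetical stand-in for what pd.read_excel(..., sheet_name=None) returns. The astype call at the end is my own addition, not part of the fix above: df.values on columns of mixed types yields an object array, so the rebuilt dataframe would otherwise carry object columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dictionary returned by
# pd.read_excel(path, sheet_name=None); only one small sheet is needed here.
dfs = {
    "Sheet_1": pd.DataFrame({
        "ID": [104, 168],
        "Name": ["Eleanor", "Barbara"],
        "Surname": ["Rigby", "Ann"],
        "Grade": [6, 8],
    }),
}

# Rebinding the loop variable: the dictionary still holds the original frame.
for sheet_name, df in dfs.items():
    df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
rows_after_rebinding = len(dfs["Sheet_1"])

# Assigning by key: the stored frame is replaced.
# astype(...) restores the original column dtypes (an addition for illustration).
for sheet_name, df in dfs.items():
    repeated = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
    dfs[sheet_name] = repeated.astype(df.dtypes.to_dict())
rows_after_assignment = len(dfs["Sheet_1"])

print(rows_after_rebinding, rows_after_assignment)
```

Assigning to an existing key while iterating over dfs.items() is safe because the set of keys does not change.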
Pandas Solution
You can do this with pandas using pd.concat, giving it a list of copies of the dataframe:
dfs[sheet_name] = pd.concat([df, df, df])
or
dfs[sheet_name] = pd.concat([df for _ in range(3)])
If you try either of these you'll notice that the index values are also duplicated (numpy doesn't track those like pandas does), and the rows aren't in your desired order because we literally just connected copies of the dataframe end-to-end. We can solve this with a classic pandas method chain, where we sort, and then reset the index:
dfs[sheet_name] = pd.concat([df for _ in range(3)]).sort_index().reset_index(drop = True)
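As a quick check of the chain above, the sketch below runs it on a small made-up dataframe (the sample data is hypothetical). It also shows an alternative that was not part of the original answer: selecting rows with df.index.repeat(3), which produces the repeated rows already in the desired order, so only the index needs resetting.

```python
import pandas as pd

# Hypothetical sample standing in for one sheet.
df = pd.DataFrame({
    "ID": [104, 168, 450],
    "Name": ["Eleanor", "Barbara", "Polly"],
    "Grade": [6, 8, 7],
})

# Approach from the answer: stack three copies, group equal index labels
# together by sorting, then renumber the rows.
via_concat = pd.concat([df] * 3).sort_index().reset_index(drop=True)

# Alternative (my own suggestion): repeat each index label three times
# and select those rows, then renumber.
via_repeat = df.loc[df.index.repeat(3)].reset_index(drop=True)

print(via_concat)
print(via_concat.equals(via_repeat))
```

Both variants keep the original column dtypes, unlike going through df.values.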