I am generating different Excel files with a loop in Knime and at every iteration I see something strange. As you can see in the picture below, I am generating the output and then with a python script I am making some adjustment (and then save it again with python as a new file).
The final files are quite heavy, like 40-50 MB, and it takes a while to open them with Excel. However, after I open one of these files and save (autosave is on), it gets reduced to about 2-3 MB. I thought that it was due to Knime which was duplicating rows, but actually I have the right data inside the final output.
Do you have any clues why this is happening? I'd like to understand this thing as the first time I open each file it takes a while.
UPDATE
I have double checked it and this part of the script is drastically increasing the .xlsx size (in particular, the loop). I do not see why. It seems like it is scanning all the rows in the file, also the empty ones at the bottom of the table.
len2 = sheet.max_row
from openpyxl.worksheet.datavalidation import DataValidation
#Create new sheet for Drop-down list
mapping= wb2.create_sheet('mapping')
mapping.cell(row=1, column=1).value = 'EAN'
mapping.cell(row=2, column=1).value = 'RBU Comparable SKU'
mapping.cell(row=1, column=3).value = 'DNP'
mapping.cell(row=2, column=3).value = 'RAP'
data_val_met = DataValidation(type="list", formula1='=mapping!$C$1:$C$2')
data_val_map = DataValidation(type="list", formula1='=mapping!$A$1:$A$2')
for i in range (3, len2+1):
sheet.add_data_validation(data_val_map)
data_val_map.add(sheet.cell(row=i, column=41))
sheet.add_data_validation(data_val_met)
data_val_met.add(sheet.cell(row=i, column=42))
Seems to be related to what the Python Script is doing. To verify this, you can look at the files generated by the Excel Writer and then compare it with the 40-50 MB files your Python script is generating.
If this is verified to be true, then the next thing to do is work on what that script is doing.