I have a Python script that basically looks like this:
import mypackage
# this function always generates the same pandas.DataFrame
df = mypackage.create_the_dataframe()
# write the DataFrame to xlsx and csv
df.to_excel("the_dataframe_as.xlsx", index=False, engine="openpyxl")
df.to_csv("the_dataframe_as.csv", index=False)
I was trying to write a test for the create_the_dataframe function, so I checked the hashes of the resulting xlsx and csv files. Across two runs of the script, the hash and file size of the xlsx file change, while the hash of the csv file stays the same.
Although I can live with this, I am very curious why this is the case.
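An XLSX file is a ZIP archive of XML parts, and it contains metadata such as the creation timestamp, which changes every time the file is written. Because the changed bytes can compress differently, this would also explain the small difference in file size. A plain-text CSV file contains no such variable metadata, so its contents are determined entirely by the data.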
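To see the effect in isolation, here is a minimal sketch using only the standard library. It does not call pandas or openpyxl. Instead it builds two small ZIP archives (the container format behind XLSX) with the same content but different entry timestamps. The helper names `file_digest`, `member_digests`, and `make_archive` are made up for this illustration.

```python
import hashlib
import io
import zipfile


def file_digest(data: bytes) -> str:
    """SHA-256 of the raw archive bytes, i.e. what hashing the file on disk gives."""
    return hashlib.sha256(data).hexdigest()


def member_digests(data: bytes) -> dict:
    """Map each archive member name to the SHA-256 of its uncompressed content."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
        }


def make_archive(timestamp: tuple) -> bytes:
    """Build an in-memory ZIP with one fixed member and the given entry timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The member name and content are placeholders, not a valid worksheet.
        info = zipfile.ZipInfo("xl/worksheets/sheet1.xml", date_time=timestamp)
        archive.writestr(info, "<sheetData/>")
    return buffer.getvalue()


first = make_archive((2024, 1, 1, 12, 0, 0))
second = make_archive((2024, 1, 1, 12, 0, 2))

# The archives differ as files, although every member has identical content.
print(file_digest(first) == file_digest(second))
print(member_digests(first) == member_digests(second))
```

On the real output files, calling `member_digests(open("the_dataframe_as.xlsx", "rb").read())` for two runs should show which parts actually differ. I would expect that to be the document properties part (`docProps/core.xml`), but I have not verified it for your setup. For the test itself, comparing the DataFrames directly with `pandas.testing.assert_frame_equal` avoids depending on file bytes at all.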