I'm writing data to a CSV file in Python. The data is in Unicode-8. I've written several rows successfully. However, a part of the data from column B is written to column A. The data already includes commas and numbers in English along with Arabic text.
I also checked in NotePad that the text is between double quotations, and it seems OK, but in MS Office and LibreOffice preview, it doesn't look OK. When it's open after the preview in LibreOffice, it looks fine. Here are my trials:
with open(df_path, "w", newline="", encoding="utf-8") as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=["File Name", "Content"], quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerow({"File Name": file, "Content": txt})
with open(df_path, "w", newline="", encoding="utf-8") as csv_file:
writer = csv.writer(csv_file)
writer.writerow(["File Name", "Content"])
writer.writerow([file, '"' + txt + '"'])
with open(df_path, "w", newline="", encoding="utf-8") as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=["File Name", "Content"])
writer.writeheader()
writer.writerow({"File Name": file, "Content": txt})
with open(df_path, "w", newline="", encoding="utf-8") as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=["File Name", "Content"], quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerow({"File Name": file, "Content": txt})
with open(df_path, "w", newline="", encoding="utf-8") as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=["File Name", "Content"], delimiter=",")
writer.writeheader()
writer.writerow({"File Name": file, "Content": txt})
What should I try to keep the data as is in column B? I can't use Pandas because I want to keep the file open instead of writing all data at once. Thanks in advance!
Edit: I cannot share the text because when I copy and paste the text, the issue disappears.
What I concluded until now is that when I add certain data to the file and check in the terminal, It shows "Unicode text, UTF-8 text, with CRLF, LF line terminators" instead of "CSV text". The first code snippet is what I use and it's what produced both.
So, unlike LibreOffice Calc, NotePad, and Google Sheets, the text was being split and appearing in the wrong columns in MS Excel only. This is even though the delimiters are the same.
The reason is that MS Excel splits text that exceeds 32767 even if it's a .csv not .xlsx file. This doesn't apply to other software. This is also why all code changes didn't solve the issue. Also, using dialect = 'excel'
doesn't solve the problem. The only solution is to split the text in the code if it exceeds the limit.