I am attempting to create a module (or two) to convert a file from dat to csv and back again. The issue I am running into is that the conversion adds a number of quotation marks to each "cell" of data.
I am currently using the following code to do this:
with open(file_dat_new, 'r') as dat_file:
    with open(file_csv_new, 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)
        for row in dat_file:
            row = [value.strip() for value in row.split(',')]
            csv_writer.writerow(row)
Here is an example of the first line of input:
"TOA5","STA332","CR6","10318","CR6.Std.12.02 CR6-WIFI.05.03","CPU:Sta-332_2022-10-03.cr6","3367","FSDATA"
and the output I am getting:
"""""""TOA5""""""","""""""STA332""""""","""""""CR6""""""","""""""10318""""""","""""""CR6.Std.12.02 CR6-WIFI.05.03""""""","""""""CPU:Sta-332_2022-10-03.cr6""""""","""""""3367""""""","""""""FSDATA"""""""
So my question is this: Why are the extra quotation marks being added, and how can I remove them during conversion?
When I run your program as-is I get:
"""TOA5""","""STA332""","""CR6""","""10318""","""CR6.Std.12.02 CR6-WIFI.05.03""","""CPU:Sta-332_2022-10-03.cr6""","""3367""","""FSDATA"""
Which looks less extreme than the output you shared:
"""""""TOA5""""""","""""""STA332""""""","""""""CR6""""""","""""""10318""""""","""""""CR6.Std.12.02 CR6-WIFI.05.03""""""","""""""CPU:Sta-332_2022-10-03.cr6""""""","""""""3367""""""","""""""FSDATA"""""""
When I treat the DAT file as CSV:
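import csv

# Parse the DAT file as CSV so the reader strips the field quotes.
with open("input.dat", newline="") as f:
    reader = csv.reader(f)
    rows = list(reader)

# Write the parsed rows; the writer quotes only the fields that need it.
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)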
then I get:
TOA5,STA332,CR6,10318,CR6.Std.12.02 CR6-WIFI.05.03,CPU:Sta-332_2022-10-03.cr6,3367,FSDATA
Your sample DAT file is a CSV with quoted fields. Usually the outside quotes are there to protect a comma in the field data, or another double quote in the field data. Some programs will write the double quotes even if they aren't needed (like your sample data).
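To see when quotes are actually needed, here is a small sketch that writes single rows to an in-memory buffer. The helper name `to_csv_line` is made up for this illustration. The last call uses `csv.QUOTE_ALL`, which might be useful when converting back to DAT, assuming your DAT format expects every field to be quoted the way the sample line is.

```python
import csv
import io

def to_csv_line(fields, quoting=csv.QUOTE_MINIMAL):
    """Write one row to an in-memory buffer and return it without the line ending."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting).writerow(fields)
    return buf.getvalue().rstrip("\r\n")

# Default (QUOTE_MINIMAL): nothing to protect, so no quotes are written.
print(to_csv_line(["TOA5", "STA332"]))                  # TOA5,STA332

# A comma in the data forces quotes around that field only.
print(to_csv_line(["Foo, Bar", "Baz"]))                 # "Foo, Bar",Baz

# A double quote in the data forces quotes and is itself doubled.
print(to_csv_line(['say "hi"', "Baz"]))                 # "say ""hi""",Baz

# QUOTE_ALL quotes every field, needed or not.
print(to_csv_line(["TOA5", "STA332"], csv.QUOTE_ALL))   # "TOA5","STA332"
```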
When you parsed the DAT file yourself by splitting on the comma, the quotes stayed in each value as data, and csv.writer then escaped those quotes when it wrote the row.
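A quick way to see the difference is to parse the same line both ways. This sketch uses a shortened version of your sample line; the variable names are only for illustration.

```python
import csv

line = '"TOA5","STA332","CR6","10318"'

# Splitting on commas keeps the quote characters as part of each value.
naive = [value.strip() for value in line.split(",")]

# csv.reader accepts any iterable of lines and removes the field quotes.
parsed = next(csv.reader([line]))

print(naive)   # ['"TOA5"', '"STA332"', '"CR6"', '"10318"']
print(parsed)  # ['TOA5', 'STA332', 'CR6', '10318']
```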
For me, if the input looks remotely like CSV, I treat it as CSV and use csv.reader.
If I send the output of your program back in as the input, then I get the more extreme quoting you shared:
"""""""TOA5""""""","""""""STA332""""""","""""""CR6""""""","""""""10318""""""","""""""CR6.Std.12.02 CR6-WIFI.05.03""""""","""""""CPU:Sta-332_2022-10-03.cr6""""""","""""""3367""""""","""""""FSDATA"""""""
Quoting turns double-quotes-as-data, like:
['"Foo, Bar"', 'Baz']
into this CSV:
"""Foo, Bar""",Baz
A set of double quotes marks the field as being quoted, then each double-quote-as-data (") becomes "".
So, "TOA5" becomes """TOA5""" (one set of double quotes on the outside, then each of the two double-quotes-as-data gets doubled).
Run that through again and we get """""""TOA5""""""" (one set of double quotes on the outside, then each of the six double-quotes-as-data gets doubled).
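The growth can be reproduced in memory. This is a rough sketch that mimics your split-then-write loop on a two-field line; `naive_pass` is a name invented for the example, not anything from the csv module.

```python
import csv
import io

def naive_pass(text):
    """Mimic the question's loop: split each line on commas, then write with csv.writer."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in text.splitlines():
        writer.writerow([value.strip() for value in row.split(",")])
    return out.getvalue()

original = '"TOA5","STA332"\n'
once = naive_pass(original)   # first conversion
twice = naive_pass(once)      # output fed back in as input

print(once.strip())    # """TOA5""","""STA332"""
print(twice.strip())   # 7 double quotes on each side of every field

# Each pass doubles the data quotes and adds one outer pair: 2 -> 6 -> 14.
print(original.count('"') // 2, once.count('"') // 2, twice.count('"') // 2)  # 2 6 14
```

Each field goes from 2 quote characters to 6 to 14, which matches the two outputs shown above.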