I'd like to distinguish between None and empty strings ('') when going back and forth between a Python data structure and its CSV representation using Python's csv module.
My issue is that when I run:
import csv, cStringIO
data = [['NULL/None value',None],
['empty string','']]
f = cStringIO.StringIO()
csv.writer(f).writerows(data)
f = cStringIO.StringIO(f.getvalue())
data2 = [e for e in csv.reader(f)]
print "input : ", data
print "output: ", data2
I get the following output:
input : [['NULL/None value', None], ['empty string', '']]
output: [['NULL/None value', ''], ['empty string', '']]
Of course, I could pre- and post-process data and data2 to distinguish None from empty strings, with something like:
data = [[d if d is not None else 'None' for d in row] for row in data]
data2 = [[d if d != 'None' else None for d in row] for row in data2]
But that would partly defeat the point of using the csv module (fast serialization and deserialization implemented in C, which matters especially when dealing with large lists).
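For reference, the sentinel workaround described above could look roughly like this in Python 3. This is only a sketch: SENTINEL, dump_rows and load_rows are hypothetical names, and it assumes the marker string never occurs as real data.

```python
import csv
import io

# Hypothetical marker standing in for None; assumed never to appear as real data.
SENTINEL = "\\N"


def dump_rows(rows):
    """Serialize rows to CSV text, writing None as SENTINEL."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([SENTINEL if cell is None else cell for cell in row])
    return buf.getvalue()


def load_rows(text):
    """Parse CSV text, turning SENTINEL back into None."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[None if cell == SENTINEL else cell for cell in row] for row in reader]


data = [["NULL/None value", None], ["empty string", ""]]
data2 = load_rows(dump_rows(data))
print("input :", data)
print("output:", data2)
```

The per-cell substitution runs in Python rather than in the C implementation, which is exactly the overhead I would like to avoid.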
Is there a csv.Dialect, or are there parameters to csv.writer and csv.reader, that would enable them to distinguish between '' and None in this use case?
If not, would there be interest in a patch to csv.writer that enables this kind of round trip? (Possibly a Dialect.None_translate_to parameter defaulting to '' to ensure backward compatibility.)
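This has actually been addressed in Python 3.12, which added csv.QUOTE_STRINGS. You pass it as the quoting argument to both csv.writer and csv.reader: the writer emits None as an empty unquoted field and '' as a quoted empty string (""), and the reader is documented to interpret an empty unquoted field as None.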
import csv

l = [
    ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    ['Example', '', None, 42, 3.5, r'\n', ' , ']]

# Assumption: `filename` names a CSV file already written from `l`;
# that step is not shown in this snippet.
for quoting in [csv.QUOTE_STRINGS, csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE]:
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, quoting=quoting)
        print(f'Reading {filename}')
        for row in reader:
            print(row)
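Here is a self-contained round trip on the data from the question, done in memory so no file is needed. Treat it as a sketch: round_trip is a hypothetical helper, and the getattr guard is only there so the script also runs on versions older than 3.12. As far as I can tell, the constant exists in 3.12 but the reader side was only completed in 3.13 (the changelog lists a fix for QUOTE_NOTNULL/QUOTE_STRINGS support in csv.reader), so the script prints what it gets instead of assuming the result.

```python
import csv
import io
import sys

# csv.QUOTE_STRINGS only exists on Python 3.12+; fall back to None on older versions.
QUOTE_STRINGS = getattr(csv, "QUOTE_STRINGS", None)


def round_trip(rows, quoting):
    """Write rows to an in-memory CSV with the given quoting, then read them back."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=quoting).writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text, newline=""), quoting=quoting))
    return text, parsed


rows = [["NULL/None value", None], ["empty string", ""]]

if QUOTE_STRINGS is None:
    text, parsed = None, None
    print("csv.QUOTE_STRINGS is not available on Python", sys.version.split()[0])
else:
    text, parsed = round_trip(rows, QUOTE_STRINGS)
    print("csv text:", repr(text))
    print("parsed  :", parsed)
```

Keep in mind that with QUOTE_STRINGS the reader otherwise behaves like QUOTE_NONNUMERIC, so non-empty unquoted fields are converted to float on the way back in.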