[SOLVED] CSV reader behavior with None and empty string

I'd like to distinguish between None and empty strings ('') when going back and forth between a Python data structure and its CSV representation using Python's csv module.

My issue is that when I run:

import csv, io

data = [['NULL/None value', None],
        ['empty string', '']]

# Write the rows to an in-memory buffer
f = io.StringIO()
csv.writer(f).writerows(data)

# Read the same text back
f = io.StringIO(f.getvalue())
data2 = [e for e in csv.reader(f)]

print("input : ", data)
print("output: ", data2)

I get the following output:

input :  [['NULL/None value', None], ['empty string', '']]
output:  [['NULL/None value', ''], ['empty string', '']]

Of course, I could play with data and data2 to distinguish None and empty strings with things like:

data = [[d if d is not None else 'None' for d in row] for row in data]
data2 = [[d if d != 'None' else None for d in row] for row in data2]

But that would partly defeat the point of using the csv module (fast serialization and deserialization implemented in C), especially when dealing with large lists.

Is there a csv.Dialect or parameters to csv.writer and csv.reader that would enable them to distinguish between '' and None in this use-case?

If not, would there be an interest in implementing a patch to csv.writer to enable this kind of back and forth? (Possibly a Dialect.None_translate_to parameter defaulting to '' to ensure backward compatibility.)

Solution

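This was addressed in Python 3.12, which added csv.QUOTE_STRINGS (and the related csv.QUOTE_NOTNULL). Pass it as the quoting argument to both csv.writer and csv.reader: the writer emits None as an empty unquoted field and '' as a quoted "", and the reader is documented to turn an empty unquoted field back into None. Reader support appears to have arrived later than writer support (3.13, as far as I can tell), so check the output on your interpreter.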
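To see what actually ends up in the file, here is a minimal sketch that writes the question's data to an in-memory buffer. The helper name to_csv_text is made up for this example, and the hasattr guard is only there because csv.QUOTE_STRINGS does not exist before Python 3.12.

```python
import csv
import io

def to_csv_text(rows, quoting):
    """Serialize rows to a CSV string using the given csv quoting mode."""
    buf = io.StringIO(newline='')
    csv.writer(buf, quoting=quoting).writerows(rows)
    return buf.getvalue()

data = [['NULL/None value', None],
        ['empty string', '']]

# csv.QUOTE_STRINGS only exists on Python 3.12+, so guard the lookup.
if hasattr(csv, 'QUOTE_STRINGS'):
    # None is written as an empty unquoted field, '' as a quoted "".
    print(to_csv_text(data, csv.QUOTE_STRINGS))
else:
    print('csv.QUOTE_STRINGS is not available on this Python version')

# For comparison: the default mode writes both None and '' as an empty field.
print(to_csv_text(data, csv.QUOTE_MINIMAL))
```

With QUOTE_STRINGS the two rows differ in the raw text (a bare trailing comma versus a trailing ""), which is the information a reader needs to tell None from ''.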

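import csv

l = [
    ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    ['Example', '', None, 42, 3.5, r'\n', ' , ']]

filename = 'quote_strings.csv'  # arbitrary example path

# Write once with QUOTE_STRINGS: strings are quoted, None becomes an empty unquoted field.
with open(filename, 'w', newline='') as f:
    csv.writer(f, quoting=csv.QUOTE_STRINGS).writerows(l)

# Read the same file back under each quoting mode to compare the results.
for quoting in [csv.QUOTE_STRINGS, csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE]:
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, quoting=quoting)
        print(f'Reading {filename} with quoting={quoting}')
        for row in reader:
            print(row)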
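
If you are stuck on an interpreter older than 3.12, the marker idea from the question can be wrapped in two small helpers. This is only a sketch: dumps, loads and NONE_MARKER are names invented here, the marker must be a string that never occurs in your real data, and the per-cell substitution runs in Python, so it gives up some of the speed of the C implementation.

```python
import csv
import io

# Hypothetical marker; pick any string that cannot occur in the real data.
NONE_MARKER = '\\N'

def dumps(rows):
    """Write rows to CSV text, replacing None with NONE_MARKER."""
    buf = io.StringIO(newline='')
    csv.writer(buf).writerows(
        [NONE_MARKER if cell is None else cell for cell in row]
        for row in rows)
    return buf.getvalue()

def loads(text):
    """Read CSV text back, turning NONE_MARKER into None."""
    reader = csv.reader(io.StringIO(text, newline=''))
    return [[None if cell == NONE_MARKER else cell for cell in row]
            for row in reader]

data = [['NULL/None value', None],
        ['empty string', '']]
data2 = loads(dumps(data))
print(data2)
```

Note that, as with any plain csv round trip, non-string values such as numbers come back as strings.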