I have a piece of code that writes a csv response to a StringIO()
object. I return this object through and endpoint and also use this output to generate a csv file that will be stored on the server.
@app.post('/csv-data')
def get_csv(data: DataRequest):
if not data.headers:
raise HTTPException(status_code=400, detail="Headers cannot be empty")
if data.number_of_records <= 0:
raise HTTPException(status_code=400, detail="Number of requested records must be more than 0")
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
file_name = f"data_{timestamp}.csv"
file_path = os.path.join(STORAGE_DIR, file_name)
headers = [
{"name": header.name, "description": header.description, "sample_data": header.sample_data}
for header in data.headers
]
response = open_ai_interface.generate_data(headers, data.number_of_records)
output = StringIO()
csvwriter = csv.writer(output, quoting=csv.QUOTE_MINIMAL)
for row in response:
csvwriter.writerow(row.split(','))
output.seek(0)
with open(file_path, 'w') as f:
f.write(output.getvalue())
output.seek(0)
return StreamingResponse(output, media_type="text/csv", headers={"Content-Disposition": f"attachment; filename={file_name}"})
The issue I'm facing is that the generated .csv file contains an extra line between each record. When I send a request through the browser or Postman I get the response with now new lines between records. This leads me to assume that there could be a better way to write the csv data to a file. I would appreciate suggestions
The TextIOWrapper
-like object passed to csv.writer
is documented to require including newline=''
as a parameter to prevent translations of . Unfortunately, while io.StringIO
has the newline
parameter, it doesn't support it in the same way as TextIOWrapper
does:
StringIO:
The newline argument works like that of
TextIOWrapper
, except that when writing output to the stream, if newline isNone
, newlines are written as\n
on all platforms.
Without newline=None
, csv.writer
directly writes its default "excel" dialect (newlines are \r\n
) to the stream. Later, when writing the stream to a file, opening the file in text mode (mode='r'
) on a Windows system translates \n
to \r\n
, which results in \r\n
becoming \r\r\n
. That's the extra blank line, although it depends on the viewer if you see it. Viewing in Excel will treat the extra \r
as another newline, but not all text viewers will do that.
TL;DR: To fix, use:
output = StringIO(newline=None)
One caveat: If you have columns with mult-line data, e.g. writerow(['abc\ndef', 'def\nghi'])
and you want to preserve the type of line ending in the data, but use the "excel" \r\n
line terminator for rows, then use:
output = StringIO()
But write to the output file in untranslated mode:
with open(file_path, 'w', newline='') as f:
Full example:
import io
import csv
out = io.StringIO()
writer = csv.writer(out) # default "excel" dialect with `\r\n` line termination.
for _ in range(3):
writer.writerow(['___\n___']) # want to preserve data exactly.
print(repr(out.getvalue()))
with open('out.txt', 'w', newline='') as f: # no newline translation
f.write(out.getvalue())
with open('out.txt', 'rb') as f: # view bytes written
print(f.read())
Output:
'"___\n___"\r\n"___\n___"\r\n"___\n___"\r\n'
b'"___\n___"\r\n"___\n___"\r\n"___\n___"\r\n'