I'm using the Django module django-chunked-upload to receive potentially large CSV files. I can assume the CSVs are properly formatted, but I can't assume what the delimiter is.
Upon completion of the upload, an UploadedFile object is returned. I need to validate that the correct columns are included in the uploaded CSV and that the data types in each column are correct.
Loading the file with csv.reader() doesn't work:
reader = csv.reader(uploaded_file)
next(reader)
>>> _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
This might be because uploaded_file.content_type and uploaded_file.charset are both coming through as None.
I've come up with a fairly inelegant solution to grab the header and iterate over the rows:
i = 0
header = ""
for line in uploaded_file:
    if i == 0:
        header = line.decode('utf-8')
        header_list = list(csv.reader(StringIO(header)))
        print(header_list[0])
        #validate column names
    else:
        tiny_csv = StringIO(header + line.decode('utf-8'))
        reader = csv.DictReader(tiny_csv)
        print(next(reader))
        #validate column types
    i += 1
I also considered trying to load the path of the actual saved file:
path = ...  # figure out the path of the temp file
f = open(path,"r")
reader = csv.reader(f)
But I wasn't able to get the temp file path from the UploadedFile object.
Ideally I would like to create a normal reader or DictReader out of the UploadedFile object, but it seems to be eluding me. Anyone have any ideas? - Thanks
The answer lies in chunked_upload/models.py, which defines this method:
def get_uploaded_file(self):
    self.file.close()
    self.file.open(mode='rb')  # mode = read+binary
    return UploadedFile(file=self.file, name=self.filename,
                        size=self.offset)
So when you create your file model, you can override this method and open the file in text mode with mode='r' instead:
#myapp/models.py
from django.db import models
from chunked_upload.models import ChunkedUpload
from django.core.files.uploadedfile import UploadedFile

class FileUpload(ChunkedUpload):
    def get_uploaded_file(self):
        self.file.close()
        self.file.open(mode='r')  # mode = read as text, so iteration yields str
        return UploadedFile(file=self.file, name=self.filename,
                            size=self.offset)
This allows you to take the returned UploadedFile instance and parse it as a csv:
def on_completion(self, uploaded_file, request):
    reader = csv.reader(uploaded_file)
    ...
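Since the question says the delimiter isn't known up front, here is a rough sketch of one way to handle that part with only the standard library. It wraps a binary stream in io.TextIOWrapper, lets csv.Sniffer guess the delimiter from a sample, and checks the header. The names open_csv_reader, missing_columns and EXPECTED_COLUMNS are made up for illustration, the column names are placeholders, and UTF-8 is an assumption. I've only tried this against a plain binary stream such as io.BytesIO, not against the object django-chunked-upload hands back, so you may need to pass the underlying file object rather than the UploadedFile itself.

```python
import csv
import io

# Hypothetical required column names, for illustration only.
EXPECTED_COLUMNS = {"id", "name", "amount"}


def open_csv_reader(binary_file, encoding="utf-8", sample_size=4096):
    """Return a csv.DictReader over a seekable binary stream.

    The delimiter is guessed from the first `sample_size` characters.
    The encoding is an assumption; adjust it if your files differ.
    """
    # newline="" is what the csv module recommends for text files it reads.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    sample = text_file.read(sample_size)
    text_file.seek(0)  # rewind so the reader starts at the header row
    # Restrict the candidates to common delimiters; sniff() raises
    # csv.Error if it cannot settle on one.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return csv.DictReader(text_file, dialect=dialect)


def missing_columns(reader, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the header."""
    return set(expected) - set(reader.fieldnames or [])
```

With a reader in hand, validating column types is a matter of looping over the rows and converting each field, for example calling float(row["amount"]) inside a try/except ValueError.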