I am working on a django project that requires updating database with bulk data provided in an excelsheet format. So basically, a user can upload a .csv file if it is in a correct format (by correct format i mean, well-formed file with data i expect). I know how to import a file using django-import-export, but the problem is , i don't know how to perform checks like checking if the .csv file has correct column names and information before updating database. I am new to django, please help.
If you are programmatically importing a file, then if you can load a Dataset object, without any errors being raised, then it is a well-formed csv file. So something like:
import tablib
try:
with open('data.csv', 'r') as fh:
imported_data = tablib.Dataset().load(fh, headers=False)
except Exception as e:
# you can add additional error handling / logging here if you like
print("import fail")
raise e
Before the import process, there is a hook you can use to check for valid headers. So you could do something like the following to check for missing columns:
class YourResource(resources.ModelResource):
fields = ('author', 'email')
def before_import(self, dataset, using_transactions, dry_run, **kwargs):
for field_name in self.fields:
col_name = self.fields[field_name].column_name
if col_name not in dataset.headers:
raise ValueError(f"'{col_name}' field not in data file")
You can use the in-built widgets to supply additional validation at the field level. You can extend these as much as you like to enable additional domain-specific validation. For example, if you only want to allow '1' or 0' as your boolean values, you could implement the following:
class StrictBooleanWidget(widgets.BooleanWidget):
TRUE_VALUES = ["1"]
FALSE_VALUES = ["0"]
NULL_VALUES = [""]
def clean(self, value, row=None, *args, **kwargs):
if value in self.NULL_VALUES:
return None
if value in self.TRUE_VALUES:
return True
if value in self.FALSE_VALUES:
return False
raise ValueError("Invalid boolean: value must be 1 or 0.")
Then refer to this in your resource:
class YourResource(resources.ModelResource):
is_active = fields.Field(
attribute="active",
column_name="active",
default=False,
widget=upload.widgets.StrictBooleanWidget(),
)
You can also use this approach to check for missing or empty values in the data.
django-import-export
really helps a lot with the use-case you describe, but it can be confusing when you are new to it. Read the documentation in depth, and I suggest to download and run the example application (if you haven't already).
The core import logic is fairly straightforward, but it will save a ton of time if you can set breakpoints and step through during dev. That will really help you to understand what is going on.