pythondjangodjango-import-export

Checking if the .csv file is formatted correctly before importing to avoid embedding wrong data into database


I am working on a django project that requires updating database with bulk data provided in an excelsheet format. So basically, a user can upload a .csv file if it is in a correct format (by correct format i mean, well-formed file with data i expect). I know how to import a file using django-import-export, but the problem is , i don't know how to perform checks like checking if the .csv file has correct column names and information before updating database. I am new to django, please help.


Solution

  • Checking for a well formed file

    If you are programmatically importing a file, then if you can load a Dataset object, without any errors being raised, then it is a well-formed csv file. So something like:

    import tablib
    try:
      with open('data.csv', 'r') as fh:
        imported_data = tablib.Dataset().load(fh, headers=False)
    except Exception as e:
      # you can add additional error handling / logging here if you like
      print("import fail")
      raise e
    

    Checking for correct headers

    Before the import process, there is a hook you can use to check for valid headers. So you could do something like the following to check for missing columns:

    class YourResource(resources.ModelResource):
        fields = ('author', 'email')
    
        def before_import(self, dataset, using_transactions, dry_run, **kwargs):
            for field_name in self.fields:
                col_name = self.fields[field_name].column_name
                if col_name not in dataset.headers:
                    raise ValueError(f"'{col_name}' field not in data file")
    

    Data Validation

    You can use the in-built widgets to supply additional validation at the field level. You can extend these as much as you like to enable additional domain-specific validation. For example, if you only want to allow '1' or 0' as your boolean values, you could implement the following:

    class StrictBooleanWidget(widgets.BooleanWidget):
        TRUE_VALUES = ["1"]
        FALSE_VALUES = ["0"]
        NULL_VALUES = [""]
    
        def clean(self, value, row=None, *args, **kwargs):
            if value in self.NULL_VALUES:
                return None
            if value in self.TRUE_VALUES:
                return True
            if value in self.FALSE_VALUES:
                return False
            raise ValueError("Invalid boolean: value must be 1 or 0.")
    

    Then refer to this in your resource:

    class YourResource(resources.ModelResource):
        is_active = fields.Field(
            attribute="active",
            column_name="active",
            default=False,
            widget=upload.widgets.StrictBooleanWidget(),
        )
    

    You can also use this approach to check for missing or empty values in the data.

    django-import-export really helps a lot with the use-case you describe, but it can be confusing when you are new to it. Read the documentation in depth, and I suggest to download and run the example application (if you haven't already).

    The core import logic is fairly straightforward, but it will save a ton of time if you can set breakpoints and step through during dev. That will really help you to understand what is going on.