Tags: django, postgresql, bulk-load

Django bulk_create: ignore rows that cause IntegrityError?


I am using bulk_create to load thousands of rows into a PostgreSQL DB. Unfortunately, some of the rows cause an IntegrityError, which stops the bulk_create process. Is there a way to tell Django to ignore such rows and save as much of the batch as possible?
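
To illustrate the problem (a minimal sketch, assuming a hypothetical Entry model whose headline field is declared unique): a single offending row makes the whole statement fail, so none of the rows in the batch are saved.

    from django.db import IntegrityError

    # Hypothetical model: assume Entry.headline has unique=True.
    rows = [Entry(headline='duplicate'), Entry(headline='duplicate')]
    try:
        Entry.objects.bulk_create(rows)
    except IntegrityError:
        # One constraint violation aborts the whole INSERT statement,
        # so nothing from this batch is saved.
        pass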


Solution

  • This is now possible as of Django 2.2.

    Django 2.2 adds a new ignore_conflicts option to the bulk_create method. From the documentation:

    On databases that support it (all except PostgreSQL < 9.5 and Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it).

    Example:

    Entry.objects.bulk_create([
        Entry(headline='This is a test'),
        Entry(headline='This is only a test'),
    ], ignore_conflicts=True)
    

    P.S. This will not prevent all IntegrityErrors. For example, if your model has a ForeignKey and you try to insert a row with an id that does not exist, you will still get an IntegrityError.
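
    A minimal sketch of that caveat, using hypothetical Author and Book models (not from the original post): ignore_conflicts only skips rows that hit unique constraints, so a foreign key violation is still raised.

    from django.db import IntegrityError, models

    class Author(models.Model):      # hypothetical model for illustration
        name = models.CharField(max_length=100)

    class Book(models.Model):        # hypothetical model for illustration
        title = models.CharField(max_length=100)
        author = models.ForeignKey(Author, on_delete=models.CASCADE)

    try:
        Book.objects.bulk_create(
            [Book(title='Orphaned', author_id=999999)],  # no Author with this pk
            ignore_conflicts=True,
        )
    except IntegrityError:
        # On PostgreSQL, ignore_conflicts uses ON CONFLICT DO NOTHING,
        # which only covers unique-constraint conflicts; a foreign key
        # violation still raises and aborts the insert.
        pass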