python django sqlite pysqlite

Change text_factory in Django/sqlite


I have a Django project that uses a SQLite database which an external tool can also write to. The text is supposed to be UTF-8, but it sometimes contains encoding errors. It comes from an external source, so I cannot control the encoding. I know I could write a "wrapping layer" between the external source and the database, but I would rather not, especially since the database already contains a lot of "bad" data.

The solution in sqlite is to change the text_factory to something like: lambda x: unicode(x, "utf-8", "ignore")

However, I don't know how to tell the Django model driver this.

The exception I get is:

'Could not decode to UTF-8 column 'Text' with text' in /var/lib/python-support/python2.5/django/db/backends/sqlite3/base.py in execute

Somehow I need to tell the sqlite driver not to try to decode the text as UTF-8 (at least not using the standard algorithm, but it needs to use my fail-safe variant).


Solution

  • The solution in sqlite is to change the text_factory to something like: lambda x: unicode(x, "utf-8", "ignore")

    However, I don't know how to tell the Django model driver this.

    Have you tried

    from django.db import connection
    connection.cursor()  # force the connection open; connection.connection is None until then
    connection.connection.text_factory = lambda x: unicode(x, "utf-8", "ignore")
    

    before running any queries?
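To see what a lenient text_factory does on its own, here is a minimal sketch using only the standard sqlite3 module, with no Django involved. It is written for Python 3, where raw.decode("utf-8", "ignore") plays the role of unicode(x, "utf-8", "ignore"). The table name notes, the column body, and the helper lenient_text are made up for illustration, and the CAST trick is only an assumed way of simulating an external tool that writes badly encoded text.

```python
import sqlite3


def lenient_text(raw):
    # raw holds the column's bytes as stored; bytes that are not valid
    # UTF-8 are silently dropped instead of raising an error.
    return raw.decode("utf-8", "ignore")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# CAST(blob AS TEXT) stores the bytes as a TEXT value without validating
# them, which stands in for the external tool writing bad data.
conn.execute("INSERT INTO notes VALUES (CAST(? AS TEXT))", (b"caf\xe9 ok",))

# Install the fail-safe decoder before reading any rows.
conn.text_factory = lenient_text
body = conn.execute("SELECT body FROM notes").fetchone()[0]
print(body)  # caf ok (the stray 0xE9 byte is dropped)
```

The same callable could then be assigned to connection.connection.text_factory in Django, as in the snippet above. Note that "ignore" loses the offending bytes; "replace" would keep a visible marker instead, if that matters for your data.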