Sometimes, other websites link to our Django-powered site with incorrectly percent-encoded URLs. Disqus.com and Twitter.com have the same issue, so it's nothing special about our use case: http://disqus.com/%C3A4. In this URL, the second % is missing. The valid URL looks like this: http://disqus.com/%C3%A4
Django returns an empty 400 (Bad Request) error page. However, we'd like to catch the error and, instead of returning a plain, uninformative page, show our users at least our custom 404 page. Better still, we'd like to check the incoming URL for missing % characters or similar problems to validate its format. Our middleware's process_request gets called even for these 400 errors, so we do have a hook to catch the error.
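To illustrate why the first URL is rejected, here is a minimal sketch (assuming Python 3 and only the standard library; the helper name is_valid_utf8_path is made up for this example). The server typically percent-decodes the path into bytes first, and those bytes then have to be valid UTF-8. With the second % missing, the byte 0xC3 is followed by the plain characters "A4" instead of a continuation byte, so decoding fails.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8_path(path):
    """Return True if the percent-decoded path is valid UTF-8.

    Hypothetical helper for illustration; not part of Django.
    """
    try:
        # "%C3%A4" -> b"\xc3\xa4" (valid UTF-8 for "ä")
        # "%C3A4"  -> b"\xc3A4"   (0xC3 followed by ASCII "A4": invalid)
        unquote_to_bytes(path).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8_path("/%C3%A4"))  # True
print(is_valid_utf8_path("/%C3A4"))   # False
```

A check like this only detects the problem; it cannot tell which character the linking site actually meant.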
We'd like to address the issue on our site. Is there any best practice ...? A handler400 would be great - is it possible to create one on your own?
As I already posted in a comment on Cathy's answer (which is really good, but doesn't work in this particular case), I'm publishing our current, slightly hackish solution here as a separate answer:
Apparently, this error cannot be caught inside Django's middleware. It's a Unicode decode error that is triggered inside WSGIHandler in \django\core\handlers\wsgi.py. To be precise, it is
path_info = force_unicode(environ.get('PATH_INFO', u'/'))
inside WSGIRequest that causes the issue. This is basically correct behavior on Django's part, but as described in my question, we simply want to show our users something more useful than an empty error page. Therefore, we check incoming request URLs for valid Unicode characters before passing them on to our WSGIHandler. This blog post pointed us in the right direction: http://codeinthehole.com/writing/django-nginx-wsgi-and-encoded-slashes/
Thus, we reroute invalid URLs inside our wsgi.py like so:
import os

os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

import django.core.handlers.wsgi
_application = django.core.handlers.wsgi.WSGIHandler()
# for Django 1.7+
# from django.core.wsgi import get_wsgi_application
# _application = get_wsgi_application()

from django.utils.encoding import force_unicode

def application(environ, start_response):
    try:
        # Same decoding step that fails inside WSGIRequest for malformed paths.
        force_unicode(environ.get('PATH_INFO', u'/'))
    except UnicodeDecodeError:
        # Invalid path: serve the site root instead of an empty 400 page.
        environ['PATH_INFO'] = u'/'
    return _application(environ, start_response)
Subclassing WSGIHandler instead should also work. In this example, we simply reroute invalid URLs to our site root "/". But you could also redirect to any custom error page URL, or you could try sanitizing your URL ... It works for us, but maybe there's a better solution out there.
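As a rough illustration of the "redirect to a custom error page" variant, here is a minimal sketch of a generic WSGI wrapper. It is not what we run in production. It assumes Python 3 and a PEP 3333 server, where PATH_INFO arrives as a str holding the raw bytes decoded as Latin-1 (unlike the Python 2 era code above), and the names reject_invalid_paths and /invalid-url/ are hypothetical.

```python
def reject_invalid_paths(app, error_url="/invalid-url/"):
    """Wrap a WSGI app and redirect requests whose path is not valid UTF-8.

    `error_url` is a hypothetical path; point it at your own error page.
    """
    def wrapper(environ, start_response):
        raw_path = environ.get("PATH_INFO", "/")
        try:
            # Recover the original bytes (PEP 3333 "bytes-as-str"),
            # then check that they form valid UTF-8.
            raw_path.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Answer directly, without ever calling the wrapped application.
            start_response("302 Found", [
                ("Location", error_url),
                ("Content-Length", "0"),
            ])
            return [b""]
        return app(environ, start_response)
    return wrapper
```

In a wsgi.py this would presumably be used as application = reject_invalid_paths(get_wsgi_application()). Compared with rewriting PATH_INFO, a real redirect changes the URL in the browser, which may or may not be what you want.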