docker lxml gunicorn ubuntu-22.04 xmlsec

Signing XML in a Django view returns an Nginx 502 Bad Gateway for newly built Docker images, but not for old images


All views of my Django application work as intended, except the views that serve signed XML. These views, in particular my SAML metadata view, return a 502 Bad Gateway error when running on newly built Ubuntu Docker images.

Nginx error log shows:

2024/01/15 16:04:04 [error] 14#14: *219 upstream prematurely closed connection while reading response header from upstream, client: 172.17.0.1, server: [my application].azurewebsites.net, request: "GET /sso/metadata/ HTTP/1.1", upstream: "http://unix:/code/[my application socket file].sock:/sso/metadata/", host: "localhost:801"

The upstream socket location refers to the gunicorn server running my Django application, and its error log shows:

[2024-01-15 16:04:04 +0000] [10] [WARNING] Worker with pid 25 was terminated due to signal 11

Other log entries show the same message, but with signal 7 or 8 instead of 11.
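For readers unfamiliar with the numbers, here is a small sketch that translates them into signal names using Python's standard signal module. The helper name describe_signal is my own. On Linux x86-64, signals 7, 8 and 11 are SIGBUS, SIGFPE and SIGSEGV, which all point to a crash in native (C) code rather than a Python exception.

```python
import signal


def describe_signal(signum):
    """Return the symbolic name of a signal number on the current platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {signum}"


# The three signal numbers reported in the gunicorn log.
for signum in (7, 8, 11):
    print(signum, describe_signal(signum))
```

The mapping for signal 7 is platform dependent, so run this inside the container to get the names that apply there.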

All my images built before January 2024 (the last one on December 19th) use the same packages, and the relevant views are unchanged, but these old images do not produce the 502 Bad Gateway error.

I am using Ubuntu 22.04, Python 3.9.18, nginx version: nginx/1.22.1, gunicorn (version 20.1.0) and Django==3.0.14

All of the relevant files remain unchanged on both the working and non-working Docker images, and both images run the same versions of Ubuntu (22.04), Python, nginx and gunicorn.

When I run my code locally in a Python environment built by Visual Studio, the view in question has no issues. But when I build my image locally in Docker for Windows, the issue persists, and if I run the image from December 19th (or any earlier image) locally on Docker for Windows, the issue is gone again. This means I have no way of building new Docker images where this issue does not appear.

The Python view returning 502 Bad Gateway instead of signed metadata:

from django.conf import settings
from django.http import HttpResponse, HttpResponseServerError
from onelogin.saml2.auth import OneLogin_Saml2_Auth

def meta(request):
    # Translate the Django request into the dict format python3-saml expects.
    req = {
        'https': 'on' if request.is_secure() else 'off',
        'http_host': request.META['HTTP_HOST'],
        'script_name': request.META['PATH_INFO'],
        'server_port': request.META['SERVER_PORT'],
        'get_data': request.GET.copy(),
        'post_data': request.POST.copy(),
        'query_string': request.META['QUERY_STRING'],
    }
    auth = OneLogin_Saml2_Auth(req, custom_base_path=settings.SAML_FOLDER)

    saml_settings = auth.get_settings()
    # The worker crashes inside this call, while the metadata is being signed.
    metadata = saml_settings.get_sp_metadata()
    errors = saml_settings.validate_metadata(metadata)

    if len(errors) == 0:
        return HttpResponse(content=metadata, content_type='text/xml')
    else:
        return HttpResponseServerError(content=', '.join(errors))

Here settings.SAML_FOLDER is the path of my SAML configuration folder.

Steps taken to fix the issue

I have tried diving into the view in question, inserting breaks at successive points, in order to figure out exactly how far the Python code gets before the bad gateway response is produced.

Doing this, I found that the issue occurs at the step metadata = saml_settings.get_sp_metadata(). Following the call chain in the python3-saml package (onelogin.saml2.auth.OneLogin_Saml2_Auth -> get_settings -> get_sp_metadata), I found that the issue occurs at line 740 of the settings.py file from 'https://github.com/SAML-Toolkits/python3-saml/blob/master/src/onelogin/saml2/settings.py':

metadata = self.metadata_class.sign_metadata(metadata, key_metadata, cert_metadata, signature_algorithm, digest_algorithm)

Unpacking this, it only calls the OneLogin_Saml2_Utils.add_sign() method, and unpacking that in turn, the issue appears once I include line 738 of the utils.py file from 'https://github.com/SAML-Toolkits/python3-saml/blob/master/src/onelogin/saml2/utils.py':

signature = xmlsec.template.create(elem, xmlsec.Transform.EXCL_C14N, sign_algorithm_transform, ns='ds')

Stopping after this line does not always produce the gateway error, but approximately 80% of my requests fail if I stop the view there. The more of the following lines I include, the higher the failure rate: with all lines up to line 777 of 'https://github.com/SAML-Toolkits/python3-saml/blob/master/src/onelogin/saml2/utils.py' included, I have only gotten through once in about 100 calls, and including any further lines always produces a bad gateway error.
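A possible alternative to bisecting the library by hand is Python's standard faulthandler module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV, SIGBUS or SIGFPE. The sketch below is a gunicorn config file that enables it in every worker through gunicorn's post_fork server hook. The log path /tmp/worker-crash.log and the helper name enable_crash_tracebacks are my own assumptions, not something from my actual setup.

```python
# gunicorn.conf.py (sketch): write a Python traceback to a file when a worker
# dies from a fatal signal.
import faulthandler

# Kept at module level so the file object stays open for the worker's lifetime;
# faulthandler writes to its file descriptor at crash time.
_crash_log = None


def enable_crash_tracebacks(path="/tmp/worker-crash.log"):
    """Enable faulthandler and direct its output to `path` (hypothetical path)."""
    global _crash_log
    _crash_log = open(path, "a")
    faulthandler.enable(file=_crash_log, all_threads=True)


def post_fork(server, worker):
    """gunicorn server hook, called in each worker right after it is forked."""
    enable_crash_tracebacks()
```

With this in place, the traceback in the log file should end at the Python line that called into the crashing native code, which is the same information the manual bisection was trying to recover.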

This view spikes my CPU, which might be the source of the issue, but I am at a loss as to why it only spikes the CPU on new images.

I have also tried the latest version of gunicorn (21.2.0) with no effect, i.e. the same issue at the same line of code. I use the latest versions of xmlsec and python3-saml on both the working and non-working images, 1.3.13 and 1.16.0 respectively. I also tried building the image on Ubuntu 20.04 rather than 22.04, again with no effect: the same issue at the same line of code.


Solution

  • The Python lxml package had been installed at its latest version as a dependency of another package. If the lxml version is >=5.0.0, it causes my gunicorn workers to crash when generating signatures with xmlsec, and nginx therefore returns a 502 Bad Gateway error.

    Using the latest lxml package does not crash my Windows-based Python virtual environment; the crash only occurs when running on Ubuntu.

    Adding lxml==4.9.4 at the bottom of my requirements.txt file fixed the issue.
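To check whether an image has picked up an affected lxml version, something like the following sketch can be run inside the container. It uses importlib.metadata from the standard library (Python 3.8+); the function names are my own, and the ">= 5" threshold simply mirrors what I observed above rather than any documented compatibility rule.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '5.0.0'."""
    return int(version_string.split(".")[0])


def lxml_is_in_affected_range(version_string):
    """True if the version falls in the range that crashed for me (>= 5.0.0)."""
    return major_version(version_string) >= 5


def installed_lxml_version():
    """Return the installed lxml version string, or None if it is not installed."""
    try:
        return version("lxml")
    except PackageNotFoundError:
        return None


found = installed_lxml_version()
if found is None:
    print("lxml is not installed")
elif lxml_is_in_affected_range(found):
    print(f"lxml {found} is installed; consider pinning lxml==4.9.4")
else:
    print(f"lxml {found} is installed; below 5.0.0")
```

Running the same check in an old, working image and a new, failing one makes the version difference visible even though requirements.txt itself did not change.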