Tags: python, url, uri, urlparse

urlparse doesn't return params for a custom scheme


I am trying to use urlparse from Python's urllib.parse module to parse some custom URIs.

I noticed that for some well-known schemes params are parsed correctly:

>>> from urllib.parse import urlparse
>>> urlparse("http://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='http', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
>>> urlparse("ftp://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='ftp', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')

...but for custom schemes they are not. The params field remains empty, and the params are instead treated as part of the path:

>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint;param1=value1;param2=othervalue2', params='', query='query1=val1&query2=val2', fragment='fragment')

Why does the parsing differ depending on the scheme? How can I get urlparse to split out params for a custom scheme?


Solution

  • This is because urlparse assumes that only a fixed set of schemes use parameters in their URL format. You can see that check in the source code:

    if scheme in uses_params and ';' in url:
        url, params = _splitparams(url)
    else:
        params = ''
    

    This means urlparse attempts to split out parameters only if the scheme is in uses_params, a module-level list of known schemes:

    uses_params = ['', 'ftp', 'hdl', 'prospero', 'http', 'imap',
                   'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
                   'mms', 'sftp', 'tel']
    

    So, to get the expected output, append your custom scheme to the uses_params list and call urlparse again. Note that uses_params is a module-level list, so the change affects every later urlparse call in the same process.

    >>> from urllib.parse import uses_params, urlparse
    >>>
    >>> uses_params.append('scheme')
    >>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
    ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')