
How to handle links that contain spaces in Python


I am trying to extract links from a webpage and then open them in my web browser. My Python program extracts the links successfully, but some links contain spaces, and these cannot be opened with the request module.

For example, `example.com/A, B C` will not open with the request module, but if I convert it to `example.com/A,%20B%20C` it opens. Is there a simple way in Python to replace the spaces with `%20`?

`http://example.com/A, B C` ---> `http://example.com/A,%20B%20C`

I want to convert every link that contains spaces into the above format.


Solution

  • `urlencode` will not do this: it takes a dictionary of query parameters, for example:

    >>> urllib.urlencode({'test':'param'})
    'test=param'
    

    What you need instead is something like this (Python 2 code):

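    import urllib
    import urlparse
    
    def url_fix(s, charset='utf-8'):
        # Python 2: encode unicode input to a byte string before quoting
        if isinstance(s, unicode):
            s = s.encode(charset, 'ignore')
        scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)
        # Percent-encode the path, keeping '/' and existing '%' escapes intact
        path = urllib.quote(path, '/%')
        # Encode the query string; spaces become '+', while ':', '&' and '=' are kept
        qs = urllib.quote_plus(qs, ':&=')
        return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))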
    

    Then:

    >>> url_fix('http://example.com/A, B C')
    'http://example.com/A%2C%20B%20C'
    

    Taken from the question "How can I normalize a URL in python".
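The function above is Python 2 code: `urllib.quote`, `urllib.quote_plus` and the `urlparse` module moved to `urllib.parse` in Python 3, and the `unicode` type no longer exists. What follows is a minimal Python 3 sketch of the same approach, not the original author's code. It assumes that `quote()`'s default UTF-8 encoding is acceptable, so the `charset` argument is dropped. The `path_safe` parameter is a hypothetical addition: the original always encodes the comma as `%2C`, and passing `','` as an extra safe character keeps the comma literal, which gives exactly the form asked for in the question.

```python
from urllib.parse import quote, quote_plus, urlsplit, urlunsplit


def url_fix(s, path_safe="/%"):
    """Percent-encode unsafe characters in the path and query of URL `s`.

    Python 3 sketch of the Python 2 url_fix above. `path_safe` is a
    hypothetical parameter (not in the original) listing path characters
    to leave unencoded.
    """
    scheme, netloc, path, qs, anchor = urlsplit(s)
    # Spaces become %20; '/' and existing '%' escapes are kept by default.
    path = quote(path, safe=path_safe)
    # As in the original, spaces in the query become '+' and ':', '&', '=' are kept.
    qs = quote_plus(qs, safe=":&=")
    # The fragment (anchor) is passed through unchanged, as in the original.
    return urlunsplit((scheme, netloc, path, qs, anchor))
```

Usage:

    >>> url_fix('http://example.com/A, B C')
    'http://example.com/A%2C%20B%20C'
    >>> url_fix('http://example.com/A, B C', path_safe='/%,')
    'http://example.com/A,%20B%20C'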