
Python urlparse -- extract domain name without subdomain

I need a way to extract a domain name without the subdomain from a URL using Python's urlparse.

For example, I would like to extract "" from a full url like "".

The closest I can come with urlparse is the netloc attribute, but that still includes the subdomain.

I know that it is possible to write some custom string manipulation to strip the subdomain from the netloc, but I want to avoid by-hand string transforms or regex in this task. (The reason is that I am not familiar enough with URL formation rules to feel confident that I could cover every edge case in a custom parsing function.)

Or, if urlparse can't do what I need, does anyone know any other Python url-parsing libraries that would?


  • You probably want to check out tldextract, a library designed to do this kind of thing.

    It uses the Public Suffix List to try to get a decent split based on known public suffixes (gTLDs, ccTLDs, and multi-label suffixes such as co.uk), but do note that this is just a brute-force list lookup, nothing special, so a local copy of the list can get out of date (although the list itself is curated and updated).

    >>> import tldextract
    >>> tldextract.extract('')
    ExtractResult(subdomain='', domain='cnn', suffix='com')

    So in your case:

    >>> extracted = tldextract.extract('')
    >>> "{}.{}".format(extracted.domain, extracted.suffix)
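
To make the "brute-force list" idea concrete, here is a rough standard-library sketch of the kind of suffix lookup described above. It is not tldextract's actual implementation: `SAMPLE_SUFFIXES` and `registered_domain` are hypothetical names, the example.com / example.co.uk hosts are placeholders, and the tiny suffix set stands in for the real Public Suffix List, which has thousands of entries plus wildcard and exception rules. It assumes Python 3, where urlparse lives in urllib.parse.

```python
from urllib.parse import urlparse

# Illustrative only: a tiny hand-picked sample of suffixes (an assumption for
# this sketch). The real Public Suffix List is far larger and also has
# wildcard and exception rules that are not handled here.
SAMPLE_SUFFIXES = {"com", "org", "net", "uk", "co.uk"}


def registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return 'domain.suffix' for url, or None if no known suffix matches."""
    # .hostname is lowercased and excludes any port or credentials.
    host = urlparse(url).hostname
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    # Try the longest candidate suffix first so 'co.uk' wins over 'uk'.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # Keep exactly one label to the left of the matched suffix.
            return ".".join(labels[i - 1:])
    return None


if __name__ == "__main__":
    print(registered_domain("http://www.example.com/"))
    print(registered_domain("https://forums.news.example.co.uk:8080/path"))
```

Note that urlparse only fills in the host when the URL has a scheme (or starts with //), so a bare string like "www.example.com" returns None here. For real use, the library above is still the better choice, since keeping a suffix list complete and current by hand is exactly the edge-case work the question wants to avoid.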