Tags: go, tld, public-suffix-list

Is there a way to extract only valid domains from the publicsuffix library?


I was looking at the publicsuffix package in Go (golang.org/x/net/publicsuffix) and found it quite useful for extracting domains from strings. This is what I have:

package main

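import (
    "fmt"
    "log"

    "golang.org/x/net/publicsuffix"
)

func main() {
    domain := "a.very.complex-domain.co.uk"
    u, err := publicsuffix.EffectiveTLDPlusOne(domain)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(u) // Println, not Printf: u is data, not a format string
}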

This works fine, yielding complex-domain.co.uk as the registrable domain. However, when I pass an arbitrary string that contains a dot, the library still returns a domain, even if the TLD doesn't exist in the Public Suffix List:

package main

import (
    "fmt"
    "log"

    "golang.org/x/net/publicsuffix"
)

func main() {
    domain := "a.very.complex-domain.someinvalidtld"
    u, err := publicsuffix.EffectiveTLDPlusOne(domain)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(u)
}

Gives: complex-domain.someinvalidtld

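My understanding is that when the TLD isn't on the list, the publicsuffix package falls back to treating the last label as the public suffix (as if it were a local or unmanaged TLD) and parses the string anyway. Is there a way to avoid this behavior and extract only valid domains?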


Solution

  • I figured it out: you can do this with the same library.

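    // checkForValidTLD reports whether str ends in a public suffix that is
    // actually on the Public Suffix List. Needs the "strings" and
    // "golang.org/x/net/publicsuffix" imports.
    func checkForValidTLD(str string) bool {
        etld, icann := publicsuffix.PublicSuffix(str)
        if icann { // ICANN-managed suffix, e.g. "co.uk"
            return true
        }
        // A privately managed suffix from the list contains a dot
        // (e.g. "blogspot.co.uk"); an unlisted TLD is a single label.
        return strings.IndexByte(etld, '.') >= 0
    }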
    

    Calling the function like this:

    if checkForValidTLD("a.very.complex-domain.someinvalidtld") {
        fmt.Println("Valid")
    } else {
        fmt.Println("Invalid")
    }
    

    prints: Invalid

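    The logic behind this: an ICANN-managed public suffix is valid. If the suffix isn't ICANN-managed but contains a dot, it is a privately managed suffix from the list (e.g. blogspot.co.uk), which is also valid. Otherwise it is a single-label TLD that isn't on the list at all, so it is invalid. This mirrors the package documentation for PublicSuffix, which says a non-ICANN suffix is either privately managed or an unmanaged TLD not mentioned in the list.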