python linux dns whois

determine availability (and price?) of 50k domains


I have a list of 50k possible domain names. I'd like to find out which ones are available and, if possible, how much they cost. The list looks like this:

presumptuous.ly
principaliti.es
procrastinat.es
productivene.ss
professional.ly
profession.ally
professorshi.ps
prognosticat.es
prohibitioni.st

I've tried whois, but that runs way too slowly to complete in the next 100 years.

import whois

def check_domain(domain):
    try:
        # Get the WHOIS information for the domain
        w = whois.whois(domain)
        if w.status == "free":
            return True
        else:
            return False
    except Exception as e:
        print("Error: ", e)
        print(domain + " had an issue")
        return False

def check_available(matches):
    print('checking availability')
    available = []
    for match in matches:
        if check_domain(match):
            print("found " + match + " available!")
            available.append(match)
    return available

I've also tried the names.com/names bulk upload tool, but that doesn't seem to work at all.

How do I determine the availability of these domains?


Solution

  • You can use, for example, the multiprocessing package to speed up the process:

    import os
    import sys
    from multiprocessing import Pool
    
    import pandas as pd
    from tqdm import tqdm
    from whois import whois
    
    
    # https://stackoverflow.com/a/8391735/10035985
    def blockPrint():
        sys.stdout = open(os.devnull, "w")
    
    
    def enablePrint():
        sys.stdout = sys.__stdout__
    
    
    def check_domain(domain):
        try:
            blockPrint()
            result = whois(domain)
        except Exception:
            return domain, None
        finally:
            enablePrint()
        return domain, result.status
    
    
    if __name__ == "__main__":
        domains = [
            "google.com",
            "yahoo.com",
            "facebook.com",
            "xxxnonexistentzzz.domain",
        ] * 100
    
        results = []
        with Pool(processes=16) as pool:  # <-- select here how many processes do you want
            for domain, status in tqdm(
                pool.imap_unordered(check_domain, domains), total=len(domains)
            ):
                results.append((domain, not bool(status)))
    
        df = pd.DataFrame(results, columns=["domain", "is_free"])
        print(df.drop_duplicates())
    

    Prints:

    100%|██████████████████████████████████████████████| 400/400 [00:07<00:00, 55.67it/s]
    
                          domain  is_free
    0   xxxnonexistentzzz.domain     True
    5               facebook.com    False
    11                google.com    False
    14                 yahoo.com    False
    

    You can see it checks ~55 domains per second, so your 50k-domain list would finish in roughly 15 minutes.
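
    If you want to cut the WHOIS load further, a cheap pre-filter is a plain DNS lookup: a domain that resolves is certainly taken, so only the non-resolving ones need the slower WHOIS check. Here's a minimal sketch using only the standard library's socket module (the `resolves` and `prefilter` helper names are my own, not part of any library):

    ```python
    import socket


    def resolves(domain):
        """Return True if the domain currently resolves in DNS (i.e. is taken)."""
        try:
            socket.getaddrinfo(domain, None)
            return True
        except socket.gaierror:
            return False


    def prefilter(domains):
        """Keep only domains that do NOT resolve -- the only candidates
        worth sending through the much slower WHOIS check."""
        return [d for d in domains if not resolves(d)]
    ```

    Note the asymmetry: a failing lookup does not prove availability (a registered domain may simply have no DNS records), so you still run WHOIS on the survivors; the pre-filter only shrinks that set.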