I need a reliable way to check in Python if a domain of any TLD has been registered or is available. The bold phrases are the key points that I'm struggling with.
As I don't really need to parse the results, I ripped the code out of the whois library and tried to do the query by calling Linux's whois myself:
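import subprocess

# Run the system whois client and capture stdout and stderr together.
p = subprocess.Popen(['whois', 'example.com'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
r = p.communicate()[0]
print(r.decode(errors='replace'))  # whois output is not guaranteed to be UTF-8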
That works much better. Except it's not that reliable either. I tried one particular domain and got "Your connection limit exceeded. Please slow down and try again later." Well, it's not me who is exceeding the limit. Being behind a single IP in a huge office means that somebody else might hit the limit before I make a query.
There are already similar questions:
...but they either deal only with a limited set of TLDs or are not that bothered by reliability.
If you do not have specific access (like being a registrar), and if you do not target a specific TLD (some TLDs offer a dedicated public service for checking domain availability), the only tool that makes sense is to query whois servers.
You have then at least the following two problems:
For the second point the usual methods apply (handling delays on your side, using multiple endpoints, etc.)
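As an illustration of "handling delays on your side", here is a minimal sketch of retrying with exponential backoff. The names `RATE_LIMIT_MARKERS`, `is_rate_limited` and `query_with_backoff` are hypothetical, and the marker string is only an assumption taken from the message quoted in the question; other servers word their throttling replies differently. `query` stands for whatever function you use to actually talk to the whois server.

```python
import random
import time
from typing import Callable

# Assumption: substrings that signal throttling. The wording differs per
# server; this one comes from the message quoted in the question.
RATE_LIMIT_MARKERS = ("limit exceeded",)


def is_rate_limited(reply: str) -> bool:
    """Return True if the reply looks like a throttling message."""
    text = reply.lower()
    return any(marker in text for marker in RATE_LIMIT_MARKERS)


def query_with_backoff(
    query: Callable[[str], str],
    domain: str,
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call query(domain), waiting longer after each throttled reply."""
    for attempt in range(max_attempts):
        reply = query(domain)
        if not is_rate_limited(reply):
            return reply
        if attempt < max_attempts - 1:
            # Exponential backoff plus up to one second of random jitter,
            # so that several clients behind one IP do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

This does not solve the shared-IP problem by itself; it only makes your own client back off politely. Using multiple endpoints would need a different source address or server per attempt, which is outside this sketch.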
For the first point, another reply of mine (https://unix.stackexchange.com/a/407030/211833) explains what you observe depending on which wrapper around whois you use, along with some countermeasures. See also my other reply at https://webmasters.stackexchange.com/a/111639/75842, specifically point 2.
Note that, depending on your specific requirements and whether you can change at least some of them, you may have other solutions. For example, for gTLDs, if you can tolerate a 24-hour delay, you may use the zone files published by registries to find registered domain names (only those published in the zone, so not all of them).
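To make the zone file idea concrete, here is a rough sketch that collects the names delegated directly under a TLD. It assumes a master-file style zone with one record per line and fully qualified owner names; real files may differ, and `delegated_names` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, Set


def delegated_names(lines: Iterable[str], tld: str) -> Set[str]:
    """Collect second-level names that have at least one NS record."""
    tld = tld.lower().strip(".")
    suffix = "." + tld + "."
    names = set()
    for line in lines:
        fields = line.split(";", 1)[0].split()  # drop comments, then tokenize
        if len(fields) < 3:
            continue
        owner = fields[0].lower()
        rest = [f.lower() for f in fields[1:]]
        # Skip the optional TTL and class columns to reach the record type.
        while rest and (rest[0].isdigit() or rest[0] == "in"):
            rest.pop(0)
        if len(rest) < 2 or rest[0] != "ns":
            continue
        if owner.endswith(suffix):
            label = owner[: -len(suffix)]
            # Keep only names directly under the TLD.
            if label and "." not in label:
                names.add(label + "." + tld)
    return names
```

With the lines of a downloaded `.com` zone file, `"example.com" in delegated_names(lines, "com")` then tells you whether the name was published in that snapshot. A name missing from the set is not proof that it is available, since only published names appear.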
Also, while you are right in a generic sense that relying on a third party has its weaknesses, if you find a worthy registrar that both has access to many registries and provides an API, you could use it for your needs.
In short, I do not believe you can achieve this task in all cases (100% reliability, 100% of TLDs, etc.). You will need some compromises, but they depend on your initial needs.
Also very important: do not shell out to run a whois command, as this creates many security and performance problems. Use the appropriate libraries from your programming language to do whois queries, or just open a TCP socket on port 43, send your query on one line terminated by CR+LF, and read back a blob of text. This is basically all that RFC 3912 defines.
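A minimal sketch of the raw socket approach, using only the standard library. The function name `whois_query` and its parameters are my own choices for illustration; which server to ask for a given TLD is left to the caller, and the reply is returned unparsed.

```python
import socket


def whois_query(domain: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one whois query to the given server and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The query is a single line terminated by CR+LF.
        # The "idna" codec leaves ASCII names unchanged and converts IDNs.
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # RFC 3912 does not specify an encoding, so decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `whois_query("example.com", "whois.verisign-grs.com")` asks the `.com` registry server. Deciding from the returned text whether the name is registered still depends on that server's output format.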