I'm working on a PHP script that tries to resolve a vague URL (for example, someone typing in facebook.com) into an absolute URL (such as https://www.facebook.com), similar to what your browser does every day.
So far I've got the following code:
$link = gethostbyname("facebook.com");
This returns an IPv4 address, which works, but when I then do a reverse lookup using:
$link2 = gethostbyaddr($link);
I expected to get a valid URL like "https://www.facebook.com", but instead I get garbage such as 'edge-star-mini-shv-13-atn1.facebook.com'.
This breaks any hope of using fopen or curl to read the contents of the webpage.
Can anyone explain what's gone wrong here and how I can resolve it?
EDIT: Trying an insecure (plain HTTP) URL like "google.co.uk" returns 'lhr25s10-in-f3.1e100.net', so the problem isn't related to secure HTTP (HTTPS).
gethostbyaddr gets a hostname, not a URL, for an IP address.
Multiple hostnames can be assigned to a single IP address.
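gethostbyaddr returns only one of them: the name published in the reverse-DNS (PTR) record for that address, which is often an infrastructure name like the ones in the question rather than the website's public name.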
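A reverse lookup does not ask "which websites live at this address?". It typically asks DNS for the PTR record of a special name built from the address itself. A minimal sketch of how that name is formed for IPv4 (reverseDnsName is a hypothetical helper, and 192.0.2.10 is a documentation address, not a real Facebook server):

```php
<?php
// Build the DNS name a reverse (PTR) lookup queries for an IPv4 address:
// the octets are reversed and ".in-addr.arpa" is appended.
// Whatever single hostname the address owner stored in that PTR record
// is what a reverse lookup such as gethostbyaddr() hands back.
function reverseDnsName(string $ipv4): string
{
    if (filter_var($ipv4, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ipv4");
    }
    $octets = array_reverse(explode('.', $ipv4));
    return implode('.', $octets) . '.in-addr.arpa';
}

echo reverseDnsName('192.0.2.10'), "\n"; // 10.2.0.192.in-addr.arpa
```

With network access, dns_get_record(reverseDnsName($ip), DNS_PTR) lets you inspect that record directly. It holds one name chosen by whoever runs the server, so there is no way to get back from the IP address to the URL the user typed.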
An HTTP server listening on that IP address can handle requests for all of those hostnames.
An HTTP request includes a request header called Host, which specifies which hostname you are asking for.
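To make the Host header concrete, here is a sketch of the raw plain-HTTP request text that clients such as curl or fopen build for you. buildGetRequest is a hypothetical helper for illustration only; for HTTPS the same text is sent inside the encrypted connection.

```php
<?php
// The IP address only decides which machine the TCP connection reaches;
// the Host header tells that machine which website you want.
function buildGetRequest(string $host, string $path = '/'): string
{
    $lines = [
        "GET $path HTTP/1.1",
        "Host: $host",          // the hostname the user typed, not the PTR name
        "Connection: close",
        "",                     // empty line: terminates the header block
        "",                     // gives the trailing CRLF after the blank line
    ];
    return implode("\r\n", $lines);
}

echo buildGetRequest('www.facebook.com');
```

Sending 'Host: edge-star-mini-shv-13-atn1.facebook.com' instead asks the same machine for a different (and probably nonexistent) site.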
The HTTP server can pay attention to that header and serve up different content for different hostnames. This allows multiple websites to be hosted on a single IP address. This is very useful since IPv4 addresses are in limited supply and there are many, many websites.
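Seen from the server side, name-based virtual hosting looks roughly like the hypothetical sketch below (the hostnames and directory paths are invented examples). A Host value the server doesn't recognise, such as a reverse-DNS name, falls through to a default site, which is roughly what happens when you request the name gethostbyaddr returned.

```php
<?php
// One server (one IP address) picks which site to serve from the Host header.
function pickDocumentRoot(string $hostHeader, array $sites, string $default): string
{
    // Host may carry a port ("example.com:8080") and is case-insensitive.
    // Note: this simple split does not handle bracketed IPv6 literals.
    $host = strtolower(explode(':', $hostHeader, 2)[0]);
    return $sites[$host] ?? $default;
}

$sites = [
    'www.example.com'  => '/srv/example',
    'blog.example.org' => '/srv/blog',
];

echo pickDocumentRoot('WWW.Example.com:8080', $sites, '/srv/default'), "\n"; // /srv/example
echo pickDocumentRoot('edge-1.example.net', $sites, '/srv/default'), "\n";   // /srv/default
```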
You are getting the default hostname of the computer hosting facebook.com, but the web server isn't serving the website you want under that hostname.
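So rather than round-tripping through DNS, keep the hostname the user typed and only add what is missing, which is roughly what a browser does. This sketch assumes https as the default scheme (a browser may also fall back to http) and uses curl's redirect following to discover the canonical address; both function names are hypothetical.

```php
<?php
// Turn user input like "facebook.com" into an absolute URL, with no DNS lookups.
// Assumption: default to https:// when no scheme is present.
function toAbsoluteUrl(string $input, string $defaultScheme = 'https'): string
{
    $input = trim($input);
    // A regex check avoids parse_url() treating "localhost:8080" as a scheme.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = $defaultScheme . '://' . $input;
    }
    return $input;
}

// Ask the server where that URL really lives by following its redirects.
// Requires the curl extension and network access; returns null on failure.
function finalUrl(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: headers only
        CURLOPT_FOLLOWLOCATION => true,  // follow 301/302 redirects
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $ok = curl_exec($ch);
    $effective = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return $ok === false ? null : $effective;
}

echo toAbsoluteUrl('facebook.com'), "\n"; // https://facebook.com
```

Passing that result to finalUrl() needs a live connection: if the site redirects to its canonical host (for example a www. address), CURLINFO_EFFECTIVE_URL reports the URL you ended up at, and you can then read the page with curl or fopen. Because of CURLOPT_NOBODY this is a HEAD request; if a server handles HEAD badly, drop that option to make a GET instead.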