
How does the Host header help on a physical host hosting multiple servers?


I have a single machine with the IP 1.2.3.4. This machine runs 2 web servers and an FTP server.

This is what the DNS mapping looks like:

ws1.example.com CNAME example.com
ws2.example.com CNAME example.com
ftp.example.com CNAME example.com
example.com A 1.2.3.4

Case 1: I make a request at the browser URL ws1.example.com:82 and the DNS redirects me to example.com but with the Host header: ws1.example.com.

Case 2: I make a request at the browser URL ws2.example.com:83 and the DNS redirects me to example.com but with the Host header: ws2.example.com.

In both cases:

The Host header, as I understand it, tells the receiving host which server (of the several that this IP hosts) the request is meant for, so that the host can direct the request to the appropriate application.

My question is:


Solution

  • No DNS redirections

    First an important terminology fix:

    There are no "redirects" in the DNS. In your case, the DNS is just used to map a name to an IP address. Sometimes, because of a CNAME record, a name is mapped to another name, which is then mapped to an IP address. Intermediate steps like that do not matter: in the end, a name maps to an IP address (or DNS resolution fails).

    This also means that a port given in the URL is not changed: the final IP address is contacted on the port mentioned in the URL.
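
    As a rough illustration of the two points above, here is a minimal sketch that follows the CNAME chain from the question in a plain dictionary and keeps the port from the URL untouched. The `ZONE` table and the helper names `resolve` and `connect_target` are made up for this example; a real resolver does far more (caching, other record types, timeouts).

```python
from urllib.parse import urlsplit

# Toy copy of the records from the question: name -> (record type, value).
ZONE = {
    "ws1.example.com": ("CNAME", "example.com"),
    "ws2.example.com": ("CNAME", "example.com"),
    "ftp.example.com": ("CNAME", "example.com"),
    "example.com": ("A", "1.2.3.4"),
}


def resolve(name, zone=ZONE, max_hops=8):
    """Follow CNAME records until an A record is found (hypothetical helper)."""
    for _ in range(max_hops):
        record = zone.get(name)
        if record is None:
            raise LookupError(f"no record for {name}")
        record_type, value = record
        if record_type == "A":
            return value
        # CNAME: keep resolving with the name it points to.
        name = value
    raise LookupError("CNAME chain too long")


def connect_target(url):
    """Return the (ip, port) a client would connect to for this URL."""
    parts = urlsplit(url)
    # The port comes from the URL (default 80 for http); DNS never changes it.
    port = parts.port or 80
    return resolve(parts.hostname), port


print(connect_target("http://ws1.example.com:82/"))
print(connect_target("http://ws2.example.com:83/"))
```

    Both names end up at the same IP address; only the port taken from the URL differs.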

    Redirections are an HTTP-level feature: when you query a webserver for https://www.mygreatsite.example/foo, it can reply with an HTTP status code of 301, 302, 303, 307 or 308 and give you (the HTTP client, aka the browser) the new URL to go to.
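
    To make the contrast with DNS concrete, here is a small sketch of what such a redirect looks like on the wire. The helper `build_redirect` and the target path `/bar` are invented for this example; the new URL travels in the standard `Location` response header.

```python
from http import HTTPStatus

# The redirect status codes mentioned above.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def build_redirect(status, location):
    """Build a bare HTTP/1.1 redirect response (hypothetical helper)."""
    if status not in REDIRECT_CODES:
        raise ValueError(f"{status} is not a redirect status code")
    reason = HTTPStatus(status).phrase
    response = (
        f"HTTP/1.1 {status} {reason}\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )
    return response.encode("ascii")


print(build_redirect(301, "https://www.mygreatsite.example/bar").decode("ascii"))
```

    The client then issues a brand new HTTP request to the URL in `Location`, which is something the DNS never asks it to do.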

    HTTP virtual hosting

    In the good old days, IP addresses were plentiful. If you were hosting both www.site1.example and www.site2.example on the same physical box, you could attach a different IP address to each. In that specific case the HTTP host header is, in a way, useless: the mere fact of connecting to either 192.0.2.37 or 192.0.2.42 already tells the server which site you want. In fact, in HTTP/0.9 there was no host header, as there were no headers at all.

    But then, with mass virtual hosting coming into play and IPv4 addresses becoming scarce, you could no longer dedicate one IP address to each site; it was also wasteful. So, through the DNS, either directly or indirectly (CNAME records), both websites resolved to the same IP address.

    Hence, when the HTTP client connects to the server, the server by default has no way to know which website you want. That is why the client fills in the HTTP host header: it lets the server know which website you want to access, irrespective of the IP address that was resolved earlier through the DNS.
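
    Here is a minimal sketch of that selection step, assuming a server that maps host names to document roots. The `SITES` table, the `/srv/...` paths and the function names are hypothetical, and the port stripping is deliberately naive (it does not handle IPv6 literals).

```python
from typing import Optional

# Hypothetical table: both names resolve to the same IP address,
# but each one is served from its own document root.
SITES = {
    "www.site1.example": "/srv/site1",
    "www.site2.example": "/srv/site2",
}


def host_from_request(raw_request: bytes) -> Optional[str]:
    """Extract the Host header value (without port) from a raw HTTP request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Header names and host names are case-insensitive; drop any ":port".
            return value.strip().lower().partition(":")[0]
    return None


def select_site(raw_request: bytes) -> Optional[str]:
    """Pick the document root for this request, or None if the name is unknown."""
    host = host_from_request(raw_request)
    if host is None:
        return None
    return SITES.get(host)


request = b"GET / HTTP/1.1\r\nHost: www.site2.example\r\nAccept: */*\r\n\r\n"
print(select_site(request))
```

    The TCP connection carries only the IP address and port; the site name is known to the server only because the client repeats it inside the request.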

    By default HTTP uses port 80, so the port is often not visible in URLs. Of course, if you forced your clients to use http://www.site1.example:4569 on one side and http://www.anothersite2.com:9873 on the other, then you are right that the host header would not really be needed. Except that the plan falls down for at least two reasons:

    1. Port numbers are not an infinite space either, and many of them are already used for other things; so even with this scheme, at some point you could not attach new websites to the same IP.
    2. More important than the previous technical point: for humans this would be a nightmare. Many people would forget the port number and then not reach the intended website.

    Hence it is typically not done like that. If you want to expose a given service over HTTP on a non-default port, you typically install a reverse proxy in front of it. Or you do an HTTP redirection from http://www.coolpublicname.example/ to http://www.complicatedinternalname.example:9713, but then the client sees this naked truth.
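
    The routing decision of such a reverse proxy can be sketched as a lookup from the public name to an internal host and port. This shows only the decision, not the forwarding of bytes; the `BACKENDS` table and the `pick_backend` helper are assumptions made for the example, reusing the names from the paragraph above.

```python
# Hypothetical routing table: public name -> internal (host, port).
BACKENDS = {
    "www.coolpublicname.example": ("www.complicatedinternalname.example", 9713),
}


def pick_backend(host_header):
    """Return the internal (host, port) for a Host header value, or None."""
    # Normalise case and drop an optional ":port" (IPv6 literals not handled).
    name = host_header.strip().lower().partition(":")[0]
    return BACKENDS.get(name)


print(pick_backend("www.coolpublicname.example"))
```

    The client only ever talks to the public name on the default port; the odd internal name and port stay hidden behind the proxy, unlike with a redirection.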

    HTTPS virtual hosting

    In passing, note that HTTPS added a level of complexity. The HTTPS webserver needs to send its certificate to the client, and since each website can have a different certificate, the server needs to know which website the client wants. It could learn that from the HTTP host header, but that header only arrives after the TLS handshake is finished, so it is not yet available at the early stage when the server sends its certificate.

    So in the early days of HTTPS we were forced back to IP-based virtual hosting, instead of the name-based virtual hosting that the host header made possible in plain HTTP.

    The solution came as a TLS extension, Server Name Indication (SNI): the client sends the website name to the server early in the handshake, so that the server can send the appropriate certificate. With that we are back in business in the name-based case, where you can theoretically have an unlimited number of names resolving to the same IP and served by one webserver.
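
    As a sketch of how a server can act on SNI, Python's standard `ssl` module lets a server context register an `sni_callback` that swaps in a per-site context during the handshake. The site names and the `make_sni_callback` helper are assumptions for this example, and the certificate loading is only indicated in a comment because the file paths would be specific to your setup.

```python
import ssl


def make_sni_callback(contexts_by_name):
    """Build a callback that selects a per-site SSLContext from the SNI name."""

    def on_sni(ssl_socket, server_name, default_context):
        # server_name is None when the client sent no SNI extension.
        chosen = contexts_by_name.get(server_name)
        if chosen is not None:
            # Switching the context switches the certificate that gets sent.
            ssl_socket.context = chosen
        return None  # None tells the ssl module to continue the handshake

    return on_sni


# One context per website. In real use each would load its own certificate,
# e.g. with load_cert_chain(certfile, keyfile); omitted here on purpose.
site1_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
site2_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
default_ctx.sni_callback = make_sni_callback(
    {
        "www.site1.example": site1_ctx,
        "www.site2.example": site2_ctx,
    }
)
```

    Sockets wrapped with `default_ctx` would then present the certificate of whichever site the client named, and fall back to the default context for unknown or missing names.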