I have 1 single machine with an IP 1.2.3.4. This machine has 2 web servers and an ftp server:
Web Server 1 listens to port 82; the domain for it: ws1.example.com
Web Server 2 listens to port 83; the domain for it: ws2.example.com
FTP Server listens to port 21; the domain for it: ftp.example.com
This is what the DNS mapping looks like:
ws1.example.com CNAME example.com
ws2.example.com CNAME example.com
ftp.example.com CNAME example.com
example.com A 1.2.3.4
Case 1: I make a request at the browser URL ws1.example.com:82 and the DNS redirects me to example.com but with the Host header: ws1.example.com.
Case 2: I make a request at the browser URL ws2.example.com:83 and the DNS redirects me to example.com but with the Host header: ws2.example.com.
In both the cases:
the request ultimately reaches the same physical machine
when the request arrives:
In Case 1, the request arrives at this machine and the request is attended to by the application that is listening on port 82 i.e. Web Server 1.
In Case 2, the request arrives at this machine and the request is attended to by the application that is listening on port 83 i.e. Web Server 2.
The Host header, as I understand, is used to inform the receiving host to identify which server (from the multiple servers that this IP has been hosting) is this request meant for and accordingly directs the request to the appropriate application.
My question is:
In this example, what is the purpose of the Host header as the same physical machine with the same IP has multiple applications listening at their corresponding ports. Once the request reaches this machine, the appropriate port will anyway pick up and the other applications will ignore the request as the port does not match the request. So, what purpose is the Host header serving here when apprpriate ports are anyway doing their job, right and well?
Can I infer that
make sense only when you are using something like a Reverse Proxy e.g. 1 machine interfaces with the client and redirects user requests to the appropriate web server on separate machines all listening on the same port e.g. 80, each in the network behind the reverse proxy in which case you have ws1.example.com and ws2.exmple.com both be redirected to the reverse proxy example.com and this reverse proxy now forwards it to the appropriate host based on the Host header?
First an important terminology fix:
There are no "redirects" in the DNS. In your case, the DNS is just use to map a name to an IP. Sometimes, because of CNAME, a name is mapped to another name which is then mapped to an IP. It does not matter if there are intermediate steps like that, at the end a name maps to an IP (or there is a DNS resolution failure)
This also means that if the URL has a specific port, then that is not changed, the final IP will be queried over the port mentioned in the URL.
Redirections are an HTTP level feature: when querying a webserver for https://www.mygreatsite.example/foo
it will reply with an HTTP return code of 301, 302, 303, 307 or 308 and giving you (the HTTP client, aka the browser) the new URL to go to.
In the good old days, IP addresses were plenty. If you were hosting both www.site1.example
and www.site2.example
on the same physical box you could attach one different IP address to each.
Hence, in that specific case, in a way, the HTTP host
header is useless, the mere fact of connecting either to 192.0.2.37
or 192.0.2.42
already lets you know which site you want.
In fact in HTTP/0.9
there was no host
header, as there were no headers at all.
But then, with mass virtual hosting coming into play, and IPv4 addresses becoming scarce, you could not anymore attach one single IP address per site, since it was also a waste.
So you had, through the DNS, either directly or indirectly (CNAME
records), both websites resolving to the same IP.
Hence when the HTTP client connected to the server, the server by default has no way to know which website do you want. That is why the HTTP host
header filled by the client lets the server know which website you want to access, irrespective to its IP address, that was resolved earlier through the DNS.
By default HTTP uses port 80, so it is often not visible in the URLs.
Of course if you forced your clients to use http://www.site1.example:4569
on one side and http://www.anothersite2.com:9873
on another side, then you are right the host
header would not be really needed.
Except that the plan falls down for many reasons:
Hence typically it is not done like that, if you want to expose some given service over HTTP but in a non default port you typically install a reverse proxy in front of it. Or you do an HTTP redirection from http://www.coolpublicname.example/
to http://www.complicatedinternalname.example:9713
, but then the client sees this naked truth.
In passing note that HTTPS added a level of complexity because the HTTPS webserver needs to send its certificate to the client, but since each website can have a different certificate it needs to know which website the client wants to use, which it could learn through the host
HTTP header but then comes after the TLS handshake is finished, so in the early stage of the server sending a certificate this is not available yet.
So at the earliest times of HTTPS we were forced again to do IP-based virtual hosting and not name-based virtual hosting like it was possible in pure HTTP thanks to the host
header.
The solution was found with a TLS extension, the Server Name Indication (SNI), something that the client sends early to the server and gives the website name, so that the server can send the appropriate certificate, and hence we are back in business in the name-based case where you can theoretically have an infinite number of names resolving to the same IP for them to be served by one given webserver.