Given that the TCP connection is already established when the HTTP request is sent, the IP address and port are implicitly known -- a TCP connection is an IP + Port.
So, why do we need the Host
header? Is this only needed for the case where there are multiple hosts mapped to the IP address implied in the TCP connection?
The Host
Header tells the webserver which virtual host to use (if set up). You can even have the same virtual host using several aliases (= domains and wildcard-domains). In this case, you still have the possibility to read that header manually in your web app if you want to provide different behavior based on different domains addressed. This is possible because in your webserver you can (and if I'm not mistaken you must) set up one vhost to be the default host. This default vhost is used whenever the host
header does not match any of the configured virtual hosts.
That means: You get it right, although saying "multiple hosts" may be somewhat misleading: The host (the addressed machine) is the same, what really gets resolved to the IP address are different domain names (including subdomains) that are also referred to as hostnames (but not hosts!).
Although not part of the question, a fun fact: This specification led to problems with SSL in the early days because the web server has to deliver the certificate that corresponds to the domain the client has addressed. However, in order to know what certificate to use, the webserver should have known the addressed hostname in advance. But because the client sends that information only over the encrypted channel (which means: after the certificate has already been sent), the server had to assume you browsed the default host. That meant one SSL-secured domain per IP address / port-combination.
This has been overcome with Server Name Indication; however, that again breaks some privacy, as the server name is now transferred in plain text again, so every man-in-the-middle would see which hostname you are trying to connect to.
Although the webserver would know the hostname from Server Name Indication, the Host
header is not obsolete, because the Server Name Indication information is only used within the TLS handshake. With an unsecured connection, there is no Server Name Indication at all, so the Host
header is still valid (and necessary).
Another fun fact: Most HTTP-server implementations (if not all) reject your HTTP request if it does not contain exactly one Host
header, even if it could be omitted because there is only the default vhost configured. That means the minimum required information in an HTTP-(GET-)request is the first line containing METHOD
RESOURCE
and PROTOCOL VERSION
and at least the Host
header, like this:
GET /someresource.html HTTP/1.1
Host: www.example.com
In the MDN Documentation on the "Host" header they actually phrase it like this:
A Host header field must be sent in all HTTP/1.1 request messages. A 400 (Bad Request) status code will be sent to any HTTP/1.1 request message that lacks a Host header field or contains more than one.
As mentioned by Darrel Miller, the complete specs can be found in RFC7230.