I am trying to implement an HTTP proxy server, aiming for low latency.
From what I understand, an HTTP proxy request is either a raw request with the Host header set to the destination, or a CONNECT request, after which the proxy forwards all subsequent bytes over a raw TCP connection.
So I could implement it with the following behavior:
1. Read until the first space.
2. If the method is CONNECT, consume the request head, establish a TCP connection to the destination, respond with a response head, then forward the subsequent data in both directions.
3. Otherwise, forward the whole request to the destination and the whole response back to the client, in parallel.
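Reading up to the first space is enough to choose the branch, but the destination lives in the request target (or the Host header), so in practice the proxy buffers the whole head first. A minimal sketch of that dispatch step, assuming the head has already been read up to the blank line; `classify_request_head` and `CONNECT_OK` are hypothetical names, not part of any library:

```python
from urllib.parse import urlsplit

# Any 2xx reply to CONNECT means "tunnel established"; the reason phrase is arbitrary.
CONNECT_OK = b"HTTP/1.1 200 Connection Established\r\n\r\n"


def classify_request_head(head: bytes) -> tuple[str, str, int]:
    """Return ("tunnel" | "forward", host, port) for a buffered request head.

    Assumes `head` holds everything up to (and including) the blank line.
    """
    lines = head.decode("latin-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)

    if method == "CONNECT":
        # CONNECT uses authority-form: "host:port" (IPv6 as "[::1]:443").
        host, _, port = target.rpartition(":")
        return "tunnel", host.strip("[]"), int(port)

    if target.startswith("http://"):
        # Plain proxied requests use absolute-form: "http://host[:port]/path".
        url = urlsplit(target)
        return "forward", url.hostname, url.port or 80

    # Assumption: fall back to the Host header for origin-form targets.
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host, _, port = value.strip().partition(":")
            return "forward", host, int(port) if port else 80

    raise ValueError("cannot determine the destination")
```

For a "tunnel" result the proxy would open the TCP connection, write `CONNECT_OK` to the client and start relaying; for "forward" it connects and sends the (rewritten) head.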
This would cover HTTP/1.1 and HTTPS connections.
But I am not sure whether this also works for HTTP/2, because of the following questions:
1. Is a cleartext HTTP/2 request sent as a raw request or via CONNECT? Is this guaranteed by some standard, or by commonly used client implementations?
2. An HTTP/2 connection can carry multiple streams, which may have different Host values. Is the client guaranteed to send requests for different hosts over different proxy connections?
3. Does the client assume that the proxy server is not HTTP/2-aware, and does it support the "HTTP/2 over an HTTP/1.1 proxy" case? Is this guaranteed by some standard, or by commonly used client implementations?
I could not find a formal specification of the detailed proxying behavior that answers these questions.
Regarding your HTTP/1.1 coverage: it is a bit naive and not entirely correct, but it would more or less work.
For example, for clear-text HTTP/1.1 a proxy must convert the target, which arrives in absolute-form, to origin-form.
A proxy should also drop hop-by-hop headers, add its own headers (such as Via), and so on.
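A minimal sketch of that head rewrite for a clear-text HTTP/1.1 request: it converts the absolute-form target to origin-form, rebuilds Host from the target, drops hop-by-hop headers (including any named in Connection) and appends a Via header. The function name, the `HOP_BY_HOP` set and the `"1.1 my-proxy"` pseudonym are illustrative assumptions; a real proxy has more cases to handle (for example, adapting Transfer-Encoding when it re-frames the body).

```python
from urllib.parse import urlsplit

# Hop-by-hop headers that must not be forwarded, plus the non-standard but
# widely sent Proxy-Connection. Transfer-Encoding is also hop-by-hop, but it is
# kept here because this sketch relays the body bytes unchanged.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-connection", "proxy-authenticate",
    "proxy-authorization", "te", "trailer", "upgrade",
}


def rewrite_for_origin(head: str, via: str = "1.1 my-proxy") -> str:
    """Turn an absolute-form proxy request head into an origin-form one."""
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    url = urlsplit(target)
    origin_form = (url.path or "/") + (f"?{url.query}" if url.query else "")
    authority = url.netloc.rpartition("@")[2]  # drop any userinfo

    headers = [
        (name.strip(), value.strip())
        for name, _, value in (line.partition(":") for line in lines[1:] if line)
    ]
    # Headers listed in Connection are hop-by-hop for this message too.
    drop = set(HOP_BY_HOP)
    for name, value in headers:
        if name.lower() == "connection":
            drop.update(token.strip().lower() for token in value.split(","))
    # The received Host header is ignored and rebuilt from the target.
    drop.add("host")

    out = [f"{method} {origin_form} {version}", f"Host: {authority}"]
    out += [f"{n}: {v}" for n, v in headers if n.lower() not in drop]
    out.append(f"Via: {via}")  # identify this proxy to the origin
    return "\r\n".join(out) + "\r\n\r\n"
```

For example, `GET http://example.com:8080/a?b=1 HTTP/1.1` becomes `GET /a?b=1 HTTP/1.1` with `Host: example.com:8080`.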
Regarding HTTP/2: it is rare for a proxy to receive clear-text HTTP/2 (clear text towards the server), but if it does, it is a normal request (not a CONNECT), and the proxy should process the headers (again, drop hop-by-hop headers and add its own) and forward the HTTP/2 frames to the origin indicated by the :authority pseudo-header.
For secure HTTP/2 (secure towards the server), the proxy receives a CONNECT request for each tunnel; when the client speaks HTTP/2 to the proxy, each CONNECT is a separate HTTP/2 stream, and each such stream may target a different origin. The client may therefore use the same connection to the proxy to send requests to different origin servers.
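Because routing is decided per stream rather than per connection, an HTTP/2-aware proxy looks at each stream's pseudo-headers. A minimal sketch, assuming the HTTP/2 frames have already been decoded into a list of (name, value) header pairs by some HTTP/2 library; `route_h2_stream` and `split_authority` are hypothetical helpers:

```python
def split_authority(authority: str, default_port: int) -> tuple[str, int]:
    # Sketch only: handles "host" and "host:port", not bracketed IPv6 literals.
    host, sep, port = authority.partition(":")
    return host, int(port) if sep else default_port


def route_h2_stream(headers: list[tuple[str, str]]) -> tuple[str, str, int]:
    """Decide, per HTTP/2 stream, whether to tunnel or forward, and where."""
    pseudo = {name: value for name, value in headers if name.startswith(":")}
    if pseudo[":method"] == "CONNECT":
        # HTTP/2 CONNECT omits :scheme and :path; :authority is "host:port".
        # The port is mandatory there; 443 is only a lenient fallback here.
        host, port = split_authority(pseudo[":authority"], 443)
        return "tunnel", host, port
    # Ordinary request (e.g. clear-text http): route by :authority.
    default = 443 if pseudo.get(":scheme") == "https" else 80
    host, port = split_authority(pseudo[":authority"], default)
    return "forward", host, port
```

Two streams on the same client connection can therefore resolve to two different origins, so the proxy cannot bind a client connection to a single upstream connection.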
How the client communicates with the proxy is orthogonal to how it communicates with the origin server.
The client-to-proxy communication may itself be secure: this was rare in the past, but it is becoming more common, and is sometimes even a requirement.
If the client-to-proxy communication is secure, client and proxy can negotiate which protocol to speak, either HTTP/1.1 or HTTP/2, via ALPN.
You can have a situation where a client speaks encrypted HTTP/2 to the proxy (using certificate "proxy"), but then sends a CONNECT to the proxy for an origin server, which establishes a tunnel from the client to the origin server.
Once the tunnel is established, the data that flows inside could be encrypted HTTP/1.1 or encrypted HTTP/2 (using certificate "server").
You therefore have double encryption, and this is how it must be.
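Once the proxy has answered the CONNECT with a 2xx, it never looks inside the tunnel: it only copies bytes, which is why the inner TLS session (certificate "server") stays end-to-end between client and origin. A minimal asyncio sketch of that bidirectional copy, assuming the client-to-proxy leg is HTTP/1.1 so the tunnel occupies the whole connection (with HTTP/2 to the proxy, the same bytes would travel in DATA frames of the CONNECT stream instead). `pipe` and `relay` are hypothetical names:

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, without looking at them."""
    try:
        while data := await reader.read(64 * 1024):
            writer.write(data)
            await writer.drain()  # back-pressure: wait if the peer is slow
        # Propagate EOF as a half-close so the opposite direction keeps flowing.
        if writer.can_write_eof():
            writer.write_eof()
    except ConnectionError:
        # Closing this leg also ends the opposite pipe, which reads from it.
        writer.close()


async def relay(client_r, client_w, origin_r, origin_w) -> None:
    """Bidirectional tunnel between client and origin after a 2xx CONNECT reply."""
    try:
        await asyncio.gather(pipe(client_r, origin_w), pipe(origin_r, client_w))
    finally:
        for w in (client_w, origin_w):
            w.close()
```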
In case of clear-text communication between client and proxy, the client must know a priori what protocols the proxy speaks.
In case of secure communication between client and proxy, the client must not assume the protocol, but instead negotiate it with the proxy via ALPN.
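On the proxy side, the two cases can be sketched with Python's standard `ssl` module: over TLS the proxy advertises both protocols via ALPN and uses whatever the handshake selected, while over clear text nothing is negotiated, so the protocol must come from configuration. This is an illustrative sketch; `make_proxy_tls_context` and `client_to_proxy_protocol` are hypothetical helpers, and falling back to HTTP/1.1 when the client sends no ALPN extension is an assumption about sensible default behavior:

```python
import ssl


def make_proxy_tls_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Advertise HTTP/2 and HTTP/1.1, in order of preference.
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    if certfile is not None:
        ctx.load_cert_chain(certfile, keyfile)  # the "proxy" certificate
    return ctx


def client_to_proxy_protocol(tls_sock=None, cleartext_protocol="http/1.1") -> str:
    if tls_sock is None:
        # Clear text: nothing is negotiated, so this must be known a priori.
        return cleartext_protocol
    # TLS: use what ALPN selected; None means the client sent no ALPN
    # extension, and this sketch then assumes HTTP/1.1.
    return tls_sock.selected_alpn_protocol() or "http/1.1"
```

`tls_sock` would typically be the result of `ctx.wrap_socket(sock, server_side=True)` after the handshake has completed.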
You can therefore have all the permutations: client speaking HTTP/1.1 to the proxy, but either HTTP/1.1 or HTTP/2 to the origin server; and client speaking HTTP/2 to the proxy, but either HTTP/1.1 or HTTP/2 to the origin server.
[Disclaimer: I implemented the above in the Jetty Project.]
You can see in this test how we set up a proxy matrix with all the permutations discussed above.
Jetty already offers a full-blown, fully non-blocking proxy implementation via the Jetty class ProxyHandler (which you can use as inspiration).