import ssl
import socket
ssl_context = ssl.create_default_context()
target = 'swapi.co'
port = 443
resource = '/api/people/1/'
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
secure_client = ssl_context.wrap_socket(client, server_hostname=target)
send_str = 'GET {} HTTP/1.1\r\nHost: {}:{}\r\n\r\n'.format(resource, target, str(port))
secure_client.connect((target, port))
secure_client.send(send_str.encode())
print(send_str)
print(len(secure_client.recv(8192))) # 1282
print(len(secure_client.recv(8192))) # 5. Why?
Above is a simple Python program that sends an HTTP request to the Star Wars API (swapi.co) over a TLS-wrapped TCP socket.
This is the request sent:
GET /api/people/1/ HTTP/1.1
Host: swapi.co:443
The response headers include Transfer-Encoding: chunked. When the first recv is executed, the headers and the first chunk are received. However, to get the last chunk with its terminator sequence ("0\r\n\r\n"), a second recv must be called. What is the underlying cause of this behavior?
TCP is a protocol that provides a stream of bytes. It doesn't provide any way to "glue" bytes together into messages. The number of bytes you get back from a single call to recv is arbitrary and depends on all kinds of varying factors, such as the exact implementation on the other side, how quickly you got around to calling recv, the network's maximum segment size, and so on. The boundary of a single recv carries no meaning.
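If you want the complete chunked body, you have to keep calling recv and reassemble the stream yourself. Here is a minimal sketch of that idea; the helper name recv_until_terminator is made up for illustration, and searching the whole buffer for "0\r\n\r\n" is a simplification (a real client parses the chunk framing instead):

def recv_until_terminator(sock, terminator=b'0\r\n\r\n', bufsize=8192):
    # Accumulate bytes across recv calls; one recv call has no relationship
    # to one "message" in the application-level protocol.
    data = b''
    while terminator not in data:
        part = sock.recv(bufsize)
        if not part:  # peer closed the connection
            break
        data += part
    return data

raw_response = recv_until_terminator(secure_client)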
Since you indicated in your request that you speak HTTP 1.1, the server is permitted to use any encoding that HTTP 1.1 clients are required to support. That includes this form of chunked encoding, which sends the body as one or more "chunks" of data, each preceded by its size. This is convenient for cases where the output is generated by a script and the server won't know how big the response is until it has been fully generated; chunked encoding lets the server start sending immediately.
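For reference, here is a simplified sketch of what decoding a chunked body involves, assuming the body (everything after the headers) has already been read fully into memory, and ignoring chunk extensions and trailer headers:

def decode_chunked(body: bytes) -> bytes:
    decoded = b''
    while True:
        size_line, _, rest = body.partition(b'\r\n')
        chunk_size = int(size_line.split(b';')[0], 16)  # chunk size is hexadecimal
        if chunk_size == 0:                             # the "0\r\n\r\n" terminator
            return decoded
        decoded += rest[:chunk_size]
        body = rest[chunk_size + 2:]                    # skip the chunk and its trailing CRLF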
Don't claim HTTP 1.1 compliance in a request unless your code supports everything the HTTP 1.1 standard says a client "MUST" support.
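If you don't want to implement all of that yourself, the standard library's http.client handles the TLS handshake and the chunked decoding for you (shown here against the same host and path as in the question):

import http.client

conn = http.client.HTTPSConnection('swapi.co')
conn.request('GET', '/api/people/1/')
response = conn.getresponse()
print(response.status, response.reason)
print(response.read())  # body with chunked encoding already decoded
conn.close()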