postgresql network-programming protocols

What does "losing synchronization with the message stream" mean in an application protocol such as PostgreSQL's?


I'm reading about how PostgreSQL (and similar network protocols) handle message parsing, and I found this statement:

To avoid losing synchronization with the message stream, both servers and clients typically read an entire message into a buffer (using the byte count) before attempting to process its contents.

But I'm confused about two things:

  1. What exactly does "losing synchronization with the message stream" mean here? Is it about reading the wrong bytes, or missing message boundaries? What kind of synchronization is being referred to?

  2. What could go wrong if I start parsing the message before reading the entire message? I can't think of concrete examples where parsing part of the message early would cause a real issue. Are there practical cases where people actually parse in chunks and get into trouble?


Solution

  • What exactly does "losing synchronization with the message stream" mean here? Is it about reading the wrong bytes, or missing message boundaries? What kind of synchronization is being referred to?

    It means disagreement over where the message boundaries are. For example, a 50-byte message is sent, but the server reads only 20 bytes of it, returns to the main loop early for some reason, and then reads the remaining 30 bytes as if they were the beginning of a new message.

  • What could go wrong if I start parsing the message before reading the entire message? I can't think of concrete examples where parsing part of the message early would cause a real issue. Are there practical cases where people actually parse in chunks and get into trouble?

    It becomes a problem when your parser returns early because of some error and fails to read the remainder of the message. The unread remainder is still buffered in the socket and will be read as a new message: whatever data was in the middle of the payload will then be parsed as that message's "header".

    In the best case, the server detects a malformed message and closes the connection. But if the remainder of the payload was deliberately crafted to look like a valid PostgreSQL request message, a trusted client (e.g. a web app submitting an INSERT that contains a forum post from an untrusted visitor) could be tricked into submitting arbitrary database commands on that visitor's behalf.

    See "HTTP Request Smuggling" for a real-world example where losing sync becomes an actual security issue: an HTTP proxy (acting as the client in this case) and the HTTP server behind it become desynchronized, so the server interprets part of one request's payload as a separate request, bypassing security measures that the proxy applies. (For example, the proxy might add X-Original-IP headers that the server trusts, and the original client can then spoof them.)
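To make the desynchronization concrete, here is a minimal Python sketch. It is not PostgreSQL code. The framing follows PostgreSQL's regular messages (a 1-byte type, then a 4-byte big-endian length that counts itself but not the type byte), but the payload layout (a "version" byte of 0x01 followed by text) and all function names are invented for illustration. One reader parses straight off the stream and bails out early on a bad version byte. The other reads the whole message into a buffer first, as the quoted statement recommends.

```python
import io
import struct

# 1-byte message type followed by a 4-byte big-endian length.
# The length counts itself (4 bytes) plus the payload, not the type byte.
HEADER = struct.Struct("!cI")


def frame(msg_type: bytes, payload: bytes) -> bytes:
    """Build one framed message."""
    return HEADER.pack(msg_type, len(payload) + 4) + payload


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or fail; a short read means a truncated stream."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-message")
    return data


def parse_streaming(stream):
    """Fragile: parses the payload directly from the stream."""
    msg_type, length = HEADER.unpack(read_exact(stream, HEADER.size))
    body_len = length - 4
    # Hypothetical rule: the payload must start with version byte 0x01.
    if read_exact(stream, 1) != b"\x01":
        # Early return: body_len - 1 payload bytes are left unread, so the
        # next call will treat them as the start of a new message.
        return msg_type, None
    return msg_type, read_exact(stream, body_len - 1)


def parse_buffered(stream):
    """Robust: consumes the whole message before looking at its contents."""
    msg_type, length = HEADER.unpack(read_exact(stream, HEADER.size))
    body = read_exact(stream, length - 4)
    if body[:1] != b"\x01":
        # Rejecting here is harmless: the stream already sits on the
        # next message boundary.
        return msg_type, None
    return msg_type, body[1:]


def read_all(parse, data: bytes):
    """Run one parser over a byte string until every byte is consumed."""
    stream = io.BytesIO(data)
    messages = []
    while stream.tell() < len(data):
        messages.append(parse(stream))
    return messages


# An attacker-controlled payload: an invalid version byte, followed by
# bytes that happen to form a complete, valid-looking message.
smuggled = frame(b"Q", b"\x01DROP TABLE users")
wire = frame(b"Q", b"\x02" + smuggled) + frame(b"Q", b"\x01SELECT 1")

fragile = read_all(parse_streaming, wire)
robust = read_all(parse_buffered, wire)
print(fragile)
print(robust)
```

Only two messages were sent. The buffered reader reports exactly two: one rejected and one valid. The streaming reader reports three, because after its early return it picks up the smuggled bytes in the middle of the first payload as if they were a separate message.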