postgresql kaitai-struct

Kaitai Struct: Any way to make entire body type dependent on presence/type of first byte?


I'm trying to write a Kaitai Struct definition for the Postgres Wire Protocol V3.

The issue I've run into is that every message except for the StartupMessage follows the same format. The StartupMessage is shaped differently.

So I need to somehow say "The object can be one of these two types", but I'm unsure how to do it.

The layout for most messages is:

|-----------------------------------------------|
| Type  | Length | (Rest of payload)
|-----------------------------------------------|
| Char  | Int32  |  Bytes
|-----------------------------------------------|

But for the startup message, there's no Type char at the beginning identifying it:

|-----------------------------------------------|
| Length | Protocol Version | (Rest of payload)
|-----------------------------------------------|
|  Int32 |      Int32       |  Bytes
|-----------------------------------------------|

So far, I've tried something like this:

meta:
    id: postgres_wire_protocol_frontend_v3
    file-extension: postgres_wire_protocol_frontend_v3
    endian: be

seq:
    - id: type
      type: str
      encoding: ASCII
      size: 1

    - id: length
      type: u4

    - id: body
      size: length
      type:
          switch-on: type
          cases:
              '"B"': bind_message
              '"E"': execute_message
              '"Q"': query_message
              _: startup_message

But this doesn't seem to work unfortunately =/

Is there some way to encode this in Kaitai?


Solution

  • Affiliate disclaimer: I'm a Kaitai Struct maintainer (see my GitHub profile).

    Looking into the PostgreSQL docs, it seems that the StartupMessage can only be "the very first message":

    The first byte of a message identifies the message type, and the next four bytes specify the length of the rest of the message (this length count includes itself, but not the message-type byte). The remaining contents of the message are determined by the message type. For historical reasons, the very first message sent by the client (the startup message) has no initial message-type byte.
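
    To make the framing rule above concrete, here is a minimal Python sketch that splits a buffer of frontend bytes into messages by hand, without Kaitai Struct. The function name split_frontend_messages and the sample bytes are hypothetical and only for illustration; the sketch assumes the whole conversation is already in one buffer and that the first message is the startup message.

```python
import struct


def split_frontend_messages(data: bytes):
    """Split frontend bytes into (type, payload) tuples.

    The first message is assumed to be the startup message, which has
    no type byte, so its type is reported as None.
    """
    messages = []
    pos = 0
    first = True
    while pos < len(data):
        if first:
            msg_type = None
            first = False
        else:
            msg_type = data[pos:pos + 1].decode("ascii")
            pos += 1
        # The Int32 length counts itself (4 bytes) but not the type byte.
        (length,) = struct.unpack_from(">I", data, pos)
        if length < 4:
            raise ValueError("length field must be at least 4")
        payload = data[pos + 4:pos + length]
        messages.append((msg_type, payload))
        pos += length
    return messages


# Hypothetical sample data: a startup message followed by a simple query.
startup_body = struct.pack(">HH", 3, 0) + b"user\x00alice\x00\x00"
startup = struct.pack(">I", 4 + len(startup_body)) + startup_body
query_body = b"SELECT 1\x00"
query = b"Q" + struct.pack(">I", 4 + len(query_body)) + query_body

for msg_type, payload in split_frontend_messages(startup + query):
    print(msg_type, len(payload))
```

    The only thing that distinguishes the startup message here is the `first` flag, i.e. state kept outside the message bytes themselves.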

    I don't know how your application using the Kaitai Struct-generated parser will operate, so I'm not sure how to use this information in a way that works for you. The .ksy snippet in your question indicates that you'll be processing each message with a new instance of the parser class PostgresWireProtocolFrontendV3, so I suggest making a new .ksy file just for the StartupMessage:

    meta:
      id: postgres_protocol_startup_message
    seq:
      - id: len_message
        type: u4
      - id: body
        size: len_message - len_message._sizeof
        type: message_body
    types:
      message_body:
        seq:
          - id: version_major
            type: u2
            valid: 3
          - id: version_minor
            type: u2
            valid: 0
          # ...
    

    Note: len_message._sizeof is translated to 4 at compile time. The subtraction is needed because the PostgreSQL docs describe the field as "Length of message contents in bytes, including self.", so the body is 4 bytes shorter than the value stored in the field. The compile-time sizeof operator is a 0.9 feature:

    • Implement compile-time sizeof and bitsizeof operators (#84)
      • Type-based: sizeof<u4>, bitsizeof<b13>, sizeof<user_type>
      • Value-based: file_header._sizeof, flags._bitsizeof (file_header, flags are fields defined in the current type)

    The valid key was also introduced in 0.9. There is no proper documentation for it yet (sorry), but you can read its description in #435.
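
    If it helps to see what the size expression and the valid checks amount to, here is a rough hand-written Python equivalent. It is only a stand-in under my own assumptions, not the code the compiler generates: the name parse_startup_message is hypothetical, and it raises a plain ValueError where the generated parser would raise its own validation error.

```python
import struct

# What len_message._sizeof evaluates to for a u4 field.
LEN_FIELD_SIZE = 4


def parse_startup_message(raw: bytes):
    """Return (version_major, version_minor, rest_of_body)."""
    (len_message,) = struct.unpack_from(">I", raw, 0)
    # The length includes the length field itself, so the body is
    # len_message - 4 bytes long.
    body = raw[LEN_FIELD_SIZE:len_message]
    if len(body) != len_message - LEN_FIELD_SIZE:
        raise ValueError("truncated startup message")
    if len(body) < 4:
        raise ValueError("body too short for the protocol version")
    version_major, version_minor = struct.unpack_from(">HH", body, 0)
    # Counterparts of `valid: 3` and `valid: 0` in the .ksy above.
    if version_major != 3:
        raise ValueError(f"unexpected major version {version_major}")
    if version_minor != 0:
        raise ValueError(f"unexpected minor version {version_minor}")
    return version_major, version_minor, body[4:]


# Hypothetical sample: protocol 3.0 with a single "user" parameter.
params = b"user\x00alice\x00\x00"
raw = struct.pack(">IHH", 8 + len(params), 3, 0) + params
print(parse_startup_message(raw))
```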

    In your application code, you'll probably know the state of the communication (i.e. whether you're processing the first message or not), so I assume you will do something like this:

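    from io import BytesIO

    from kaitaistruct import KaitaiStream
    # Parser classes generated by kaitai-struct-compiler from the two .ksy files
    from postgres_protocol_startup_message import PostgresProtocolStartupMessage
    from postgres_wire_protocol_frontend_v3 import PostgresWireProtocolFrontendV3

    message_raw = b'...'  # TODO: receive from the socket (probably)

    # is_first_message is state kept by your application
    if is_first_message:
        startup_message = PostgresProtocolStartupMessage(KaitaiStream(BytesIO(message_raw)))
        # ...
        is_first_message = False
    else:
        message = PostgresWireProtocolFrontendV3(KaitaiStream(BytesIO(message_raw)))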
    

    Of course, I don't know your use case so I just guessed what it might be, but hopefully at least some of this will be useful to you.


    Regarding this part of your question: "So I need to somehow say 'The object can be one of these two types', but I'm unsure how to do it."

    This is not how Kaitai Struct works. Kaitai Struct (intentionally) has no backtracking; it's designed to handle unambiguous binary formats (see https://stackoverflow.com/a/55111070/12940655). While it is possible to use instances to do some sort of lookahead to decide what follows, it's best to avoid that unless you really need it.