I'm trying to write a Kaitai Struct definition for the Postgres Wire Protocol V3.
The issue I've run into is that every message except for the StartupMessage follows the same format. The StartupMessage is shaped differently.
So I need to somehow say "The object can be one of these two types", but I'm unsure how to do it.
The layout for most messages is:
|------|--------|-------------------|
| Type | Length | (Rest of payload) |
|------|--------|-------------------|
| Char | Int32  | Bytes             |
|------|--------|-------------------|
But for the startup message, there's no Type char at the beginning identifying it:
|--------|------------------|-------------------|
| Length | Protocol Version | (Rest of payload) |
|--------|------------------|-------------------|
| Int32  | Int32            | Bytes             |
|--------|------------------|-------------------|
So far, I've tried something like this:
meta:
  id: postgres_wire_protocol_frontend_v3
  file-extension: postgres_wire_protocol_frontend_v3
  endian: be
seq:
  - id: type
    type: str
    encoding: ASCII
    size: 1
  - id: length
    type: u4
  - id: body
    size: length
    type:
      switch-on: type
      cases:
        '"B"': bind_message
        '"E"': execute_message
        '"Q"': query_message
        _: startup_message
But this doesn't seem to work unfortunately =/
Is there some way to encode this in Kaitai?
Affiliate disclaimer: I'm a Kaitai Struct maintainer (see my GitHub profile).
Looking into the PostgreSQL docs, it seems that the StartupMessage can only be "the very first message":
The first byte of a message identifies the message type, and the next four bytes specify the length of the rest of the message (this length count includes itself, but not the message-type byte). The remaining contents of the message are determined by the message type. For historical reasons, the very first message sent by the client (the startup message) has no initial message-type byte.
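To make the length arithmetic in that quote concrete, here is a minimal Python sketch that splits one regular (typed) message by hand. The function name `split_typed_message` is made up for illustration and is not part of Kaitai Struct or PostgreSQL; the sketch only assumes the layout quoted above (1 type byte, then a 4-byte big-endian length that counts itself but not the type byte).

```python
import struct


def split_typed_message(raw):
    """Split one regular (typed) message into (type_byte, payload).

    Hypothetical helper: assumes `raw` starts with a 1-byte type
    followed by a 4-byte big-endian length that includes itself
    but not the type byte.
    """
    if len(raw) < 5:
        raise ValueError("need at least 1 type byte and 4 length bytes")
    msg_type = raw[0:1]
    (length,) = struct.unpack_from(">I", raw, 1)
    if length < 4:
        raise ValueError("length must include its own 4 bytes")
    end = 1 + length  # type byte + everything the length field covers
    if len(raw) < end:
        raise ValueError("message is truncated")
    return msg_type, raw[5:end]


# A hand-built Query ('Q') message carrying a 9-byte, NUL-terminated string.
raw = b"Q" + struct.pack(">I", 4 + 9) + b"SELECT 1\x00"
msg_type, payload = split_typed_message(raw)
```

Note that the payload is `length - 4` bytes long, not `length` bytes, which matters for any `size:` expression that describes the body.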
I don't know how your application using the Kaitai Struct-generated parser will operate, so I'm not sure how to use this information in a way that works for you. The .ksy snippet in your question indicates that you'll be processing each message with a new instance of the parser class PostgresWireProtocolFrontendV3, so I suggest making a new .ksy file just for the StartupMessage:
meta:
  id: postgres_protocol_startup_message
  endian: be  # needed for u4/u2; the protocol uses network byte order
seq:
  - id: len_message
    type: u4
  - id: body
    size: len_message - len_message._sizeof
    type: message_body
types:
  message_body:
    seq:
      - id: version_major
        type: u2
        valid: 3
      - id: version_minor
        type: u2
        valid: 0
      # ...
Note: len_message._sizeof will be translated to 4 at compile time (this is needed because the field is described in PostgreSQL docs as "Length of message contents in bytes, including self."). The virtual sizeof operator is a 0.9 feature:
- Implement compile-time sizeof and bitsizeof operators (#84)
  - Type-based: sizeof<u4>, bitsizeof<b13>, sizeof<user_type>
  - Value-based: file_header._sizeof, flags._bitsizeof (file_header, flags are fields defined in the current type)
The valid key was also introduced in 0.9 and there is no proper documentation for it yet (sorry), but you can read its description in #435.
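For readers who want to see what the .ksy above amounts to, the following is a rough hand-written Python equivalent, not the code the Kaitai Struct compiler generates. The function name `parse_startup_message` is hypothetical. It mirrors two points from the definition: the body is `len_message - 4` bytes long, and the two `valid` checks reject anything other than version 3.0.

```python
import struct


def parse_startup_message(raw):
    """Return (version_major, version_minor, rest_of_body).

    Hypothetical helper that mirrors the .ksy by hand; it is not
    generated Kaitai Struct code.
    """
    (len_message,) = struct.unpack_from(">I", raw, 0)
    body_size = len_message - 4  # the length field counts its own 4 bytes
    body = raw[4:4 + body_size]
    if body_size < 4 or len(body) != body_size:
        raise ValueError("startup message is truncated or malformed")
    version_major, version_minor = struct.unpack_from(">HH", body, 0)
    # Same role as `valid: 3` and `valid: 0` in the .ksy
    if (version_major, version_minor) != (3, 0):
        raise ValueError("unsupported protocol version")
    return version_major, version_minor, body[4:]


# Hand-built example: one user parameter plus the terminating NUL byte.
params = b"user\x00alice\x00\x00"
raw = struct.pack(">IHH", 4 + 4 + len(params), 3, 0) + params
result = parse_startup_message(raw)
```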
In your application code, you'll probably know the state of the communication (i.e. whether you're processing the first message or not), so I assume you will do something like this:
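from io import BytesIO
from kaitaistruct import KaitaiStream
# PostgresProtocolStartupMessage and PostgresWireProtocolFrontendV3 come from the generated parser modules

message_raw = b'...'  # TODO: receive from the socket (probably)
if is_first_message:
    startup_message = PostgresProtocolStartupMessage(KaitaiStream(BytesIO(message_raw)))
    # ...
    is_first_message = False
else:
    message = PostgresWireProtocolFrontendV3(KaitaiStream(BytesIO(message_raw)))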
Of course, I don't know your use case so I just guessed what it might be, but hopefully at least some of this will be useful to you.
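One way the "receive from the socket" step might look is sketched below: a generator that cuts a byte stream into whole messages and reports which one was first, so each chunk can go to the matching parser. Everything here is an assumption for illustration. `iter_raw_messages` and `read_exact` are hypothetical names, and the stream is any object with a blocking `read(n)` (a `BytesIO` here; for a real socket, something like `sock.makefile("rb")` could play that role).

```python
import struct
from io import BytesIO


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError (hypothetical helper)."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def iter_raw_messages(stream):
    """Yield (is_first_message, raw_bytes) for each frontend message."""
    is_first_message = True
    while True:
        # Only messages after the first one start with a type byte.
        type_byte = b"" if is_first_message else stream.read(1)
        if not is_first_message and type_byte == b"":
            return  # clean end of stream
        length_raw = stream.read(4)
        if is_first_message and length_raw == b"":
            return  # empty stream
        if len(length_raw) != 4:
            raise EOFError("stream ended inside a length field")
        (length,) = struct.unpack(">I", length_raw)
        rest = read_exact(stream, length - 4)  # length counts itself
        yield is_first_message, type_byte + length_raw + rest
        is_first_message = False


# Two hand-built messages back to back: a startup message, then a Query.
startup = struct.pack(">IHH", 9, 3, 0) + b"\x00"
query = b"Q" + struct.pack(">I", 4 + 9) + b"SELECT 1\x00"
frames = list(iter_raw_messages(BytesIO(startup + query)))
```

Each `raw_bytes` value is a complete message including its header, so it can be wrapped in `KaitaiStream(BytesIO(...))` as in the snippet above.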
So I need to somehow say "The object can be one of these two types", but I'm unsure how to do it.
This is not how Kaitai Struct works. Kaitai Struct has (intentionally) no backtracking; it's designed to handle non-ambiguous binary formats (see https://stackoverflow.com/a/55111070/12940655). While it is possible to use instances to do some sort of lookahead to decide what follows, it's best to avoid it unless you really need it.