I'm dealing with files from two versions of a video game, one for the PC and one for the PS3. I can tell which version a file comes from by looking at the first 4 bytes of the header: if struct.unpack_from("<f", data) gives a certain number, the file is from the PC; if it doesn't, then struct.unpack_from(">f", data) should give that number. From there, the rest of the data is read accordingly.
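A minimal sketch of that detection step in plain Python. It assumes a hypothetical marker value `EXPECTED_MARKER = 1234.0` (the real number isn't stated here) stored as a float32 in the first 4 bytes:

```python
import struct

# Hypothetical marker value: substitute the number the real files contain.
EXPECTED_MARKER = 1234.0


def detect_platform(data: bytes) -> str:
    """Return 'pc' for little-endian files, 'ps3' for big-endian ones."""
    # The first 4 bytes hold a float32 marker; try little-endian first (PC).
    (le_value,) = struct.unpack_from("<f", data)
    if le_value == EXPECTED_MARKER:
        return "pc"
    # Otherwise the same bytes should decode correctly as big-endian (PS3).
    (be_value,) = struct.unpack_from(">f", data)
    if be_value == EXPECTED_MARKER:
        return "ps3"
    raise ValueError("unrecognized header marker")


pc_file = struct.pack("<f", EXPECTED_MARKER) + b"\x00" * 8
ps3_file = struct.pack(">f", EXPECTED_MARKER) + b"\x00" * 8
print(detect_platform(pc_file), detect_platform(ps3_file))
```

1234.0 is exactly representable as a float32, so the equality test is safe for this example; a real marker that isn't exactly representable would need the same float32 round-trip on both sides.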
I'm trying to write a parser for these files using Kaitai Struct, but it seems like my options are either to generate two separate KSY files for the LE and BE versions of the files, or to define two separate types, something like:
seq:
  - id: sample_rate
    type: u4le
  - id: header
    type: header_le
    if: sample_rate == 1234
  - id: header
    type: header_be
    if: sample_rate == 4321
types:
  header_le:
    seq:
      - id: sample_count
        type: u4le
      - id: channel_count
        type: u4le
  header_be:
    seq:
      - id: sample_count
        type: u4be
      ...
Either option works in the end, but I was hoping for something a bit less repetitive. Does Kaitai Struct support this?
Affiliation disclaimer: I'm a Kaitai Struct maintainer (see my GitHub profile).
Does Kaitai struct support this?
Yes, see https://doc.kaitai.io/user_guide.html#calc-endian. In the top-level seq, you typically directly include only the field indicating the endianness, and the rest of the format (affected by the selected endianness) needs to be moved to a subtype where you will use meta/endian/{switch-on,cases}.
seq:
  - id: sample_rate
    type: u4le
  - id: header
    type: header_type
types:
  header_type:
    meta:
      endian:
        switch-on: _root.sample_rate
        cases:
          '0x0102_0304': le
          '0x0403_0201': be
    seq:
      - id: sample_count
        type: u4 # this will be parsed as 'le' or 'be' as decided in `meta/endian`
      - id: channel_count
        type: u4
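For comparison, here is roughly what that spec does, written by hand with Python's `struct` module. This is a sketch of the logic, not the code the Kaitai Struct compiler actually generates; `Header` and `parse_file` are hypothetical names, and the marker constants are the `cases` values from the spec above:

```python
import struct
from dataclasses import dataclass

# Marker values taken from the `cases` in the .ksy above.
LE_MARKER = 0x0102_0304
BE_MARKER = 0x0403_0201


@dataclass
class Header:
    is_le: bool
    sample_count: int
    channel_count: int


def parse_file(data: bytes) -> Header:
    # Top-level seq: sample_rate is always read as u4le.
    (sample_rate,) = struct.unpack_from("<I", data, 0)
    # Equivalent of meta/endian/switch-on: choose the byte order once.
    if sample_rate == LE_MARKER:
        order = "<"
    elif sample_rate == BE_MARKER:
        order = ">"
    else:
        raise ValueError(f"unknown marker 0x{sample_rate:08x}")
    # Every plain `u4` in header_type is then read with that byte order.
    sample_count, channel_count = struct.unpack_from(order + "II", data, 4)
    return Header(order == "<", sample_count, channel_count)


# The same logical values written by a little-endian and a big-endian platform.
le_data = struct.pack("<III", LE_MARKER, 1000, 2)
be_data = struct.pack(">III", LE_MARKER, 1000, 2)
print(parse_file(le_data))  # Header(is_le=True, sample_count=1000, channel_count=2)
print(parse_file(be_data))  # Header(is_le=False, sample_count=1000, channel_count=2)
```

Reading the marker with a fixed byte order and comparing against both possible byte-swapped values is what lets a single `header_type` cover both platforms without duplicating its fields.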
Note that any user-defined types in which you want to inherit the endianness decided in /types/header_type/meta/endian must be defined somewhere under /types/header_type/types/..., as the User Guide example suggests (note the ifd type):
types:
  tiff_body:
    meta:
      endian:
        switch-on: _root.indicator
        cases:
          '[0x49, 0x49]': le
          '[0x4d, 0x4d]': be
    seq:
      - id: version
        type: u2
      # ...
    types:
      ifd:
        # inherits endianness of `tiff_body`
If you define them at the top level (same as header_type), they won't inherit the endianness from header_type, and you'll probably get an error similar to error: unable to use type 'u4' without default endianness.
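The inheritance rule can be pictured in plain Python: the nested parser doesn't decide the byte order itself, it receives the one its parent resolved. This is a hypothetical sketch loosely following the TIFF example (`TiffBody` and `Ifd` are illustrative classes, not Kaitai-generated code), and it assumes the standard TIFF header layout: a 2-byte `II`/`MM` indicator, a u2 version, a u4 offset of the first IFD, and a u2 entry count at the start of the IFD:

```python
import struct


class Ifd:
    """Nested type: has no byte order of its own, uses the parent's."""

    def __init__(self, data: bytes, offset: int, order: str):
        # `order` is handed down from TiffBody, the way a type nested under
        # /types/tiff_body/types/ inherits its calculated endianness.
        (self.num_fields,) = struct.unpack_from(order + "H", data, offset)


class TiffBody:
    def __init__(self, data: bytes):
        indicator = data[0:2]
        if indicator == b"II":
            self.order = "<"  # little-endian
        elif indicator == b"MM":
            self.order = ">"  # big-endian
        else:
            raise ValueError(f"bad byte-order indicator {indicator!r}")
        # version and the first IFD offset are read with the chosen order.
        self.version, ifd_offset = struct.unpack_from(self.order + "HI", data, 2)
        self.ifd = Ifd(data, ifd_offset, self.order)


# Tiny synthetic headers: IFD at offset 8 containing an entry count of 3.
le_tiff = b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", 3)
be_tiff = b"MM" + struct.pack(">HI", 42, 8) + struct.pack(">H", 3)
print(TiffBody(le_tiff).ifd.num_fields, TiffBody(be_tiff).ifd.num_fields)
```

A type defined at the top level is like an `Ifd` that is never handed `order`: there is nothing for its plain `u2`/`u4` fields to fall back on, which is roughly why the compiler refuses to build it.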
For more examples, check out the .ksy specs in the format gallery that use it: image/exif.ksy, executable/elf.ksy or database/gettext_mo.ksy.