kaitai-struct

Kaitai struct - change default endianness based on a condition in the file


I'm dealing with files from two versions of a video game: one for the PC, one for the PS3. It's possible to tell which version a given file comes from by looking at the first 4 bytes of the header: if struct.unpack_from("<f", data) yields a certain known number, the file is from the PC; if it doesn't, then struct.unpack_from(">f", data) should yield that number instead, and the file is from the PS3. From there, the rest of the data is read with the corresponding byte order.
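
For reference, the check described above could look roughly like this in Python. This is only a sketch: the marker value `EXPECTED_MAGIC` and the function name `detect_endianness` are hypothetical placeholders, since the real number depends on the game's file format.

```python
import struct

# Hypothetical marker value; the real one depends on the game's file format.
EXPECTED_MAGIC = 1.0


def detect_endianness(data: bytes, expected: float = EXPECTED_MAGIC) -> str:
    """Return the struct prefix "<" (little-endian, PC) or ">" (big-endian, PS3)."""
    # Try the little-endian interpretation of the first 4 bytes first.
    (as_le,) = struct.unpack_from("<f", data, 0)
    if as_le == expected:
        return "<"
    # Otherwise the big-endian interpretation should yield the marker.
    (as_be,) = struct.unpack_from(">f", data, 0)
    if as_be == expected:
        return ">"
    raise ValueError("first 4 bytes match the marker in neither byte order")
```

The returned prefix can then be prepended to every later `struct` format string, which is the same "decide once, apply everywhere" behaviour I'd like to get from Kaitai Struct.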

I'm trying to write a parser for these files using Kaitai Struct, but it seems like my options are either to write two separate KSY files for the LE and BE versions of the files, or to define two separate types, something like:

seq:
  - id: sample_rate
    type: u4le
  - id: header
    type: header_le
    if: sample_rate == 1234
  - id: header
    type: header_be
    if: sample_rate == 4321


types:
  header_le:
    seq:
      - id: sample_count
        type: u4le
      - id: channel_count
        type: u4le
  header_be:
    seq:
      - id: sample_count
        type: u4be
      ...

Either option works in the end, but I was hoping for something a bit less repetitive. Does Kaitai struct support this?


Solution

  • Affiliate disclaimer: I'm a Kaitai Struct maintainer (see my GitHub profile).

    Does Kaitai struct support this?

    Yes, see https://doc.kaitai.io/user_guide.html#calc-endian. Typically, the top-level seq directly contains only the field that indicates the endianness. The rest of the format (everything affected by the selected endianness) needs to be moved to a subtype, where you use meta/endian/{switch-on,cases}:

    seq:
      - id: sample_rate
        type: u4le
      - id: header
        type: header_type
    
    types:
      header_type:
        meta:
          endian:
            switch-on: _root.sample_rate
            cases:
              '0x0102_0304': le
              '0x0403_0201': be
        seq:
          - id: sample_count
            type: u4 # this will be parsed as 'le' or 'be' as decided in `meta/endian`
          - id: channel_count
            type: u4
    

    Note that any user-defined types that should inherit the endianness decided in /types/header_type/meta/endian must be defined somewhere under /types/header_type/types/.... This is hinted at in the User Guide example (note the ifd type):

    types:
      tiff_body:
        meta:
          endian:
            switch-on: _root.indicator
            cases:
              '[0x49, 0x49]': le
              '[0x4d, 0x4d]': be
        seq:
          - id: version
            type: u2
          # ...
        types:
          ifd:
            # inherits endianness of `tiff_body`
    

    If you define them at the top level (at the same level as header_type), they will not inherit the endianness from header_type, and you'll probably get an error similar to: error: unable to use type 'u4' without default endianness.

    For more examples, check out the .ksy specs in the format gallery that use it - image/exif.ksy, executable/elf.ksy or database/gettext_mo.ksy.
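
As a rough illustration of what the meta/endian switch in header_type amounts to at parse time, here is a hand-written Python sketch of the same logic. It is not the code the Kaitai Struct compiler generates. The function name `parse_file` and the returned dict are made up for illustration, and the marker values are the placeholder ones from the header_type spec above.

```python
import struct


def parse_file(data: bytes) -> dict:
    """Hand-written equivalent of the header_type spec (illustration only)."""
    # Top-level field: always read as little-endian, like `type: u4le`.
    (sample_rate,) = struct.unpack_from("<I", data, 0)

    # Counterpart of meta/endian/{switch-on,cases}: map the marker to a byte order.
    cases = {0x0102_0304: "<", 0x0403_0201: ">"}
    if sample_rate not in cases:
        raise ValueError(f"cannot decide endianness from {sample_rate:#010x}")
    prefix = cases[sample_rate]

    # Both `u4` fields of header_type are read with the byte order chosen above.
    sample_count, channel_count = struct.unpack_from(prefix + "II", data, 4)
    return {
        "endian": "le" if prefix == "<" else "be",
        "sample_count": sample_count,
        "channel_count": channel_count,
    }
```

Note that a big-endian file storing 0x01020304 shows up as 0x04030201 when the first field is read as u4le, which is why the two cases are byte-swapped versions of each other.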