I need to extract and process data (variably-sized binary messages) from a very large message log. Using the Gif example and the online documentation, I have defined and compiled the variably-sized message layout into msg_log.py. Calling msg_log.from_file("small_logfile") enables me to inspect and verify field values from the first message in the logfile.
For small logfiles that fit in memory, how do I get msg_log.py to inspect the 2nd, 3rd, and subsequent messages in the log?
For very large logfiles, I would expect to page the input through a byte buffer. I haven't done that yet and haven't found examples or discussion on how to go about it. How do I keep msg_log.py in sync with the paged byte buffer as the content changes?
My message structure is currently defined as follows. (I have also used "seq" instead of "instances", but still could only inspect the first message.)
meta:
  id: message
  endian: be
instances:
  msg_header:
    pos: 0x00
    type: message_header
  dom_header:
    pos: 0x06
    type: domain_header
  body:
    pos: 0x2b
    size: msg_header.length - 43
types:
  message_header:
    seq:
      - id: length
        type: u1
      <other fixed-size fields - 5 bytes>
  domain_header:
    seq:
      <fixed-size fields - 37 bytes>
  message_body:
    seq:
      - id: body
        size-eos: true
Parsing multiple structures in a row from a single stream can be achieved by something like:
from msg_log import Message
from kaitaistruct import KaitaiStream
f = open("yourfile.bin", "rb")
stream = KaitaiStream(f)
obj1 = Message(stream)
obj2 = Message(stream)
obj3 = Message(stream)
# etc
stream.close()
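When the number of messages is not known in advance, the repeated constructor calls above can be wrapped in a loop that runs until the stream is exhausted. The following is only a minimal sketch: `iter_messages` is a hypothetical helper name, and it assumes the message layout is described with `seq`, so that parsing one message actually advances the stream. With `instances` at absolute `pos` values, fields are read lazily from fixed offsets, the stream position never moves, and this loop would not terminate.

```python
def iter_messages(stream, message_cls):
    """Yield parsed messages one at a time until the stream is exhausted.

    `stream` is expected to be a KaitaiStream (any object with is_eof() works);
    `message_cls` is the generated class, for example msg_log.Message.
    iter_messages is a hypothetical helper, not part of the Kaitai runtime.
    """
    while not stream.is_eof():
        # Assumption: the constructor consumes exactly one message (seq-based
        # layout) and leaves the stream at the start of the next message.
        yield message_cls(stream)
```

With the names from the snippet above, usage would look like `for msg in iter_messages(stream, Message): ...`, which keeps only one parsed message alive at a time unless you store them yourself.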
I'm not sure what you mean by "paging through a byte buffer". The method above by itself does not load the whole file into memory; it reads it using normal read()-like calls as requested.
If you want somewhat better performance and you are dealing with a large file of a fixed size, you can opt for memory mapping. This way you would just be using a region of memory, and the OS would take care of the input/output required to load the relevant parts of the file into actual physical memory. For Python, there is a PR for the runtime that implements helpers for that, or you can just do it yourself:
from io import BytesIO
import mmap
from kaitaistruct import KaitaiStream
from msg_log import Message
f = open("yourfile.bin", "rb")
# Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
    stream = KaitaiStream(BytesIO(buf))
    obj1 = Message(stream)
    obj2 = Message(stream)
    obj3 = Message(stream)
    # etc
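As a cross-check on message boundaries, the memory-mapped buffer can also be walked directly using the length byte from the question's layout. This is a rough sketch under stated assumptions, not the runtime's own mechanism: it assumes the first byte of each message is its total length with the 43 bytes of headers included (inferred from `size: msg_header.length - 43`), and `HEADER_SIZE`, `iter_message_slices` and `iter_file_messages` are hypothetical names.

```python
import mmap

# 6-byte message header + 37-byte domain header = 43 (0x2b), per the question
HEADER_SIZE = 43


def iter_message_slices(buf):
    """Yield (offset, raw_bytes) for each message in a bytes-like buffer.

    Assumption: byte 0 of every message holds its total length,
    headers included.
    """
    offset = 0
    total = len(buf)
    while offset < total:
        length = buf[offset]  # indexing bytes or mmap yields an int
        if length < HEADER_SIZE or offset + length > total:
            raise ValueError(f"bad message length {length} at offset {offset}")
        yield offset, bytes(buf[offset:offset + length])
        offset += length


def iter_file_messages(path):
    """Memory-map `path` read-only and yield its messages one by one."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            yield from iter_message_slices(buf)
```

Each raw slice could then be passed to `Message.from_bytes(raw)`, assuming your runtime version provides that classmethod. Because every message gets its own stream that way, absolute `pos` offsets such as `0x06` and `0x2b` would be relative to the start of that message.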