python struct binaryfiles namedtuple

Is there an elegant way to use struct and namedtuple instead of this?


I'm reading a binary file made up of records that in C would look like this:

typedef struct _rec_t
{
  char text[20];
  unsigned char index[3];
} rec_t;

Now I'm able to parse this into a tuple with 23 distinct values, but would prefer if I could use namedtuple to combine the first 20 bytes into text and the three remaining bytes into index. How can I achieve that? Basically instead of one tuple of 23 values I'd prefer to have two tuples of 20 and 3 values respectively and access these using a "natural name", i.e. by means of namedtuple.

I am currently using the format "20c3B" for struct.unpack_from().

Note: There are many consecutive records in the string when I call parse_text.


My code (stripped down to the relevant parts):

#!/usr/bin/env python
import sys
import os
import struct
from collections import namedtuple

def parse_text(data):
    fmt = "20c3B"
    l = len(data)
    sz = struct.calcsize(fmt)
    num = l // sz  # whole records only
    if not num:
        print "ERROR: no records found."
        return
    print "Size of record %d - number %d" % (sz, num)
    #rec = namedtuple('rec', 'text index')
    empty = struct.unpack_from(fmt, data)
    # Loop through elements
    # ...

def main():
    if len(sys.argv) < 2:
        print "ERROR: need to give file with texts as argument."
        sys.exit(1)
    s = os.path.getsize(sys.argv[1])
    f = open(sys.argv[1], "rb")  # binary mode, so bytes are read unmodified
    try:
        data = f.read(s)
        parse_text(data)
    finally:
        f.close()

if __name__ == "__main__":
    main()

Solution

  • According to the docs: http://docs.python.org/library/struct.html

    Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:

    >>> record = 'raymond   \x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)
    
    >>> from collections import namedtuple
    >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
    >>> Student._make(unpack('<10sHHb', record))
    Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)
    

    so in your case

    >>> import struct
    >>> from collections import namedtuple
    >>> data = "1"*23
    >>> fmt = "20c3B"
    >>> Rec = namedtuple('Rec', 'text index') 
    >>> fields = struct.unpack_from(fmt, data)
    >>> r = Rec(text=fields[:20], index=fields[20:])
    >>> r
    Rec(text=('1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'), index=(49, 49, 49))
    >>>
    

    Slicing the unpacked values may be a nuisance. If the format were fmt = "20si", or something similar where each field unpacks to a single value rather than a run of separate bytes, the slicing would not be needed:

    >>> import struct
    >>> from collections import namedtuple
    >>> data = "1"*24
    >>> fmt = "20si"
    >>> Rec = namedtuple('Rec', 'text index') 
    >>> r = Rec._make(struct.unpack_from(fmt, data))
    >>> r
    Rec(text='11111111111111111111', index=825307441)
    >>>
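
Note that "20si" describes a 24-byte record (a 20-byte string followed by a 4-byte int on typical platforms), not the 23-byte layout from the question. A possible middle ground, sketched below, is the format "20s3B": the text comes back as a single string and only the three index bytes need slicing. The sketch also walks over consecutive records by passing an offset to unpack_from. The names Rec, REC_STRUCT and parse_records are made up for illustration and are not part of the original code; on Python 3 the text field is a bytes object.

```python
import struct
from collections import namedtuple

# Hypothetical names: Rec, REC_STRUCT and parse_records are illustrative only.
Rec = namedtuple("Rec", "text index")

# "20s" yields the 20 text bytes as one string; "3B" yields three integers.
REC_STRUCT = struct.Struct("20s3B")


def parse_records(data):
    """Return a list of Rec tuples, one per complete 23-byte record in data."""
    records = []
    count = len(data) // REC_STRUCT.size
    for i in range(count):
        # Unpack the i-th record directly from its offset in the buffer.
        fields = REC_STRUCT.unpack_from(data, i * REC_STRUCT.size)
        records.append(Rec(text=fields[0], index=fields[1:]))
    return records


# Two made-up records, just to exercise the function.
sample = b"A" * 20 + b"\x01\x02\x03" + b"B" * 20 + b"\x04\x05\x06"
recs = parse_records(sample)
```

Here recs[0].index is a tuple of three integers and recs[0].text is the 20-byte string, so both fields are reachable by name without a 23-element tuple in between. Trailing bytes that do not fill a whole record are ignored in this sketch.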