
Is Python's CFFI an adequate tool to parse C definitions from a header file?


From Python, I want to fetch the details of structures, arrays, and enums defined in C headers: the list of defined types, the names and types of struct members, the names and values defined in enums, the size of arrays, etc.

I don't plan to link a C library into Python, but I want to use a battle-tested tool to "parse" C definitions, so I picked CFFI and tried the following.

Start with a dummy test.h file:

typedef struct {

    int a;
    int b[3];
    float c;

} other_struct_T;

typedef struct {

    bool i;
    bool j;
    other_struct_T k;
    
}  main_struct_T;

Preprocess it once to be sure to resolve #includes, #defines, etc.:

gcc -E -P -xc test.h -o test.preprocessed.h

Then load it with CFFI like this:


from pathlib import Path
from cffi import FFI

u = FFI()
txt = Path("test.preprocessed.h").read_text()
u.cdef(txt)
k = u.typeof("main_struct_T")
print(k)
print(k.elements)

This prints <ctype 'main_struct_T'> first, but fails on the second print. k seems to have neither .length, nor .item, nor .relements, as one could expect from a ctype instance (as mentioned here):

Traceback (most recent call last):
  File "./parse_header.py", line 14, in <module>
    print(k.elements)
          ^^^^^^^^^^^
AttributeError: elements

What am I missing? How would you do it differently?


Solution

  • Self reply here! I made progress!

    The CFFI documentation didn't help much :D but using dir() on the returned objects did.

    I found two options. The easiest one is this snippet (a more complete answer is at the end):

    if k.kind == 'struct' :
        for f in k.fields :
            name, obj = f
            print(name, obj.type, obj.offset)
    

    where k is obtained exactly as explained in the question. This gives:

    i <ctype '_Bool'> 0
    j <ctype '_Bool'> 1
    k <ctype 'other_struct_T'> 4
    

    Recursion can be used to dig into other_struct_T.
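
A minimal sketch of that recursion, assuming the same two structs as in the question. The header is inlined here so the example is self-contained, and bool is written as _Bool, matching what the output above shows. The describe() helper and its output format are hypothetical, only the kind, fields, item, length, cname, type and offset attributes come from CFFI:

```python
from cffi import FFI

ffi = FFI()
# Same layout as test.h in the question, inlined for the example
ffi.cdef("""
typedef struct { int a; int b[3]; float c; } other_struct_T;
typedef struct { _Bool i; _Bool j; other_struct_T k; } main_struct_T;
""")

def describe(ctype, indent=0):
    """Return text lines describing ctype, descending into structs and arrays."""
    lines = []
    pad = "  " * indent
    if ctype.kind == "struct":
        # fields is a list of (name, field) pairs
        for name, field in ctype.fields:
            lines.append(f"{pad}{name}: {field.type.cname} @ offset {field.offset}")
            lines.extend(describe(field.type, indent + 1))
    elif ctype.kind == "array":
        # only array ctypes expose item and length
        lines.append(f"{pad}[{ctype.length}] of {ctype.item.cname}")
        lines.extend(describe(ctype.item, indent + 1))
    return lines

lines = describe(ffi.typeof("main_struct_T"))
print("\n".join(lines))
```

Primitive types produce no extra lines, so the recursion stops on its own once it reaches int, float or _Bool.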

    The other option is derived from another question (Using Python cffi for introspection) and led to this partial snippet:

    import cffi.model

    z = dict() # type name -> list of members
    # _parser and _declarations are private CFFI attributes and may change between versions
    for key in u._parser._declarations :
        v = u._parser._declarations[key][0]
        if isinstance(v, cffi.model.EnumType) :
            z[v.name] = list()
            print(v.enumerators, v.enumvalues, v.get_c_name(), v.get_official_name())
            for name, value in zip(v.enumerators, v.enumvalues) :
                z[v.name].append((name, value))
        elif isinstance(v, cffi.model.StructType) :
            print(v.fldnames, v.fldtypes, v.fldquals, v.fldbitsize)
            z[v.name] = list()
            for name, ctype, qual, size in zip(v.fldnames, v.fldtypes, v.fldquals, v.fldbitsize) :
                z[v.name].append((name, ctype, qual, size))
    
    ...
    

    The classes are different, and so are the methods and properties, but the information inside should be the same. Using u._parser._declarations feels ugly, though, since it relies on private attributes.

    Update

    Here is an (imperfect but functional) version:

    #!/usr/bin/env python3
    
    import collections
    
    # cc_pathlib is a third-party pathlib extension; its Path.save() is used at the end to write JSON
    from cc_pathlib import Path
    
    import cffi
    
    class ExtraFace() :
        def __init__(self, h_pth) :
            self.ffi = cffi.FFI()
            self.ffi.cdef(h_pth.read_text())
    
            self.e_map = dict() # map of types already parsed
    
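        def parse(self, name, recurse=False) :
            # recurse is currently unused: nested types are always followed
            e = self.ffi.typeof(name)
            e_set = {e,} # set of types to be parsed
            while e_set :
                e = e_set.pop()
                if e.cname in self.e_map : # e_map is keyed by cname, not by ctype
                    continue
                if e.kind == 'struct' :
                    e_set |= self.parse_struct(e)
                elif e.kind == 'enum' :
                    self.parse_enum(e)
                elif e.kind == 'array' :
                    e_set.add(e.item) # look inside arrays of structs or enums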
    
        def parse_struct(self, e) :
            s_map = collections.OrderedDict()
            e_set = set()
            for f in e.fields :
                name, m = f
                if m.type.kind == 'array' :
                    s_map[name] = (m.type.item.cname, m.type.length, m.offset)
                else :
                    s_map[name] = (m.type.cname, 0, m.offset)
                if m.type.kind != 'primitive' :
                    e_set.add(m.type)
            self.e_map[e.cname] = s_map
            return e_set
    
        def parse_enum(self, e) :
            self.e_map[e.cname] = e.relements
    
    if __name__ == '__main__' :
        u = ExtraFace(Path("test.preprocessed.h"))
        u.parse("main_struct_T")
        Path("e.json").save(u.e_map, verbose=True)
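
The enum branch relies on relements, which only enum ctypes provide. Here is a small sketch of what it contains next to elements. The color_T enum is a made-up example, not part of the header from the question:

```python
from cffi import FFI

ffi = FFI()
# Hypothetical enum, used only to illustrate the two mappings
ffi.cdef("typedef enum { RED, GREEN = 5, BLUE } color_T;")

color = ffi.typeof("color_T")

# relements maps enumerator names to values, elements maps values to names
names_to_values = dict(color.relements)
values_to_names = dict(color.elements)

print(color.kind)
print(names_to_values)
print(values_to_names)
```

This also explains the AttributeError in the question: elements is only available when kind is 'enum', while a struct ctype exposes fields instead.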