Is Python's CFFI an adequate tool to parse C definitions from a header file?

From Python, I want to fetch the details of structures, arrays, and enums defined in C headers: the list of defined types, the names and types of struct members, the names and values defined in enums, the size of arrays, etc.

I don't plan to link a C library into Python; I just want a battle-tested tool to "parse" C definitions, so I picked CFFI and tried the following.

Start with a dummy test.h file:

typedef struct {

    int a;
    int b[3];
    float c;

} other_struct_T;

typedef struct {

    bool i;
    bool j;
    other_struct_T k;
    
}  main_struct_T;

Preprocess it once to make sure #includes, #defines, etc. are resolved:

gcc -E -P -xc test.h -o test.preprocessed.h

Then load it with CFFI like this:


from pathlib import Path
from cffi import FFI

u = FFI()
txt = Path("test.preprocessed.h").read_text()
u.cdef(txt)
k = u.typeof("main_struct_T")
print(k)
print(k.elements)

This prints <ctype 'main_struct_T'> first, but fails on the second print (and k seems to contain neither .length, nor .item, nor .relements, as one could expect from a ctype instance, as mentioned here):

Traceback (most recent call last):
  File "./parse_header.py", line 14, in <module>
    print(k.elements)
          ^^^^^^^^^^^
AttributeError: elements

What am I missing? How would you do it differently?

Solution

Self-answer here! I made progress!

The CFFI documentation didn't help much :D Calling dir() on the returned objects did.

I found two options. The easiest one is this snippet (a more complete answer is at the end):

if k.kind == 'struct' :
    for f in k.fields :
        name, obj = f
        print(name, obj.type, obj.offset)

where k is obtained exactly as explained in the question. This gives:

i <ctype '_Bool'> 0
j <ctype '_Bool'> 1
k <ctype 'other_struct_T'> 4

Recursion can be used to dig into other_struct_T in the same way, since obj.type is itself a ctype with its own kind and fields.
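As a minimal sketch of that recursion (the walk helper and the dotted path format are my own naming for illustration, not part of CFFI), the following follows nested structs, looking through one array level so that arrays of structs are visited too. The header text is inlined here instead of being read from test.preprocessed.h:

```python
from cffi import FFI

# Same definitions as test.h, inlined for the sake of the example.
HEADER = """
typedef struct { int a; int b[3]; float c; } other_struct_T;
typedef struct { bool i; bool j; other_struct_T k; } main_struct_T;
"""

def walk(ctype, prefix=""):
    """Yield (dotted member path, C type name, offset within the parent struct)."""
    for name, field in ctype.fields:
        path = prefix + name
        yield path, field.type.cname, field.offset
        # Look through one array level so arrays of structs are also visited.
        inner = field.type.item if field.type.kind == "array" else field.type
        if inner.kind == "struct":
            yield from walk(inner, path + ".")

ffi = FFI()
ffi.cdef(HEADER)
rows = list(walk(ffi.typeof("main_struct_T")))
for row in rows:
    print(*row)
```

Note that the offsets are relative to the enclosing struct, not to main_struct_T. Opaque structs (declared but never defined) are not handled by this sketch.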

The other option is derived from another question (Using Python cffi for introspection) and led to this partial snippet:

import cffi.model

z = dict() # collected definitions, keyed by type name

for key in u._parser._declarations :
    # each entry is a tuple whose first item is the cffi.model type object
    v = u._parser._declarations[key][0]
    if isinstance(v, cffi.model.EnumType) :
        z[v.name] = list() 
        print(v.enumerators, v.enumvalues, v.get_c_name(), v.get_official_name())
        for name, value in zip(v.enumerators, v.enumvalues) :
            z[v.name].append((name, value))
    elif isinstance(v, cffi.model.StructType) :
        print(v.fldnames, v.fldtypes, v.fldquals, v.fldbitsize)
        z[v.name] = list()
        for name, ctype, qual, size in zip(v.fldnames, v.fldtypes, v.fldquals, v.fldbitsize) :
            z[v.name].append((name, ctype, qual, size))

...

The classes, methods, and properties differ between the two approaches, but the information inside should be the same. Relying on u._parser._declarations feels ugly though, since both attributes are private.
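For enums, the same information seems reachable without the private parser attributes. This is a small sketch using only ffi.typeof(); the color_T enum is a hypothetical example that is not in the original test.h:

```python
from cffi import FFI

ffi = FFI()
# Hypothetical enum, used only for illustration (not part of test.h).
ffi.cdef("typedef enum { RED, GREEN = 5, BLUE } color_T;")

ctype = ffi.typeof("color_T")
# .relements maps enumerator names to values; .elements is the reverse mapping.
# Both attributes exist only on ctypes whose kind is 'enum'.
enum_values = dict(ctype.relements) if ctype.kind == "enum" else {}
print(enum_values)
```

This also explains the AttributeError in the question: which attributes a ctype exposes depends on its kind, so a struct ctype has fields but no elements.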

Update

Here is some (imperfect but functional) code:

#!/usr/bin/env python3

import collections
import inspect

from cc_pathlib import Path

import cffi

class ExtraFace() :
    def __init__(self, h_pth) :
        self.ffi = cffi.FFI()
        self.ffi.cdef(h_pth.read_text())

        self.e_map = dict() # map of types already parsed

    def parse(self, name, recurse=False) :
        e = self.ffi.typeof(name)
        e_set = {e,} # set of types to be parsed
        while e_set :
            e = e_set.pop()
            if e.cname in self.e_map : # e_map is keyed by cname, not by ctype
                continue
            if e.kind == 'struct' :
                e_set |= self.parse_struct(e)
            if e.kind == 'enum' :
                self.parse_enum(e)

    def parse_struct(self, e) :
        s_map = collections.OrderedDict()
        e_set = set()
        for f in e.fields :
            name, m = f
            if m.type.kind == 'array' :
                s_map[name] = (m.type.item.cname, m.type.length, m.offset)
                sub = m.type.item # follow the element type, not the array itself
            else :
                s_map[name] = (m.type.cname, 0, m.offset)
                sub = m.type
            if sub.kind != 'primitive' :
                e_set.add(sub)
        self.e_map[e.cname] = s_map
        return e_set

    def parse_enum(self, e) :
        self.e_map[e.cname] = e.relements

if __name__ == '__main__' :
    u = ExtraFace(Path("test.preprocessed.h"))
    u.parse("main_struct_T")
    Path("e.json").save(u.e_map, verbose=True)
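The last line relies on cc_pathlib, a third-party package. Assuming its Path.save() simply serializes the mapping to JSON, a rough standard-library equivalent could look like the sketch below. The e_map content here is a hand-written stand-in with the shape produced by parse_struct (member name mapped to type name, array length, offset), using the values printed earlier for main_struct_T:

```python
import collections
import json
import tempfile
from pathlib import Path

# Illustrative stand-in for ExtraFace.e_map after parsing main_struct_T.
e_map = {
    "main_struct_T": collections.OrderedDict([
        ("i", ("_Bool", 0, 0)),
        ("j", ("_Bool", 0, 1)),
        ("k", ("other_struct_T", 0, 4)),
    ]),
}

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "e.json"
    # json.dumps keeps the member order and turns the tuples into JSON arrays.
    out_path.write_text(json.dumps(e_map, indent=2))
    reloaded = json.loads(out_path.read_text())

print(reloaded["main_struct_T"]["k"])
```

When the file is read back, the tuples come back as lists, so any consumer should index them by position rather than rely on the tuple type.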