From Python, I want to fetch the details of structures/arrays/enums defined in C headers: the list of defined types, the names and types of struct members, the names and values defined in enums, the size of arrays, etc.
I don't plan to link a C library from Python, but I wanted a battle-tested tool to parse C definitions, so I picked CFFI and tried the following:
Start with a dummy test.h file:
typedef struct {
    int a;
    int b[3];
    float c;
} other_struct_T;

typedef struct {
    bool i;
    bool j;
    other_struct_T k;
} main_struct_T;
Preprocess it once to be sure to resolve #includes, #defines, etc.:
gcc -E -P -xc test.h -o test.preprocessed.h
Then load it with CFFI like this:
from pathlib import Path
from cffi import FFI
u = FFI()
txt = Path("test.preprocessed.h").read_text()
u.cdef(txt)
k = u.typeof("main_struct_T")
print(k)
print(k.elements)
which prints <ctype 'main_struct_T'> first.
But it fails at the second print (and k seems to contain neither .length, nor .item, nor .relements, as one could expect from a ctype instance, as mentioned here):
Traceback (most recent call last):
  File "./parse_header.py", line 14, in <module>
    print(k.elements)
          ^^^^^^^^^^^
AttributeError: elements
What am I missing? How would you do it differently?
Self reply here! I made progress!
The CFFI documentation didn't help much :D
Using dir() on the returned objects did.
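To illustrate why k.elements failed in the question: the attributes a ctype exposes depend on its .kind. Here is a minimal sketch that probes a few ctypes with hasattr(). The declarations (color_T, demo_T) and the helper available() are made up for this example and are not part of test.h.

```python
from cffi import FFI

ffi = FFI()
# Hypothetical declarations, only for illustration (not taken from test.h)
ffi.cdef("""
typedef enum { RED, GREEN = 5 } color_T;
typedef struct { int a; int b[3]; } demo_T;
""")

# Attribute names that ctype objects can expose; which ones exist depends on .kind
CANDIDATES = ("fields", "elements", "relements", "item", "length")

def available(ctype):
    """Return the kind-specific attributes that this ctype actually exposes."""
    return [name for name in CANDIDATES if hasattr(ctype, name)]

summary = {}
for decl in ("demo_T", "color_T", "int[3]", "int"):
    ctype = ffi.typeof(decl)
    summary[decl] = (ctype.kind, available(ctype))
    print(decl, ctype.kind, available(ctype))
```

So a struct is inspected through .fields, an enum through .elements/.relements, and an array through .item/.length; checking .kind first avoids the AttributeError.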
I found two options. The easiest one is this snippet (a more complete answer is at the end):
if k.kind == 'struct':
    for name, obj in k.fields:
        print(name, obj.type, obj.offset)
where k is obtained exactly as explained in the question. This gives:
i <ctype '_Bool'> 0
j <ctype '_Bool'> 1
k <ctype 'other_struct_T'> 4
Recursion can be used to dig into other_struct_T.
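A minimal sketch of that recursion, assuming the structs from the question are inlined (with _Bool instead of bool, as after preprocessing). The helper name walk() and the dotted-path output format are my own choices, not something CFFI provides.

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("""
typedef struct { int a; int b[3]; float c; } other_struct_T;
typedef struct { _Bool i; _Bool j; other_struct_T k; } main_struct_T;
""")

def walk(ctype, prefix="", base=0):
    """Flatten a struct ctype into (dotted_name, c_type_name, offset) rows.

    Offsets are accumulated so they are relative to the outermost struct.
    """
    rows = []
    for name, field in (ctype.fields or []):  # guard in case fields is None
        path = prefix + name
        offset = base + field.offset
        if field.type.kind == "struct":
            # Nested struct: descend instead of emitting a row for it
            rows.extend(walk(field.type, path + ".", offset))
        else:
            rows.append((path, field.type.cname, offset))
    return rows

rows = walk(ffi.typeof("main_struct_T"))
for row in rows:
    print(*row)
```

Arrays are reported as a single row here (their cname carries the length); arrays of structs would need an extra branch on field.type.item.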
The other option is derived from another question (Using Python cffi for introspection) and led to this partial snippet:
import cffi.model

z = {}  # collected results, keyed by type name
# _parser._declarations is a private CFFI attribute and may change between releases
for k in u._parser._declarations:
    v = u._parser._declarations[k][0]
    if isinstance(v, cffi.model.EnumType):
        z[v.name] = list()
        print(v.enumerators, v.enumvalues, v.get_c_name(), v.get_official_name())
        for name, value in zip(v.enumerators, v.enumvalues):
            z[v.name].append((name, value))
    elif isinstance(v, cffi.model.StructType):
        print(v.fldnames, v.fldtypes, v.fldquals, v.fldbitsize)
        z[v.name] = list()
        for name, ctype, qual, size in zip(v.fldnames, v.fldtypes, v.fldquals, v.fldbitsize):
            z[v.name].append((name, ctype, qual, size))
    ...
The classes are different, and so are the methods and properties, but the information inside should be the same. Using u._parser._declarations feels ugly though.
Here is a complete (imperfect but functional) version:
#!/usr/bin/env python3

import collections

import cffi
# cc_pathlib is a third-party package (not the standard pathlib); the save() call at the end relies on it
from cc_pathlib import Path

class ExtraFace():

    def __init__(self, h_pth):
        self.ffi = cffi.FFI()
        self.ffi.cdef(h_pth.read_text())
        self.e_map = dict()  # types already parsed, keyed by C name

    def parse(self, name, recurse=False):
        e = self.ffi.typeof(name)
        e_set = {e,}  # types still to be parsed
        while e_set:
            e = e_set.pop()
            if e.cname in self.e_map:  # e_map is keyed by cname, not by ctype
                continue
            if e.kind == 'struct':
                e_set |= self.parse_struct(e)
            elif e.kind == 'enum':
                self.parse_enum(e)

    def parse_struct(self, e):
        # map each field name to (C type name, array length or 0, byte offset)
        s_map = collections.OrderedDict()
        e_set = set()  # non-primitive member types found in this struct
        for name, m in e.fields:
            if m.type.kind == 'array':
                t = m.type.item  # follow the element type, so arrays of structs are parsed too
                s_map[name] = (t.cname, m.type.length, m.offset)
            else:
                t = m.type
                s_map[name] = (t.cname, 0, m.offset)
            if t.kind != 'primitive':
                e_set.add(t)
        self.e_map[e.cname] = s_map
        return e_set

    def parse_enum(self, e):
        self.e_map[e.cname] = e.relements  # maps enumerator name to value

if __name__ == '__main__':
    u = ExtraFace(Path("test.preprocessed.h"))
    u.parse("main_struct_T")
    Path("e.json").save(u.e_map, verbose=True)