For my dissertation I have to parse and tokenize source code into individual functions. For every function I would like to extract the names of types, the names of called functions, and the type casts. Is Clang the right tool for this kind of job? If so, how can I do it?
Below is a simple C function. The items I want to extract are in bold:
    static char func1(unsigned int a, struct foo *b)
    {
        int c = 0;
        struct bar *d;

        if (a == 0) {
            d = func2((int) a);
        } else {
            c = func3((struct bar *) b);
        }
        return c;
    }
Yes, Clang is the right tool to do this job.
You should take a look at libclang.
There is plenty of information online, but I can personally recommend two great articles:
Parsing C++ in Python with Clang by Eli Bendersky
Introduction to libclang by Mike Ash
If you prefer video, I can recommend the libclang presentation from the 2010 LLVM Developers' Meeting; look for "libclang: Thinking Beyond the Compiler".
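To give a rough idea of what this looks like in practice, here is a minimal sketch using the libclang Python bindings (`clang.cindex`). It is only a starting point, not a complete solution: the helper names `walk`, `collect`, and `functions_in` are my own, the `-std=c99` argument is an assumption about your code base, and it assumes every cursor under a function definition is relevant. It records the return type plus the types of parameters and local variables, the spelling of each call expression, and the target type of each C-style cast.

```python
import sys

def walk(cursor):
    """Yield every descendant of a cursor in pre-order."""
    for child in cursor.get_children():
        yield child
        yield from walk(child)

def collect(func):
    """Gather type names, callee names, and cast target types for one function.

    Cursor kinds are compared by name, so this works on any object that
    exposes the same attributes as a clang.cindex.Cursor.
    """
    info = {"types": [func.result_type.spelling], "calls": [], "casts": []}
    for node in walk(func):
        kind = node.kind.name
        if kind in ("PARM_DECL", "VAR_DECL"):
            info["types"].append(node.type.spelling)
        elif kind == "CALL_EXPR":
            # May be empty for calls that do not go through a named function.
            info["calls"].append(node.spelling)
        elif kind == "CSTYLE_CAST_EXPR":
            # The type of a cast expression is the type being cast to.
            info["casts"].append(node.type.spelling)
    return info

def functions_in(path, args=("-std=c99",)):
    """Yield (name, info) for each function definition in the parsed file."""
    # Imported here because it needs the libclang Python bindings installed.
    from clang.cindex import Index
    tu = Index.create().parse(path, args=list(args))
    for node in tu.cursor.get_children():
        if node.kind.name == "FUNCTION_DECL" and node.is_definition():
            yield node.spelling, collect(node)

if __name__ == "__main__" and len(sys.argv) > 1:
    for name, info in functions_in(sys.argv[1]):
        print(name, info)
```

Run it as `python script.py yourfile.c` (the script name is whatever you saved it as). Note that the file has to parse reasonably cleanly for the AST to be useful, so the snippet in the question would need declarations for `struct foo`, `struct bar`, `func2`, and `func3`, and you may have to pass your include paths and defines through `args`.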