c parsing rust abstract-syntax-tree

Rust crate that generates an AST from a C source file without performing semantic checks


I am trying to write a silly tool using Rust that transforms C code like this

int sys_close(int fd) {
  file_t *f = proc_getfile(proc_curr(), fd);
  if (!f){
    return -1;
  }else{
    fclose(f);
    proc_curr()->files[fd] = NULL;
    return 0;
  }
}

to this

int sys_close(int fd) {
  return ({
    file_t *f = proc_getfile(proc_curr(), fd);
    !f ? -1 : (fclose(f), proc_curr()->files[fd] = NULL, 0);
  });
}

To make this transformation, I want to first generate an AST from C source code, then reorder elements in the AST, and finally print transformed code by visiting the tree.
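To make the plan concrete, here is a minimal sketch of the "reorder, then print" step. It assumes a hand-rolled AST for a tiny subset of C; the names `Expr`, `Stmt`, `block_to_expr`, `stmt_to_expr`, and `print_expr` are hypothetical and do not come from any crate. Declarations such as `file_t *f = ...` and the surrounding `({ ... })` wrapper are left out, and the printer does not handle operator precedence.

```rust
// Hypothetical AST for a tiny subset of C. Not the API of any parser crate.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Opaque expression text, e.g. "fclose(f)". Assumed to be parenthesized
    /// already wherever precedence would otherwise matter.
    Raw(String),
    /// Comma expression: (a, b, c)
    Comma(Vec<Expr>),
    /// Conditional expression: cond ? then : else
    Cond(Box<Expr>, Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Expr(Expr),
    Return(Expr),
    If { cond: Expr, then: Vec<Stmt>, els: Vec<Stmt> },
}

/// Fold a block of expression statements that ends in `return` (or in a
/// foldable `if`) into one expression. Returns None for unsupported shapes.
fn block_to_expr(block: &[Stmt]) -> Option<Expr> {
    let (last, init) = block.split_last()?;
    let mut parts = Vec::new();
    for stmt in init {
        match stmt {
            Stmt::Expr(e) => parts.push(e.clone()),
            _ => return None,
        }
    }
    match last {
        Stmt::Return(e) => parts.push(e.clone()),
        Stmt::If { .. } => parts.push(stmt_to_expr(last)?),
        Stmt::Expr(_) => return None, // block would not yield a value
    }
    Some(if parts.len() == 1 {
        parts.pop().unwrap()
    } else {
        Expr::Comma(parts)
    })
}

/// Turn `if (c) { ...; return a; } else { ...; return b; }` into `c ? a : b`.
fn stmt_to_expr(stmt: &Stmt) -> Option<Expr> {
    match stmt {
        Stmt::If { cond, then, els } => Some(Expr::Cond(
            Box::new(cond.clone()),
            Box::new(block_to_expr(then)?),
            Box::new(block_to_expr(els)?),
        )),
        Stmt::Return(e) => Some(e.clone()),
        Stmt::Expr(_) => None,
    }
}

/// Print the transformed tree back out as C source text.
fn print_expr(e: &Expr) -> String {
    match e {
        Expr::Raw(s) => s.clone(),
        Expr::Comma(es) => {
            let parts: Vec<String> = es.iter().map(print_expr).collect();
            format!("({})", parts.join(", "))
        }
        Expr::Cond(c, t, f) => {
            format!("{} ? {} : {}", print_expr(c), print_expr(t), print_expr(f))
        }
    }
}
```

Building a `Stmt::If` for the `if (!f) { ... } else { ... }` statement above and passing it through `stmt_to_expr` and `print_expr` produces the conditional expression used in the target code. What I am missing is the parser that would build such a tree from real source.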

I do not want to write a parser myself, so I am looking for a Rust crate to generate the AST for me. Because this tool only cares about transforming a small range of code, it would be better if the crate did not perform semantic checks and focused only on generating an AST.

I tried lang-c, but it seems to perform some kind of type check, because an undeclared type such as uint32_t is reported as a SyntaxError. lang-c also uses either clang or gcc as its frontend, so include files are checked. I want a library that performs no checks and does not try to read include files; it should only parse the input source code and generate an AST. Is there a crate that matches my needs?

Thanks in advance.


Solution

  • If you want to parse a complex grammar such as the C programming language, I suggest you try ANTLR4. It is a popular grammar/parser library that has been ported to most of the major programming languages. There is an unofficial Rust port of ANTLR4 called antlr_rust. The code is a bit rough around the edges, but it does the job.

    One benefit of this approach is that the ANTLR4 maintainers have an example grammar for C. It will probably be out of date, but it should give you a decent head start.

    However, one downside is that ANTLR was originally created in Java, following object-oriented design principles. This means that it does not make use of features like Rust's enums or pattern matching. If Rust is not a design requirement, I would suggest doing this project in Java or Python, as it will make the process a fair bit easier.