Is there an OCaml tool that allows filtering comments in source files, similar to gcc -E?
Ideally, I'm looking for something that will remove everything but comments, but the other way around would also be useful.
For instance, if there is a way to use camlp4/camlp5/ppx to obtain OCaml comments (including non-OCamldoc comments defined with a single asterisk), I would like to know. I haven't had much success looking for comment nodes in Camlp4's AST (though I know they must exist, because there are even bugs related to the fact that Camlp4 modifies their placement).
Here's an example: in the following file:
(*** three asterisks *)
let f () =
  Format.printf "end"
let () =
  (* one asterisk (* nested comment *) *)
  Printf.printf "hello world\n";
  (** two asterisks *)
  f();
  ()
Ideally, I'd like to obtain:
(*** three asterisks *)
(* one asterisk (* nested comment *) *)
(** two asterisks *)
The whitespace between them and the presence or absence of (* *) are mostly irrelevant, but the tool should preserve comments of all kinds. My immediate purpose is to pipe the comments to a spell checker, but the opposite filter (one that strips only the comments) could also be useful: I could strip the comments and then use diff against the original file to obtain what has been removed.
Well, there is now a tool called ocaml-comment-sieve that strips everything but the comments from the code. It is based on the simple lexer used in ocamlwc.
However, this tool is GPL-licensed (because it is derived from ocamlwc, which is GPL-licensed), so it cannot be posted here. Still, it does satisfy my requirements, so until someone suggests a better way, I'll consider it the answer.
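For readers who only want the general idea, here is a minimal hand-written sketch of a comment extractor. It is not the ocaml-comment-sieve code and is not derived from ocamlwc; the function name extract_comments is made up for this illustration. It assumes plain "..." string literals (quoted strings such as {|...|} are not handled) and only special-cases the character literal '"', so treat it as a starting point rather than a complete lexer.

```ocaml
(* Hypothetical sketch: return every top-level comment found in [src],
   keeping nested comments inside the comment that contains them. *)
let extract_comments (src : string) : string list =
  let n = String.length src in
  let buf = Buffer.create 256 in
  (* [i] is just past an opening double quote; return the index just past
     the matching closing quote (or [n] if the string is unterminated). *)
  let rec skip_string i =
    if i >= n then n
    else match src.[i] with
      | '\\' -> skip_string (i + 2)   (* skip the escaped character *)
      | '"' -> i + 1
      | _ -> skip_string (i + 1)
  in
  (* Scanning ordinary code: look for a comment opener, skip strings. *)
  let rec code i acc =
    if i >= n then List.rev acc
    else if i + 1 < n && src.[i] = '(' && src.[i + 1] = '*' then begin
      Buffer.clear buf;
      Buffer.add_string buf "(*";
      comment (i + 2) 1 acc
    end
    else if src.[i] = '"' then code (skip_string (i + 1)) acc
    else if i + 2 < n && src.[i] = '\'' && src.[i + 1] = '"'
            && src.[i + 2] = '\'' then
      code (i + 3) acc                (* the character literal '"' *)
    else code (i + 1) acc
  (* Scanning inside a comment at nesting level [depth]. *)
  and comment i depth acc =
    if i >= n then List.rev (Buffer.contents buf :: acc) (* unterminated *)
    else if i + 1 < n && src.[i] = '(' && src.[i + 1] = '*' then begin
      Buffer.add_string buf "(*";
      comment (i + 2) (depth + 1) acc
    end
    else if i + 1 < n && src.[i] = '*' && src.[i + 1] = ')' then begin
      Buffer.add_string buf "*)";
      if depth = 1 then code (i + 2) (Buffer.contents buf :: acc)
      else comment (i + 2) (depth - 1) acc
    end
    else if src.[i] = '"' then begin
      (* OCaml also lexes string literals inside comments, so copy the
         whole literal without interpreting any "*)" it may contain. *)
      let j = skip_string (i + 1) in
      Buffer.add_string buf (String.sub src i (j - i));
      comment j depth acc
    end
    else begin
      Buffer.add_char buf src.[i];
      comment (i + 1) depth acc
    end
  in
  code 0 []
```

To use it as a filter, something like `extract_comments (In_channel.input_all stdin) |> List.iter print_endline` should work, assuming OCaml 4.14 or later for In_channel. The output can then be piped to a spell checker. The depth counter is what keeps (* one asterisk (* nested comment *) *) together as a single comment.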