Tags: python, c, bash, parsing, cmakelists-options

How to convert a cmake header file with comments into a TSV/CSV file?


I have a number of cmake header files from which I would like to extract the comments and the #cmakedefine names into a CSV (or TSV) file.

The typical input looks like this:

/**
 * 1st Multi-line brief description of what the following 
 * cmakedefine does.
 *
 * Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * Possibly additional lines of full description.
 */
#cmakedefine SOMETHING

The first-step output I want to get is this:

SOMETHING
1st Multi-line brief description of what the following cmakedefine does.
Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the configuration. Possibly additional lines of full description.
...

Ultimately the output I am looking to get is this:

SOMETHING, "1st Multi-line brief description of what the following cmakedefine does.", "Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the configuration. Possibly additional lines of full description."

SOMETHING_ELSE, "Brief description", "Long Description"

(The columns headers can be implied to be: cmakedefine, Brief_Description, Long_Description.)

I have tried unsuccessfully to do this in sed, which was not a good use of my time, and I have also tried awk without success. At this point I don't care which tools are used; I just want to get the job done. I suspect Python may be better suited for this.

Things to Note:


UPDATE:

The cmake file was more complicated than I first expected, because:

  1. There are many irrelevant stand-alone comments that have nothing to do with the ones just preceding the #cmakedefine.
  2. There are some comments that are followed by several #cmakedefine.
  3. Sometimes the comments even include the string #cmakedefine, commas and other characters, like <c>.
  4. Sometimes there are single (') and double (") quotes in the comments.
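
Points 3 and 4 mostly affect the output step: any field that contains the delimiter or a double quote has to be quoted, and embedded double quotes have to be escaped. A minimal sketch of that, assuming Python's standard `csv` module is acceptable for writing; the helper name `csv_line` and the sample row are made up for illustration:

```python
import csv
import io


def csv_line(fields, delimiter=","):
    """Format one row, letting the csv module decide what needs quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


# Hypothetical row: double quotes, a comma, a single quote and "#cmakedefine".
row = ["SOMETHING", 'A "quoted" brief, with a comma', "It's the #cmakedefine text"]

if __name__ == "__main__":
    print(csv_line(row), end="")                  # comma-separated
    print(csv_line(row, delimiter="\t"), end="")  # tab-separated
```

With the default quoting mode only the second field ends up quoted, since single quotes and `#` are not special to the CSV format.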

A more complicated file could look like this:

/**
 * Only a "Brief" comment
 */ 
#cmakedefine SIMPLE

/**
 * 1st multi-line Brief description of what the following 
 * cmakedefine does.
 *
 * 2nd more complicated multi-line Full description, of <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * [Sometimes] additional paragraph-1 of full description, 
 * going on several lines.
 *
 * [Sometimes] additional paragraph-2 of full "description", 
 * going on several lines. (double quoted)
 *
 * ...
 * [Sometimes] additional paragraph-N of full 'description', 
 * going on several lines. (single quoted)
 */
#cmakedefine SOMETHING

/**
 * Some useless unrelated comment
 */ 

/**
 * 1st Multi-line brief description of what the following 
 * cmakedefine does.
 *
 * Second more complicated multi-line full description, of <c>SOMETHING</c> to be enabled in the
 * configuration.
 *
 * Possibly additional lines of full description.
 */
#cmakedefine SOMETHING_ELSE
#cmakedefine ANOTHER_SOMETHING

Solution

  • This looks doable in AWK; overall, the complexity depends on how varied the inputs are. Here's something to get you started:

    awk '
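    gsub(/^ \* */, "") {
        # Comment line: the leading " * " prefix has been stripped from $0.
        if ($0 == "/") {
            # end of the comment block: nothing to do
        } else if ($0) {
            # non-empty line: append it to the current paragraph
            comment = comment (comment ? " " : "") $0;
        } else if (!firstline) {
            # first blank line: the text so far is the brief description
            firstline = comment
            comment = ""
        }
    }
    gsub(/^#cmakedefine /, ""){
        print $0 ", \"" firstline "\", \"" comment "\"";
        firstline = ""
        comment = ""
    }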
    ' <<EOF
    /**
     * 1st Multi-line brief description of what the following 
     * cmakedefine does.
     *
     * Second more complicated multi-line full description <c>SOMETHING</c> to be enabled in the
     * configuration.
     *
     * Possibly additional lines of full description.
     */
    #cmakedefine SOMETHING
    EOF
    

    outputs:

    SOMETHING, "1st Multi-line brief description of what the following  cmakedefine does.", "Second more complicated multi-line full description <c>SOMETHING</c> to be enabled in the configuration. Possibly additional lines of full description."
    

    Here is a nice awk tutorial: https://www.grymoire.com/Unix/Awk.html

    The next step would be adding some state (for example, not outputting an empty comment), possibly cleaning the input further with a regex, and quoting properly for CSV, TSV, or whatever output format you want. As the complexity goes up, it becomes fair to ask: why not Python and JSON?
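
For comparison, here is a rough Python sketch of the stateful version described above. It is only one possible approach and rests on several assumptions: documentation comments open with `/**` on a line of their own (a one-line `/** ... */` is ignored), the first paragraph is the brief description and the rest is joined into the long one, a comment applies to every `#cmakedefine` that follows it until other text or another comment appears, and defines without a comment are skipped. The names `parse_cmake_header`, `to_csv`, and `SAMPLE` are invented for this example.

```python
import csv
import io
import re

DEFINE_RE = re.compile(r"^\s*#cmakedefine\s+(\w+)")


def parse_cmake_header(text):
    """Return (name, brief, full) tuples for each commented #cmakedefine."""
    rows = []
    in_comment = False
    paragraphs = []   # finished paragraphs of the comment being read
    current = []      # lines of the paragraph being read
    pending = None    # last complete comment, waiting for a #cmakedefine
    for line in text.splitlines():
        stripped = line.strip()
        if in_comment:
            closing = stripped.endswith("*/")
            if closing:
                stripped = stripped[:-2].rstrip()
            body = re.sub(r"^\*\s?", "", stripped).strip()
            if body:
                current.append(body)
            if (not body or closing) and current:
                paragraphs.append(" ".join(current))  # paragraph ends here
                current = []
            if closing:
                pending, in_comment = paragraphs, False
        elif stripped.startswith("/**"):
            # A new comment replaces any earlier, unrelated one.
            paragraphs, current, pending = [], [], None
            in_comment = "*/" not in stripped
        elif DEFINE_RE.match(line):
            if pending:  # skip defines that have no comment
                name = DEFINE_RE.match(line).group(1)
                rows.append((name, pending[0], " ".join(pending[1:])))
        elif stripped:
            pending = None  # other text separates the comment from defines
    return rows


def to_csv(rows):
    """Serialize rows with a header line; the csv module handles quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(("cmakedefine", "Brief_Description", "Long_Description"))
    writer.writerows(rows)
    return buf.getvalue()


SAMPLE = '''\
/**
 * Only a "Brief" comment
 */
#cmakedefine SIMPLE

/**
 * Some useless unrelated comment
 */

/**
 * Brief description of what the following
 * cmakedefine does.
 *
 * Full description, of <c>SOMETHING</c>.
 *
 * More 'description'.
 */
#cmakedefine SOMETHING_ELSE
#cmakedefine ANOTHER_SOMETHING
'''

if __name__ == "__main__":
    print(to_csv(parse_cmake_header(SAMPLE)), end="")
```

Passing `delimiter="\t"` to `csv.writer` would give TSV instead, and the same row tuples could be handed to `json.dumps` if JSON turns out to be the more convenient format.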