
Explanation of JFlex Block Comment rule


I was looking into how to implement block comments in JFlex for custom language support in IntelliJ and found that the rule can be written as

"/*" !([^]* "*/" [^]*) ("*/")?

I don't quite understand how to read this and would like it if it were explained in plain English.

At the moment I'm reading this as


Solution

  • You've basically deciphered it correctly. Here are a few explanatory notes:

    [^]* matches an arbitrary sequence of characters. It's similar to .* except that . doesn't match newlines or unpaired surrogates; [^] matches absolutely anything.

    So ([^]* "*/" [^]*) matches any sequence which includes */. And therefore !([^]* "*/" [^]*) matches anything except a sequence containing */. In other words, it matches anything up to but not including */, which is the rest of the comment.

    Now what happens if the user makes a mistake and forgets to close the last comment? In that case, there is no */ and the negated expression will match up to the end of the input. Since there's no way to know where the comment should have ended (without being able to read the programmer's mind), the best we can do is to stop trying to parse. Thus, we accept the unterminated comment as a comment. That's why the final ("*/")? is optional. It will match the comment terminator if there is one, and otherwise it will match an empty sequence at the end of the input.
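
To check this reading against concrete inputs, here is a minimal sketch that mimics the rule with java.util.regex rather than JFlex itself. The class name BlockCommentDemo, the helper matchComment, and the regex translation are assumptions made for illustration: java.util.regex has no negation operator, so the part !([^]* "*/" [^]*) is approximated by a loop that consumes one character at a time while the next two characters are not */.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockCommentDemo {
    // Hypothetical java.util.regex analogue of the JFlex rule
    //   "/*" !([^]* "*/" [^]*) ("*/")?
    // (?:(?!\*/)[\s\S])* consumes any characters, newlines included, as long
    // as the next two characters are not "*/"; (?:\*/)? is the optional
    // terminator.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*(?:(?!\\*/)[\\s\\S])*(?:\\*/)?");

    // Returns the comment that starts at the beginning of the input,
    // or null if the input does not start with "/*".
    static String matchComment(String input) {
        Matcher m = BLOCK_COMMENT.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Terminated comment: the match stops right after the first "*/".
        System.out.println(matchComment("/* a */ x */"));
        // Unterminated comment: the match runs to the end of the input.
        System.out.println(matchComment("/* line 1\nline 2"));
        // Not a comment at all.
        System.out.println(matchComment("x = 1"));
    }
}
```

The first call returns only /* a */, because the body may not contain */. The second returns the whole input, which is the unterminated-comment case described above. This is only an approximation for experimenting with the idea; in a real JFlex specification the original rule can be used as written.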