
Implement heredocs with indentation trimming using PEG.js


I'm working on a language similar to Ruby called gaiman, and I'm using PEG.js to generate the parser.

Do you know if there is a way to implement heredocs with proper indentation?

xxx =  <<<END
       hello
       world
       END

The output should be:

"hello
world"

I need this because this code doesn't look very nice:

def foo(arg) {
  if arg == "here" then
     return <<<END
xxx
  xxx
END
  end
end

This is a function where the user wants to return:

"xxx
  xxx"

I would prefer the code to look like this:

def foo(arg) {
  if arg == "here" then
     return <<<END
            xxx
              xxx
            END
  end
end

If I trim all the lines, the user will not be able to use a string with leading spaces when they want to. Does anyone know if PEG.js allows this?
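
To make the difference concrete, here is a minimal sketch in plain JavaScript, independent of PEG.js. The helper names `trimAll` and `stripIndent` are hypothetical, and the width of 12 is simply the column where `<<<` starts in the example above.

```javascript
// Hypothetical helpers, not part of PEG.js: two ways to post-process heredoc text.

// Trim every line: simple, but it destroys indentation the user wanted to keep.
function trimAll(text) {
  return text.split('\n').map(line => line.trim()).join('\n');
}

// Strip at most `width` leading spaces from each line and keep anything beyond that.
function stripIndent(text, width) {
  const re = new RegExp(`^ {0,${width}}`, 'mg');
  return text.replace(re, '');
}

// Body of the heredoc from the example: 12 and 14 leading spaces.
const body = '            xxx\n              xxx';

console.log(JSON.stringify(trimAll(body)));         // prints "xxx\nxxx"
console.log(JSON.stringify(stripIndent(body, 12))); // prints "xxx\n  xxx"
```

Only the second variant keeps the two extra spaces in front of the second line.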

I don't have any code for heredocs yet; I just want to be sure that what I want is possible.

EDIT:

So I've tried to implement heredocs, and the problem is that PEG doesn't allow back-references.

heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
    return text.join('');
}

It says that marker is not defined. As for trimming, I think I can use the location() function.
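
For reference, the behaviour I am after can be sketched as a hand-written function in plain JavaScript. This is only an illustration of the two ideas (the closing marker must repeat the opening one, and the column of `<<<` decides how much to strip), not PEG.js code. The name `readHeredoc` is hypothetical, and it assumes that the closing marker sits alone on its line.

```javascript
// Hypothetical helper, not part of PEG.js: reads one heredoc that starts at
// `start` (the index of "<<<") and returns its dedented text.
function readHeredoc(source, start) {
  // Opening line: "<<<" followed by a marker that runs up to the newline.
  const open = /<<<(\S+)\n/y;
  open.lastIndex = start;
  const m = open.exec(source);
  if (!m) throw new Error('Expected "<<<MARKER" followed by a newline');
  const marker = m[1];

  // 0-based column of "<<<"; in a PEG.js action this would be
  // location().start.column - 1, because columns there are 1-based.
  const lineStart = source.lastIndexOf('\n', start - 1) + 1;
  const indent = start - lineStart;
  const strip = new RegExp(`^ {0,${indent}}`);

  const lines = [];
  for (const line of source.slice(open.lastIndex).split('\n')) {
    // The "back-reference": the closing line must repeat the captured marker.
    if (line.trim() === marker) {
      return lines.map(l => l.replace(strip, '')).join('\n');
    }
    lines.push(line);
  }
  throw new Error(`Missing closing marker "${marker}"`);
}

const source = 'xxx =  <<<END\n       hello\n       world\n       END';
console.log(JSON.stringify(readHeredoc(source, source.indexOf('<<<'))));
// prints "hello\nworld"
```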


Solution

  • Here is an implementation of heredocs in Peggy, the successor to PEG.js, which is no longer maintained. This code is based on a GitHub issue.

    heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
        &{ return begin === end; }
      / '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
    ) {
        const loc = location();
        const min = loc.start.column - 1;
        const re = new RegExp(`\\s{${min}}`);
        return text.map(line => {
            return line[0].replace(re, '');
        }).join('\n');
    }
    any_char = (!"\n" .)
    marker_char = (!" " !"\n" .)
    marker "Marker" = $marker_char+
    
    _ "whitespace"
      = [ \t\n\r]* { return []; }
    

    EDIT: The grammar above didn't work when other code followed the heredoc; here is a better grammar:

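    {
        // Opening marker of the heredoc being parsed; set by beginMarker and
        // checked in endMarker (this emulates a back-reference).
        let heredoc_begin = null;
    }
    
    heredoc = "<<<" beginMarker "\n" text:content endMarker {
        // The 1-based column of "<<<" gives the indentation to strip.
        const loc = location();
        const min = loc.start.column - 1;
        // [ \t] rather than \s, so a match can never span a line break.
        const re = new RegExp(`^[ \\t]{${min}}`, 'mg');
        return {
            type: 'Literal',
            value: text.replace(re, '')
        };
    }
    // Any character except a newline or a space.
    __ = (!"\n" !" " .)
    marker 'Marker' = $__+
    beginMarker = m:marker { heredoc_begin = m; }
    // The closing line must repeat the opening marker.
    endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
    // Everything up to, but not including, the closing line.
    content = $(!endMarker .)*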
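
One detail worth checking in an indentation regex like the one used in the action: `\s` also matches line breaks, so with the `m` flag a pattern of the form `^\s{n}` can swallow an empty line inside the heredoc. The following sketch in plain JavaScript illustrates this; the sample text and the width of 4 are made up for the demonstration.

```javascript
// Heredoc body with an empty line between two lines indented by 4 spaces.
const text = '    a\n\n    b';
const width = 4;

// `\s` also matches "\n", so the match that starts at the empty line
// consumes the line break plus three spaces of the next line.
const loose = text.replace(new RegExp(`^\\s{${width}}`, 'mg'), '');

// `[ \t]` matches only spaces and tabs, so the empty line survives.
const strict = text.replace(new RegExp(`^[ \\t]{${width}}`, 'mg'), '');

console.log(JSON.stringify(loose));  // prints "a\n b"
console.log(JSON.stringify(strict)); // prints "a\n\nb"
```

Restricting the character class to spaces and tabs keeps empty lines intact.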