functional-programmingpostscriptparser-combinators

Suppress empty result from 'many' in 'seq(p, many(p))' construct with parser combinators


I'm trying to build parser combinators following Hutton and Meijer, "Monadic Parser Combinators". My implementation is in PostScript, but I think my issue is general to combinator parsers and not my specific implementation.

As a small exercise, I'm using the parsers to recognize regular expressions.

(pc9.ps)run

/Dot         (.) char         def
/Meta        (*+?) anyof      def
/Character   (*+?.|()) noneof def

/Atom        //Dot
             //Character  plus  def
/Factor      //Atom  //Meta maybe  seq   def
/Term        //Factor  //Factor many  seq  def
/Expression  //Term  (|) char //Term xthen  many  seq  def

/regex { string-input //Expression exec ps } def

(abc|def|ghi) regex 

quit

It's working, but the output has lots of [] empty arrays that really get in the way when I try to bind handlers to process the values.

$ gsnd -q -dNOSAFER pc9re2.ps
stack:
[[[[[97 []] [[98 []] [[99 []] []]]] [[[100 []] [[101 []] [[102 []]
[]]]] [[[103 []] [[104 []] [[105 []] []]]] []]]] null]]

These are happening whenever a seq sequencing combinator accepts the result from maybe or many (which uses maybe) that had zero occurrences.

What is the normal way of excluding this extra noise in the output with Parser Combinators?

github repo


Solution

  • Sigh. It's seems I can just implement around it. I added special code in seq to detect an empty right-hand-side and just discard it. On to other problems...

    Edit: I encountered the same problem again in version 11 (and a half). Now I've got a better solution IMO:

    https://groups.google.com/g/comp.lang.functional/c/MbJxrJSk8Mw/m/MoT3Dr0IAwAJ

    Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts when I move my arm like this; ... so don't move your arm like that" problem.

    I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the /usual/ PostScript types: integer, real, string, boolean, array, dictionary.

    But I also need some way to arbitrarily combine or concatenate two objects regardless of type. My then (aka seq) combinator needs to do this. So I made a hack-y function that does the combining. If it has two arrays, it composes the contents into a longer array. If it has one array and some other object it extends the array by one and stuffs the object in the front or back as appropriate. If it has two non-array objects it makes a new 2-element array to contain them.

    So, instead of building xthen and thenx off of then and needing to cons, car, and cdr the stuff, I can write all 3 of these as a more general parameterized function.

    sequence{ p q u }{
      { /p exec +is-ok {
          next x-xs force /q exec +is-ok {
            next x-xs 3 1 roll /u exec exch consok
          }{
            x-xs 3 2 roll ( after ) exch cons exch cons cons
          } ifelse
        } if } ll }  @func
    then { {append} sequence }
    xthen { {exch pop} sequence }
    thenx { {pop} sequence }
    
    append { 1 index zero eq { exch pop }{
                      dup zero eq { pop }{
             1 index type /arraytype eq {
                 dup type /arraytype eq { compose }{ one compose } ifelse
             }{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }
    

    (@func is my own non-standard extension to PostScript that takes a procedure body and list of parameters and wraps the procedure with code that defines the arguments in a local dictionary. ll is my hack-y PostScript way of making lambdas with hard-patched parameters, it's short for load all literals.)

    The code also treats executable arrays (ie. PostScript procedures) as a non-array for the purpose of combining sequences of results. This allows the parser to be used as a syntax-directed compiler producing procedures as output.