I'm trying to build parser combinators following Hutton and Meijer, "Monadic Parser Combinators". My implementation is in PostScript, but I think my issue is general to combinator parsers and not my specific implementation.
As a small exercise, I'm using the parsers to recognize regular expressions.
(pc9.ps)run
/Dot (.) char def
/Meta (*+?) anyof def
/Character (*+?.|()) noneof def
/Atom //Dot
//Character plus def
/Factor //Atom //Meta maybe seq def
/Term //Factor //Factor many seq def
/Expression //Term (|) char //Term xthen many seq def
/regex { string-input //Expression exec ps } def
(abc|def|ghi) regex
quit
It's working, but the output has lots of []
empty arrays that really get in the way when I try to bind
handlers to process the values.
$ gsnd -q -dNOSAFER pc9re2.ps
stack:
[[[[[97 []] [[98 []] [[99 []] []]]] [[[100 []] [[101 []] [[102 []]
[]]]] [[[103 []] [[104 []] [[105 []] []]]] []]]] null]]
These are happening whenever a seq
sequencing combinator accepts the result from maybe
or many
(which uses maybe
) that had zero occurrences.
What is the normal way of excluding this extra noise in the output with Parser Combinators?
Sigh. It's seems I can just implement around it. I added special code in seq
to detect an empty right-hand-side and just discard it. On to other problems...
Edit: I encountered the same problem again in version 11 (and a half). Now I've got a better solution IMO:
https://groups.google.com/g/comp.lang.functional/c/MbJxrJSk8Mw/m/MoT3Dr0IAwAJ
Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts when I move my arm like this; ... so don't move your arm like that" problem.
I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the /usual/ PostScript types: integer, real, string, boolean, array, dictionary.
But I also need some way to arbitrarily combine or concatenate two objects regardless of type. My
then
(akaseq
) combinator needs to do this. So I made a hack-y function that does the combining. If it has two arrays, it composes the contents into a longer array. If it has one array and some other object it extends the array by one and stuffs the object in the front or back as appropriate. If it has two non-array objects it makes a new 2-element array to contain them.So, instead of building
xthen
andthenx
off ofthen
and needing to cons, car, and cdr the stuff, I can write all 3 of these as a more general parameterized function.sequence{ p q u }{ { /p exec +is-ok { next x-xs force /q exec +is-ok { next x-xs 3 1 roll /u exec exch consok }{ x-xs 3 2 roll ( after ) exch cons exch cons cons } ifelse } if } ll } @func then { {append} sequence } xthen { {exch pop} sequence } thenx { {pop} sequence } append { 1 index zero eq { exch pop }{ dup zero eq { pop }{ 1 index type /arraytype eq { dup type /arraytype eq { compose }{ one compose } ifelse }{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }
(
@func
is my own non-standard extension to PostScript that takes a procedure body and list of parameters and wraps the procedure with code that defines the arguments in a local dictionary.ll
is my hack-y PostScript way of making lambdas with hard-patched parameters, it's short forload all literals.
)
The code also treats executable arrays (ie. PostScript procedures) as a non-array for the purpose of combining sequences of results. This allows the parser to be used as a syntax-directed compiler producing procedures as output.