macroscommon-lispinfix-notationinfix-operatorreader-macro

Common Lisp Read Macro for "lazy infix or," to destructure keywords


I have a Common Lisp reader macro to parse lazy/delayed declarations of an "or" relation, using infix syntax separated by pipe chacaters ("|") as well as standard list parentheses and keyword literals. Consider the form (:a :b|:c) -- it represents a 2 part tuple where the first element is certainly :a, the second element is either :b or :c. One can infer for instance that valid forms for the whole tuple are (:a :b) or (:a :c).

I have function-encapsulated logic already to destructure these tuple list forms subsequent to the read macro. But at read time I need to parse a form like :a|:b|:c and tag it with the pipes removed, like (:lazy-or :a :b :c). Use of infix syntax is purely for the reader-facing forms; infix forms are ephemeral and are discarded as soon as possible, immediately at the read stage, in favor of equivalent legal lisp forms tagged with :lazy-or.

So I made a read macro that is nearly working as I'd like it, but presently an extra pipe needs to be used before the first or-form keyword element as a sort of reader sigil (I'd like that to be not necessary at all), and it can't presently infer similar forms using either nested-parentheses or splice-notation as equivalent (like in arithmetic, 2+(3*4) being the same order-of-operations, an equivalent form, as 2+3*4).

The macro (derived from "slash reader" here: http://www.lispworks.com/documentation/HyperSpec/Body/f_rd_rd.htm):

 (defun pipe-reader (stream char)                                                                   
   (declare (ignore char))                                                                          
   `(:lazy-or .  ,(loop for dir = (read stream t nil t)                       
                   then (progn (read-char stream t nil t)                                           
                               (read stream t nil t))                         
                   collect dir                                                                      
                   while (eql (peek-char nil stream nil nil t) #\|))))                              
(set-macro-character #\| #'pipe-reader)  

The intended goal is for this rewrite rule to be applied/substituted at read time: (macroexpand '(:a (:b | :c))) -> (:a (:lazy-or :b :c)) or, the same for the variant form not included in parenthesis (a sort of default order-of-operations; spaces won't make a difference): (macroexpand '(:a :b|:c)) -> (:a (:lazy-or :b :c)) (macroexpand '(:a :b | :c)) -> (:a (:lazy-or :b :c)) And nested forms should render intuitively, recursively: (macroexpand '(:a (:b | (:c | :d)))) -> (:a (:lazy-or :b (:lazy-or :c :d)))

Note that in the intended rewrite rule of the basic form -- (macroexpand '(:a (:b | :c))) -> (:a (:lazy-or :b :c)) -- :a will not get joined in the or-form as it has no infix pipe operator to join it there; it exists alongside the result of the or-form in a sort of tuple. As noted, this is such that upon further evaluation of the lazy form the tuple conceivably could yield (:a :b) or (:a :c) - the above implies that both are valid forms to de-structure later.

I am very close.

The problem is I can't get that quite, only the following (with the above macro): (macroexpand '(:a |:b|:c )) -> (:A (:LAZY-OR :B :C)) (macroexpand '(:a :b |:d|:e|:f|:g)) -> (:A :B '(:LAZY-OR :D :E :F :G)) It actually does most of what I intend it to do, it's a functional basic solution: slightly modifying the rules of execution to allow an extra pipe at the opening of an infix or-form at read time, without joining the form to the left of that pipe, thus it works at making :b through :c (first form), or :d through :g (second form), into valid cases listed for the :lazy-or form, and making that inner list itself a member of the outer list/tuple alongside non-variant values :a and :b. It's close to what I'd like: (macroexpand '(:a :b :d|:e|:f|:g)) -> should, doesn't -> (:A :B '(:LAZY-OR :D :E :F :G))

There are 3 bugs, in order of importance:

  1. Extra "prefix" pipe needed

    The extra opening pipe, at the start of each or-form being read, I'd like to do without. I want to cut that pipe for cleanliness and my own preferences at readability, and only use the pipe in a fully infix position for this read macro.

  2. Extra parentheses added to nested forms on recursive parse

    It adds extra parentheses to nested forms, though it otherwise handles nested forms well recursively.

  3. Whitespace not accepted arbitrarily

    It doesn't accept spaces between pipes. I tried using read-preserving-whitespace instead of read, to no avail here. It should accept spaces between pipes and keyword forms as if they weren't there, as the forms :a|:b and :a | :b are equivalent.

The read macro does mostly encapsulate the working logic, and is good at recursion and nested forms: (macroexpand '(:a |(|:b|:c)|(|:e|:f))) -> yields -> (:A (:LAZY-OR ((:LAZY-OR :B :C)) ((:LAZY-OR :E :F))))
;(almost perfect, recursively expands exactly as intended, the only issue in this form besides needing the opening pipe is the double parentheses generated around the final :lazy-or forms). So that one would be generated the same from this form (currently an "unbalanced" read parentheses error): (macroexpand '(:a (:b | :c) | (:e | :f))) -> should, does not yield -> (:A (:LAZY-OR (:LAZY-OR :B :C) (:LAZY-OR :E :F)))

Aside from the extra parentheses added by the read macro, and its failure to allow spacing in the infix form, the really crucial bug is not being able to write the or-forms as infix-piped forms without including the first, non-infix pipe (prefix incidentally). I have really run up against a brick wall trying to match the stream without needing to use the first pipe character as a sort of sigil for the read parser. Perhaps an extra call or two to one of the "peek" functions would give me a more specialized form to match against, but I haven't been able to figure out just how to parse that.

I looked at building this around existing and comprehensive solutions such as NKF ("definfix" macro, https://www.cliki.net/infix) and CMU infix (https://github.com/rigetti/cmu-infix/blob/master/src/cmu-infix.lisp), but as those are more generalized and large codebases, I don't think I'll need to reuse infix logic for more kinds of forms/operators, just this one. And from how close I am on a rather small macro, I'd definitely prefer to nail it with a small and succinct solution, provided it's still recursive and not error-prone.

Any perspective on using read macros more effectively for this purpose would be much appreciated.


Solution

  • I would try a different approach and use a different character for list of alternatives, like [a|b], so that you don't need to have a prefix bar.

    For example:

    ;; a bar is now the symbol 'or
    (set-macro-character #\| (constantly 'or))
    
    ;; read a list of forms for alternative forms
    (set-macro-character #\[
                         (lambda (stream char)
                           (declare (ignore char))
                           `(lazy-or-parse
                             ,@(read-delimited-list #\] stream t))))
    
    ;; this one is required too so that e.g. `a]` is not read as a single symbol.
    (set-macro-character #\]
                         (lambda (stream char)
                           (declare (ignore char))
                           (error "Unmatched closing bracket")))
    

    With the above definition, the form below:

    [:a|:b|:c]
    

    Is read as:

    (LAZY-OR-PARSE :A OR :B OR :C)
    

    It is expected that lazy-or-parse is a macro. Its role is to interpret a list of tokens, like a regular parser; you should be able to group them as expressions (e.g. like a Pratt parser with defined associativities/priorities).

    Or, you could still do that parsing in the reader macro, but I personally prefer to do the least possible work at this stage.

    NB. you need to tweak Emacs to make it understand this syntax, for tests I use read-from-string:

    (read-from-string "[a b (c|d)]")
    (LAZY-OR-PARSE A B (C OR D))
    11
    

    I looked a bit and this is what I changed in the syntax-table (this is emacs-lisp code):

    (modify-syntax-entry ?\[ "(]" lisp-mode-syntax-table)
    (modify-syntax-entry ?\] ")[" lisp-mode-syntax-table)
    (modify-syntax-entry ?| "_" lisp-mode-syntax-table)
    

    With these modifications, the syntax seems to be recognized by the editor, I can even write this, put the cursor after the closing bracket, eval the last-sexp and it correctly find the beginning of the expression:

    '[ a | (c | d)]
    => (LAZY-OR-PARSE A OR (C OR D))