scheme guile define-syntax

Scheme macro where I need an index number for each repeated element


I am trying to write a macro similar to register-groups-bind in the Common Lisp cl-ppcre library. The idea is that you supply a regex with groups, a list of variables, and statements to execute. The macro binds each regex group to one of the variables and then executes the statements with those bindings. For example:

(register-groups-bind 
    (s1 s2 s3) 
    ("(\\w+) +(\\w+) +(\\w+)" "moe  larry  curly")   
    (list s2 s3 s1)) 

This would return ("larry" "curly" "moe"). An additional complication is that an entry in the list of variables can be, instead of just a variable, a function to run on the matched string, like (register-groups-bind ((string->number x) ..., which would run string->number on the matched value before binding it to x. When you specify a function like that, you can give it multiple variables, like (string->number x y z).

I am trying to set it up so that the macro expands into a let form like this:

(let ((s1 (match:substring m 1))
      (s2 (match:substring m 2))
      (s3 (match:substring m 3)))
   ... statements ...

There are two things that I just can't see how to do with any of Scheme's hygienic macro facilities. First, how do I handle the fact that the var list can contain a mixture of plain var names and lists with a function name and vars? Second, for each let binding, I need a sequence number indicating which regex group to fetch.

I can do this in Guile Scheme with defmacro, like this:

(defmacro regex-group-bind (vars re-match . statements)
  (let* ((bind-var (gensym))
         (converted (convert-var-list vars bind-var)))
    `(let ((,bind-var ,re-match))
       (if (regexp-match? ,bind-var)
           (let (,@converted)
             ,@statements)
           #f))))

Here convert-var-list is a function that converts the var list into the list of let bindings. When I call convert-var-list, I need to pass in the variable that holds the result of the regex match. An example of the output from convert-var-list would be:

scheme@(guile-user)> (define match-var (gensym))
scheme@(guile-user)> (convert-var-list '(foo bar (string->number x y z)) match-var)

$9 = ((foo (match:substring #{ g5228}# 1))
      (bar (match:substring #{ g5228}# 2))
      (x (string->number (match:substring #{ g5228}# 3)))
      (y (string->number (match:substring #{ g5228}# 4)))
      (z (string->number (match:substring #{ g5228}# 5))))
scheme@(guile-user)> 
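The body of convert-var-list is not shown above. A minimal sketch that produces the output shown, assuming groups are numbered from 1 in the order the variables appear, could look like this (the loop structure and local names are illustrative, not the original implementation):

```scheme
;; Hypothetical reconstruction of convert-var-list.
;; vars:      list whose entries are either a symbol or (fn var ...)
;; match-var: symbol naming the variable that holds the match object
;; Returns a list of let bindings, one per variable, numbering the
;; regex groups from 1 in order of appearance.
(define (convert-var-list vars match-var)
  (let loop ((vars vars) (index 1) (acc '()))
    (cond
     ((null? vars) (reverse acc))
     ((pair? (car vars))
      ;; (fn var ...): wrap each group's substring in a call to fn
      (let ((fn (caar vars)))
        (let inner ((names (cdar vars)) (index index) (acc acc))
          (if (null? names)
              (loop (cdr vars) index acc)
              (inner (cdr names)
                     (+ index 1)
                     (cons `(,(car names)
                             (,fn (match:substring ,match-var ,index)))
                           acc))))))
     (else
      ;; bare variable: bind it to the raw substring
      (loop (cdr vars)
            (+ index 1)
            (cons `(,(car vars) (match:substring ,match-var ,index))
                  acc))))))
```

Because the function only builds list structure, it can be tried at the REPL without loading (ice-9 regex).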

I would like to be able to do this with the hygienic macros, and I am assuming that I probably need to use syntax-case, and maybe still have a function like convert-var-list, but I'm not sure if that is cheating. Would I need to use syntax->datum and datum->syntax in order to do that?


Solution

  • You can implement a Scheme version of register-groups-bind using syntax-rules macros, but I think it works better with syntax-case. The following uses a helper macro that builds up a list of let bindings, consuming one variable from the given list at a time. It has cases for the syntax that calls a function on the match group string and binds the result, and for a bare variable name. It doesn't have the cl-ppcre behavior where a nil variable skips binding the corresponding match group to anything, though that wouldn't be hard to add (using #f instead, since this is Scheme). That's left as an exercise for the reader, with the hint that syntax patterns can include literal values.

    (use-modules (ice-9 regex))
    
    (define-syntax rgb-helper
      (lambda (stx)
        (syntax-case stx ()
          ((_ () (transformed ...) match counter regex target-string body ...) ; used up all variable bindings
           #'(let ((match (string-match regex target-string)))
               (when match
                 (let (transformed ...) body ...))))
          ((_ ((fn) bindings ...) (transformed ...) match counter regex target-string body ...) ; no more variables to wrap in a function; on to the next binding
           #'(rgb-helper (bindings ...) (transformed ...) match counter regex target-string body ...))
          ((_ ((fn var vars ...) bindings ...) (transformed ...) match counter regex target-string body ...) ; bind a variable to the result of passing the group to a function
           #`(rgb-helper ((fn vars ...) bindings ...)
                         ((var (fn (match:substring match counter))) transformed ...)
                         match #,(+ (syntax->datum #'counter) 1) regex target-string body ...))
          ((_ (var bindings ...) (transformed ...) match counter regex target-string body ...) ; bare variable binding
           #`(rgb-helper (bindings ...)
                         ((var (match:substring match counter)) transformed ...)
                         match #,(+ (syntax->datum #'counter) 1) regex target-string body ...)))))
    
    (define-syntax register-groups-bind
      (lambda (stx)
        (syntax-case stx ()
          ((_ (bindings ...) (regex target-string) body ...)
           #'(rgb-helper (bindings ...) () match 1 regex target-string body ...)))))
    
    
    ;;; examples of use
    (write (register-groups-bind (first second third fourth)
                                 ("((a)|(b)|(c))+" "abababc")
                                 (list first second third fourth)))
    (newline)
    
    (write (register-groups-bind (first (string->number a b))
                                 ("(.)(.)(.)" "a12")
                                 (list first a b)))
    (newline)
    

    The trick here, and the reason for using syntax-case, is that each time a variable is bound, the counter for the current match group is incremented directly as part of the macro expansion. A syntax-rules version using the same basic idea, but with the #,(+ (syntax->datum #'counter) 1) bits replaced by (+ counter 1), would produce a lot of calls like (match:substring match (+ (+ (+ 1 1) 1) 1)), because syntax-rules doesn't let you directly execute code during the expansion.
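To make the difference concrete, here is a sketch of that syntax-rules variant. The names rgb-helper/sr and register-groups-bind/sr are made up for this illustration; the clauses mirror the syntax-case helper, except that the counter is passed along as an unevaluated (+ counter 1) expression rather than being computed during expansion.

```scheme
(use-modules (ice-9 regex))

;; Illustrative syntax-rules counterpart of rgb-helper (hypothetical name).
(define-syntax rgb-helper/sr
  (syntax-rules ()
    ;; all bindings consumed: run the match and bind the variables
    ((_ () (transformed ...) match counter regex target-string body ...)
     (let ((match (string-match regex target-string)))
       (when match
         (let (transformed ...) body ...))))
    ;; function group exhausted: move on to the next binding
    ((_ ((fn) bindings ...) (transformed ...) match counter regex target-string body ...)
     (rgb-helper/sr (bindings ...) (transformed ...)
                    match counter regex target-string body ...))
    ;; bind a variable to the result of applying fn to the group
    ((_ ((fn var vars ...) bindings ...) (transformed ...) match counter regex target-string body ...)
     (rgb-helper/sr ((fn vars ...) bindings ...)
                    ((var (fn (match:substring match counter))) transformed ...)
                    match (+ counter 1) regex target-string body ...))
    ;; bare variable binding
    ((_ (var bindings ...) (transformed ...) match counter regex target-string body ...)
     (rgb-helper/sr (bindings ...)
                    ((var (match:substring match counter)) transformed ...)
                    match (+ counter 1) regex target-string body ...))))

;; Illustrative entry point (hypothetical name); groups start at 1.
(define-syntax register-groups-bind/sr
  (syntax-rules ()
    ((_ (bindings ...) (regex target-string) body ...)
     (rgb-helper/sr (bindings ...) () match 1 regex target-string body ...))))

;; The third variable's group index expands to (+ (+ 1 1) 1),
;; which is evaluated when the code runs rather than at expansion time.
(define sr-example
  (register-groups-bind/sr (first (string->number a b))
                           ("(.)(.)(.)" "a12")
                           (list first a b)))
```

The result is the same as with the syntax-case version; the only difference is that the group indices appear in the expanded code as nested additions instead of literal integers.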