I am trying to make a macro similar to register-groups-bind in the Common Lisp cl-ppcre library. The idea is that you give it a regex with groups, a list of variables, and statements to execute. It binds each regex group to one of the variables, and then executes the statements with those bindings. For example:
(register-groups-bind
    (s1 s2 s3)
    ("(\\w+) +(\\w+) +(\\w+)" "moe larry curly")
  (list s2 s3 s1))
This would return ("larry" "curly" "moe"). An additional complication is that in the list of variables, instead of just a variable, you can specify a function to run on the matched string, like (register-groups-bind ((string->number x) ..., so that it would run string->number on the matched value before binding it to x. When you specify a function like that, you can specify multiple variables, like (string->number x y z).
I am trying to set it up where I create a let binding like this:
(let ((s1 (match:substring m 1))
      (s2 (match:substring m 2))
      (s3 (match:substring m 3)))
  ... statements ...)
There are two things that I just can't see how to do with any of the Scheme hygienic macro facilities. First, how do I handle the fact that the var list can contain a mixture of bare var names and lists with a function name and vars? Second, for each let binding, I need a sequence number indicating which regex group to fetch.
I can do this in Guile Scheme with defmacro, like this:
(defmacro regex-group-bind (vars re-match . statements)
  (let* ((bind-var (gensym))
         (converted (convert-var-list vars bind-var)))
    `(let ((,bind-var ,re-match))
       (if (regexp-match? ,bind-var)
           (let (,@converted)
             ,@statements)
           #f))))
Here, convert-var-list is a function that converts the var list into the list of let bindings. When I call convert-var-list, I need to pass in the variable containing the results of the regex match. An example of the output from convert-var-list would be:
scheme@(guile-user)> (define match-var (gensym))
scheme@(guile-user)> (convert-var-list '(foo bar (string->number x y z)) match-var)
$9 = ((foo (match:substring #{ g5228}# 1))
      (bar (match:substring #{ g5228}# 2))
      (x (string->number (match:substring #{ g5228}# 3)))
      (y (string->number (match:substring #{ g5228}# 4)))
      (z (string->number (match:substring #{ g5228}# 5))))
scheme@(guile-user)>
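The body of convert-var-list is not shown above. A minimal sketch that produces output of the same shape might look like the following; the loop structure and the local names (loop, inner, index, acc) are assumptions made for illustration, not the original code. It walks the var list once, keeping a running group index, and emits one let binding per variable. If the defmacro is compiled from a file, this helper probably also has to be visible at expansion time (for example by wrapping it in eval-when).

```scheme
;; Hypothetical reconstruction of convert-var-list (not the original code).
;; vars      : list of symbols and/or (fn var ...) lists
;; match-var : symbol that will hold the match object in the expansion
(define (convert-var-list vars match-var)
  (let loop ((vars vars) (index 1) (acc '()))
    (cond
     ((null? vars) (reverse acc))
     ;; (fn v1 v2 ...): one binding per variable, each wrapped in fn
     ((pair? (car vars))
      (let ((fn (caar vars)))
        (let inner ((names (cdar vars)) (index index) (acc acc))
          (if (null? names)
              (loop (cdr vars) index acc)
              (inner (cdr names)
                     (+ index 1)
                     (cons `(,(car names)
                             (,fn (match:substring ,match-var ,index)))
                           acc))))))
     ;; bare variable: bind it to the matched substring directly
     (else
      (loop (cdr vars)
            (+ index 1)
            (cons `(,(car vars) (match:substring ,match-var ,index))
                  acc))))))
```

Because the result is plain list structure, it can be spliced into the defmacro template with ,@ as shown earlier.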
I would like to be able to do this with the hygienic macros, and I am assuming that I probably need to use syntax-case, and maybe still have a function like convert-var-list, but I'm not sure if that is cheating. Would I need to use syntax->datum and datum->syntax in order to do that?
You can implement a Scheme version of register-groups-bind using syntax-rules macros, but I think it's better in syntax-case. The following uses a helper macro that builds up a list of let bindings, consuming one variable from the given list at a time. It has cases for the syntax that calls a function on the match group string and binds the result, and for just a bare variable name. It doesn't have the cl-ppcre behavior of a nil variable skipping the binding of the corresponding match group, though (it wouldn't be hard to add, using #f instead since this is Scheme). That's left as an exercise for the reader, with the hint that syntax patterns can include literal values.
(use-modules (ice-9 regex))

(define-syntax rgb-helper
  (lambda (stx)
    (syntax-case stx ()
      ;; used up all variable bindings
      ((_ () (transformed ...) match counter regex target-string body ...)
       #'(let ((match (string-match regex target-string)))
           (when match
             (let (transformed ...) body ...))))
      ;; no more variables to wrap in a function; on to the next binding
      ((_ ((fn) bindings ...) (transformed ...) match counter regex target-string body ...)
       #'(rgb-helper (bindings ...) (transformed ...) match counter regex target-string body ...))
      ;; bind a variable to the result of passing the group to a function
      ((_ ((fn var vars ...) bindings ...) (transformed ...) match counter regex target-string body ...)
       #`(rgb-helper ((fn vars ...) bindings ...)
                     ((var (fn (match:substring match counter))) transformed ...)
                     match #,(+ (syntax->datum #'counter) 1) regex target-string body ...))
      ;; bare variable binding
      ((_ (var bindings ...) (transformed ...) match counter regex target-string body ...)
       #`(rgb-helper (bindings ...)
                     ((var (match:substring match counter)) transformed ...)
                     match #,(+ (syntax->datum #'counter) 1) regex target-string body ...)))))

(define-syntax register-groups-bind
  (lambda (stx)
    (syntax-case stx ()
      ((_ (bindings ...) (regex target-string) body ...)
       #'(rgb-helper (bindings ...) () match 1 regex target-string body ...)))))
;;; examples of use
(write (register-groups-bind (first second third fourth)
           ("((a)|(b)|(c))+" "abababc")
         (list first second third fourth)))
(newline)
(write (register-groups-bind (first (string->number a b))
           ("(.)(.)(.)" "a12")
         (list first a b)))
(newline)
The trick here, and the reason for using syntax-case, is that each time a variable is bound, the counter for the current match group is incremented directly as part of the macro expansion. A syntax-rules version using the same basic idea, but replacing the #,(+ (syntax->datum #'counter) 1) bits with (+ counter 1), would end up with a lot of calls like (match:substring match (+ (+ (+ 1 1) 1) 1)), because syntax-rules doesn't let you directly execute code during the expansion.
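To make the difference concrete, here is a hand-written approximation of what the second usage example expands to under each approach. This is a sketch, not real expander output: the procedure names and the variable m are made up for illustration (the actual macro introduces its own hygienic match identifier).

```scheme
(use-modules (ice-9 regex))

;; Approximate syntax-case expansion: every group index is already a
;; literal number, computed while the macro was being expanded.
(define (expanded-with-syntax-case)
  (let ((m (string-match "(.)(.)(.)" "a12")))
    (when m
      (let ((b (string->number (match:substring m 3)))
            (a (string->number (match:substring m 2)))
            (first (match:substring m 1)))
        (list first a b)))))

;; Approximate syntax-rules expansion: every index is a nested sum that
;; is left in the code for later evaluation.
(define (expanded-with-syntax-rules)
  (let ((m (string-match "(.)(.)(.)" "a12")))
    (when m
      (let ((b (string->number (match:substring m (+ (+ 1 1) 1))))
            (a (string->number (match:substring m (+ 1 1))))
            (first (match:substring m 1)))
        (list first a b)))))

(write (expanded-with-syntax-case))   ; prints ("a" 1 2)
(newline)
(write (expanded-with-syntax-rules))  ; prints ("a" 1 2)
(newline)
```

Both forms compute the same result; the only difference is whether the index arithmetic happens during expansion or stays in the expanded code.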