I'm trying to use Lua pattern matching in a Wikipedia module to locate instances of MediaWiki parameter syntax (e.g. {{{parameter1-a|defaultValue}}} or {{{parameter1-a|{{{alias1-a|defaultValue}}}}}}) so they can be converted into Lua-compatible argument syntax. (Yes, I am fully aware that using pattern matching for this is an unforgivable crime against humanity, but whatever.)
So far, I have this relatively simple pattern, which works fine for the most part:
"{{{([^{}<>|]+)(|?([^{}|]*))}}}"
(Regex equivalent [hopefully], if you want to test on regex101: /{{{([^{}<>|]+)(?:\|([^{}|]+)?)?}}}/g)
However, this can't properly match anything in the "default" part that itself contains curly braces, so I can't include aliases for the parameter or template wikitext in the default. More specifically:
- Given {{{parameter|{{{alias|default}}}}}}, the pattern above will just match/capture the inner {{{alias|default}}} rather than the whole parameter.
- A second variant ("{{{([^{}<>|]+)(|?([^{}|]*))}}}") will succeed with {{{parameter|{{{alias}}}}}}, yielding {{{parameter / |{{{alias}}} / }}} as intended, but with a default on the alias it'll give {{{parameter|{{{alias / |default}}} / }}}.
- A greedy variant ("{{{([^{}<>|]+)(|?(.*))}}}") works perfectly with one parameter, but with two it'll "spill" if the first has a default: {{{parameter1|default}}} {{{parameter2}}} will yield {{{parameter1 / |default}}} {{{parameter2 / }}}.

How do I solve this?
This seems like a perfect use case for Lua's special bracket matching pattern item %b! Using %b{}, you can match a balanced pair of curly braces, including any nested balanced pairs inside it. By surrounding this with two literal curly braces on each side, you can match a complete triple-braced parameter, however deeply its default is nested.
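As a quick illustration of how %b behaves on its own, here is a small sketch. The sample strings are made up for the demo and are not taken from your module:

```lua
-- %b{} matches from an opening brace to its balancing closing brace,
-- so nested pairs are carried along with the outer one.
local inner = ("a {b {c} d} e"):match("%b{}")
print(inner) --> {b {c} d}

-- Wrapped in two literal braces per side, it matches a whole
-- triple-braced parameter, nested default included.
local whole = ("x {{{p|{{{q|r}}}}}} y"):match("{{%b{}}}")
print(whole) --> {{{p|{{{q|r}}}}}}
```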
Given your test cases:
local text = [[
lorem ipsum dolor sit amet
{{{blarg}}}
lorem ipsum dolor sit amet
{{{blarg|default}}}
lorem ipsum dolor sit amet
{{{parameter1-a|{{{alias1-a|defaultValue}}}}}}
]]
and using the pattern {{%b{}}} in gmatch:
for match in text:gmatch"{{%b{}}}" do
    print(match)
end
you get
{{{blarg}}}
{{{blarg|default}}}
{{{parameter1-a|{{{alias1-a|defaultValue}}}}}}
as expected. You can then further process this parameter:
local content = match:sub(4, -4) -- cut off the outer {{{ and }}}
-- split at the first pipe; the default may itself contain pipes and braces
local param, default = content:match"^([^|]+)|(.*)$"
if not param then param = content end -- no default
(I've simplified your pattern a bit here; this isn't exactly equivalent, as it is more permissive.)
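Putting the pieces together, here is a minimal sketch of the whole loop. findParams is a hypothetical helper name, and the split takes everything after the first pipe as the default so that nested defaults survive intact, which is my assumption about what you want:

```lua
-- Hypothetical helper: collect every {{{name|default}}} in `text`,
-- returning a list of { name = ..., default = ... } tables.
local function findParams(text)
  local found = {}
  for match in text:gmatch("{{%b{}}}") do
    local content = match:sub(4, -4) -- strip the outer {{{ and }}}
    -- Split at the first pipe; the rest (possibly nested) is the default.
    local name, default = content:match("^([^|]+)|(.*)$")
    -- No pipe at all: the whole content is the name and default stays nil.
    found[#found + 1] = { name = name or content, default = default }
  end
  return found
end

-- Includes the "spill" case from the question and one nested alias.
local params = findParams("{{{parameter1|default}}} {{{parameter2}}} {{{a|{{{b|c}}}}}}")
for _, p in ipairs(params) do
  print(p.name, p.default)
end
```

Since the default of the last parameter is itself a parameter, it could be passed back through findParams to resolve the alias chain.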