I'm processing MediaWiki markup with JavaScript and trying to remove certain template parameters. I'm having trouble matching exactly the text, and only the text, that I want to remove.
Simplified down, the template text can look something like this:
{{TemplateX
| a =
Foo bar
Blah blah
Fizbin foo[[domain:blah]]
Ipsum lorem[[domain:blah]]
|b =1
|c = 0fillertext
|d = 1alphabet
| e =
| f = 10: One Hobbit
| g = aaaa, bbbb, cccc, dddd
|h = 15000
|i = -15000
| j = Level 4 [[domain:filk|Songs]]
| k =7 fizbin, 8 [[domain:trekkies|Shatners]]
|l =
|m =
}}
The best I've come up with so far is:
/\|\s?(a|b|d|f|j|k|m)([^][^\n\|])+/gm
Updated version:
/\|\s?(a|b|d|f|j|k|m)(?:[^\n\|]|[.\n])+/gm
Removing the matches of the updated regex gives:
{{TemplateX
|c = 0fillertext
| e =
| g = aaaa, bbbb, cccc, dddd
|h = 15000
|i = -15000
|Songs]]
|Shatners]]
|l =
But what I'm trying to get is:
{{TemplateX
|c = 0fillertext
| e =
| g = aaaa, bbbb, cccc, dddd
|h = 15000
|i = -15000
|l =
}}
I can deal with the extraneous newlines, but I still need to make sure that '|Songs]]' and '|Shatners]]' are also matched by the regexp.
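A minimal reproduction in JavaScript, assuming the sample template above and String.prototype.replace with an empty replacement (the variable names are only for illustration). It shows why those fragments survive: [^\n\|] excludes every |, and inside a character class . is a literal period, so [.\n] only adds periods and newlines. The matches for j and k therefore end at the | inside [[...|...]].

```javascript
// The sample template from the question, one array entry per line.
const template = [
  "{{TemplateX",
  "| a =",
  "Foo bar",
  "Blah blah",
  "Fizbin foo[[domain:blah]]",
  "Ipsum lorem[[domain:blah]]",
  "|b =1",
  "|c = 0fillertext",
  "|d = 1alphabet",
  "| e =",
  "| f = 10: One Hobbit",
  "| g = aaaa, bbbb, cccc, dddd",
  "|h = 15000",
  "|i = -15000",
  "| j = Level 4 [[domain:filk|Songs]]",
  "| k =7 fizbin, 8 [[domain:trekkies|Shatners]]",
  "|l =",
  "|m =",
  "}}",
].join("\n");

// The "updated version" regex from the question.
const attempt = /\|\s?(a|b|d|f|j|k|m)(?:[^\n\|]|[.\n])+/gm;

// Delete every match. [^\n\|] cannot cross the "|" inside [[...|...]],
// so "|Songs]]" and "|Shatners]]" are left in the result.
const result = template.replace(attempt, "");
console.log(result);
```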
Regarding Tgr's comment:
For my purposes, it is safe to assume that every parameter starts on a new line with | as the first character, and that no parameter definition contains a | outside a [[foo|bar]] construct. So '\n|' is a safe "start" and "stop" sequence. The question therefore boils down to this: for any given set of parameters (a, b, d, f, j, k and m in the question), I need a regex that matches 'wanted param' in the following:
| [other param 1] = ...
| [wanted param] = possibly multiple lines and |s that aren't after a newline
| [other param 2]
You can try the regex below. It matches the parameters you want to keep, not those you want to remove:
(^{{TemplateX)|\|\s*(c|e|g|h|i|l[ ]*\=[ ]*)(.*)|(}}$)
I refined it to the following, which I think is a bit better if you compare the two regexes with the diagram tool at regexper.com:
(^{{TemplateX)|(\|[ ]*)(c|e|g|h|i|l)([ ]*\=[ ]*)(.*)|(}}$)
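One way to use this "keep" regex in JavaScript, sketched under the assumption that the template text is in a string named template (a name chosen here for illustration): collect every match with matchAll and join the matches with newlines. The m flag lets ^ and $ match at line boundaries, and the unescaped {{ and }} are read as literal braces only because the regex has no u flag.

```javascript
// The sample template from the question.
const template = [
  "{{TemplateX",
  "| a =",
  "Foo bar",
  "Blah blah",
  "Fizbin foo[[domain:blah]]",
  "Ipsum lorem[[domain:blah]]",
  "|b =1",
  "|c = 0fillertext",
  "|d = 1alphabet",
  "| e =",
  "| f = 10: One Hobbit",
  "| g = aaaa, bbbb, cccc, dddd",
  "|h = 15000",
  "|i = -15000",
  "| j = Level 4 [[domain:filk|Songs]]",
  "| k =7 fizbin, 8 [[domain:trekkies|Shatners]]",
  "|l =",
  "|m =",
  "}}",
].join("\n");

// Second regex from this answer: matches the opening line, each wanted
// parameter line (c, e, g, h, i, l) and the closing braces.
const keep = /(^{{TemplateX)|(\|[ ]*)(c|e|g|h|i|l)([ ]*\=[ ]*)(.*)|(}}$)/gm;

// Each full match (m[0]) is one line to keep; rebuild the template from them.
const kept = Array.from(template.matchAll(keep), (m) => m[0]).join("\n");
console.log(kept);
```

Because (.*) stops at the end of a line, this only keeps the first line of a multi-line value, so it suits templates where the retained parameters fit on one line.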
Following up on the comments, this regex matches the unwanted parameters:
\|[ ]?(a|b|d|f|j|k|m)([ ]*\=[ ]*)((?![\r\n]+\|)[0-9a-zA-Z, \[\]:\|\r\n\t])+
It builds on this answer and uses a negative lookahead to match only up to [\r\n]+\|, which partly satisfies the statement that:
So '\n|' is a safe "start" and "stop" sequence
Tested here, after introducing a few newlines into the parameters to be retained (e.g. g).
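A sketch of applying this regex with String.prototype.replace, assuming the sample template and that you then tidy up the leftover line breaks yourself (the cleanup step and variable names are illustrative, not part of the answer). Each removed parameter leaves its newline behind, so the second replace collapses runs of newlines; note that it would also collapse any intentional blank lines inside kept values.

```javascript
// The sample template from the question.
const template = [
  "{{TemplateX",
  "| a =",
  "Foo bar",
  "Blah blah",
  "Fizbin foo[[domain:blah]]",
  "Ipsum lorem[[domain:blah]]",
  "|b =1",
  "|c = 0fillertext",
  "|d = 1alphabet",
  "| e =",
  "| f = 10: One Hobbit",
  "| g = aaaa, bbbb, cccc, dddd",
  "|h = 15000",
  "|i = -15000",
  "| j = Level 4 [[domain:filk|Songs]]",
  "| k =7 fizbin, 8 [[domain:trekkies|Shatners]]",
  "|l =",
  "|m =",
  "}}",
].join("\n");

// Regex from this answer for the unwanted parameters (a, b, d, f, j, k, m).
// The g flag makes replace() remove every match, not just the first.
const unwanted =
  /\|[ ]?(a|b|d|f|j|k|m)([ ]*\=[ ]*)((?![\r\n]+\|)[0-9a-zA-Z, \[\]:\|\r\n\t])+/g;

const stripped = template
  .replace(unwanted, "")
  // Collapse the blank lines left where parameters were removed.
  .replace(/\n{2,}/g, "\n");
console.log(stripped);
```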
There is a risk that a parameter value contains a character outside [0-9a-zA-Z, \[\]:\|\r\n\t]; the match then stops at that character and the rest of the value is left behind. To solve that, you would need to extend the character class.
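If maintaining that character list becomes a burden, a different technique (not the one in this answer) is to rely only on the question's stated assumption that '\n|' starts and stops each parameter: match lazily with [\s\S]*? up to a newline followed by | or }}. This is a sketch under the assumptions that every parameter starts with | at the beginning of a line and that the closing }} sits on its own line; removeParams is a hypothetical helper name.

```javascript
// Hypothetical helper: removes the named parameters from a template string.
// Assumes each parameter starts with "|" at the start of a line and that the
// closing "}}" is on its own line.
function removeParams(text, names) {
  // Escape regex metacharacters in the names (a no-op for single letters).
  const escaped = names.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const pattern = new RegExp(
    // With the m flag, ^ matches at the start of any line. [\s\S]*? lazily
    // takes any characters, including newlines and the "|" inside [[...|...]],
    // up to the first newline followed by "|" or "}}".
    String.raw`^\|[ \t]*(?:${escaped.join("|")})[ \t]*=[\s\S]*?\n(?=\||\}\})`,
    "gm"
  );
  // Each match includes the parameter's trailing newline, so no cleanup is needed.
  return text.replace(pattern, "");
}

// The sample template from the question.
const template = [
  "{{TemplateX",
  "| a =",
  "Foo bar",
  "Blah blah",
  "Fizbin foo[[domain:blah]]",
  "Ipsum lorem[[domain:blah]]",
  "|b =1",
  "|c = 0fillertext",
  "|d = 1alphabet",
  "| e =",
  "| f = 10: One Hobbit",
  "| g = aaaa, bbbb, cccc, dddd",
  "|h = 15000",
  "|i = -15000",
  "| j = Level 4 [[domain:filk|Songs]]",
  "| k =7 fizbin, 8 [[domain:trekkies|Shatners]]",
  "|l =",
  "|m =",
  "}}",
].join("\n");

const cleaned = removeParams(template, ["a", "b", "d", "f", "j", "k", "m"]);
console.log(cleaned);
```

The trade-off is that this depends entirely on the line-start assumption: a nested template whose own parameters start at the beginning of a line would end the match early.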