regex wikipedia infobox wikitext

RegEx to format Wikipedia infobox code


I am a contributor to Wikipedia and I would like to make a script with AutoHotKey that could format the wikicode of infoboxes and other similar templates.

Infoboxes are templates that display a box on the side of an article, showing the values of the parameters entered. The parameters are numerous, and they differ in number, length, and type of characters used depending on the infobox.

Each parameter name is always preceded by a pipe (|) and followed by an equals sign (=). On rare occasions, multiple parameters are put on the same line, but I can sort this out manually before running the script.

A typical infobox looks like this:

{{Infobox XYZ
 | first parameter  = foo
 | second_parameter = 
 | 3rd parameter    = bar
 | 4th              = bazzzzz
 | 5th              = 
 | etc.             = 
}}

But sometimes (lazy) contributors write them like this:

{{Infobox XYZ
|first parameter=foo
|second_parameter= 
|3rd parameter=bar
|4th=bazzzzz
|5th= 
|etc.= 
}}

This isn't very easy to read or modify.

I would like to know if it is possible to make a regex (or a series of regexes) that would transform the second example into the first.

Each line should start with a space, then a pipe, then another space, then the parameter name, then as many spaces as needed to line up with the longest parameter name, then an equals sign, then another space, and, if present, the parameter value.

I tried a few things using multiple capturing groups, but I'm getting nowhere... (I'm even ashamed to show my attempts, as they really don't work.)

Does anyone have an idea how to make this work?

Thank you for your time.


Solution

  • I got an answer on the AutoHotKey forums:

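    ; Ctrl+I: cut the selected infobox, realign its parameters, paste it back.
    ^i::
    out := ""
    Send, ^x
    ; "O)" returns a match object; "^" anchors the match so that only lines
    ; starting with a pipe are treated as parameters.
    regex := "O)^\s*\|\s*(.*?)\s*=\s*(.*)", width := 1
    ; Pass 1: find the length of the longest parameter name.
    Loop, Parse, Clipboard, `n, `r
        If RegExMatch(A_LoopField, regex, _)
            width := Max(width, StrLen(_[1]))
    ; Pass 2: rewrite each parameter line, left-aligning the name to that width.
    Loop, Parse, Clipboard, `n, `r
        If RegExMatch(A_LoopField, regex, _)
            out .= Format(" | {:-" width "} = {2}", _[1], _[2]) "`n"
        Else
            out .= A_LoopField "`n"
    Clipboard := out
    Send, ^v
    Return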
    

    With this script, selecting the infobox code and pressing Ctrl+I formats it just right. (I guess a single regex isn't enough to do the job.)
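
    A single substitution probably falls short because the padding on each line depends on the longest parameter name in the whole infobox, so the text has to be scanned once before any line can be rewritten. For readers without AutoHotKey, here is a rough Python sketch of the same two-pass idea. It is an illustrative port, not the forum answer itself: the function name `format_infobox`, the start-of-line anchor in the pattern, and the trimming of trailing spaces from values are all assumptions made for this example.

```python
import re

# One parameter line: optional spaces, a pipe, the name, "=", then the value.
# Anchoring at the start of the line is an assumption of this sketch.
PARAM_RE = re.compile(r"^\s*\|\s*(.*?)\s*=\s*(.*)$")


def format_infobox(text: str) -> str:
    """Realign infobox parameters (hypothetical helper, for illustration only)."""
    lines = text.splitlines()
    matches = [PARAM_RE.match(line) for line in lines]

    # Pass 1: the longest parameter name decides the column width.
    width = max((len(m.group(1)) for m in matches if m), default=0)

    # Pass 2: rebuild parameter lines; leave every other line untouched.
    out = []
    for line, m in zip(lines, matches):
        if m:
            name = m.group(1)
            value = m.group(2).rstrip()  # drop trailing spaces from the value
            out.append(f" | {name:<{width}} = {value}")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "{{Infobox XYZ\n|first parameter=foo\n|second_parameter= \n|4th=bazzzzz\n}}"
    print(format_infobox(sample))
```

    Lines that do not start with a pipe, such as the opening `{{Infobox XYZ` and the closing `}}`, pass through unchanged, which mirrors the `else` branch of the script above.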