regex github markdown

Regex for Markdown Table Syntax?


I'm currently developing a tool that parses GitHub wikis, and I'm trying to add support for Markdown tables, which the parser I'm using does not support.

I'm a bit stuck with the complicated table syntax. The official specification is here:

| Left align | Right align | Center align |
|:-----------|------------:|:------------:|
| This       |        This |     This     |
| column     |      column |    column    |
| will       |        will |     will     |
| be         |          be |      be      |
| left       |       right |    center    |
| aligned    |     aligned |   aligned    |

As you can see, there's some structure, but some parts are entirely optional.

I would like a regex that captures the header (first line), the column alignment data (second line), and the actual content as separate groups. It should match only if there is at least one content line. The header and alignment data also have to obey certain rules, as seen in the example.

It's possible my approach is misguided (perhaps regex can be avoided?). If so, any answer that reaches the same result more easily is appreciated.


Solution

  • I need a regex solution to the same problem. Here's what I've got so far; I'll update it as I improve it:

    \|(?:([^\r\n|]*)\|)+\r?\n\|(?:(:?-+:?)\|)+\r?\n(\|(?:([^\r\n|]*)\|)+\r?\n)+
    

    Tested with JavaScript.
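One caveat when using a pattern like the one above in JavaScript: a capture group inside a repeated group only keeps its last iteration, so the individual cells cannot be read from the match directly. A minimal sketch of one way around this, assuming every row starts and ends with a pipe as in the question's example: capture each of the three parts as a whole line (or block of lines), then split the cells in a second step. The names `TABLE_RE`, `splitRow`, `alignmentOf` and `parseTable` are hypothetical, and allowing spaces or tabs around the dashes in the alignment row is an assumption that goes beyond the original pattern.

```javascript
// Group 1: header line, group 2: alignment line, group 3: all content lines.
// Assumption: [ \t]* around the dashes tolerates padded cells such as "| --- |".
const TABLE_RE =
  /^(\|(?:[^\r\n|]*\|)+)\r?\n(\|(?:[ \t]*:?-+:?[ \t]*\|)+)\r?\n((?:\|(?:[^\r\n|]*\|)+(?:\r?\n|$))+)/m;

// Drop the outer pipes, split on the inner ones, and trim the padding.
function splitRow(line) {
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((cell) => cell.trim());
}

// Map an alignment cell such as ":---", "---:" or ":---:" to a name.
function alignmentOf(spec) {
  const left = spec.startsWith(":");
  const right = spec.endsWith(":");
  if (left && right) return "center";
  if (right) return "right";
  if (left) return "left";
  return "default"; // no colon: alignment not specified
}

// Returns { header, align, rows } for the first table found, or null.
function parseTable(text) {
  const m = TABLE_RE.exec(text);
  if (m === null) return null;
  return {
    header: splitRow(m[1]),
    align: splitRow(m[2]).map(alignmentOf),
    rows: m[3]
      .split(/\r?\n/)
      .filter((line) => line !== "")
      .map(splitRow),
  };
}

const sample = [
  "| Left align | Right align | Center align |",
  "|:-----------|------------:|:------------:|",
  "| This       |        This |     This     |",
  "| column     |      column |    column    |",
].join("\n");

console.log(parseTable(sample));
```

Because group 3 requires at least one row, a header and alignment line with no content underneath will not match, which is the behaviour the question asks for. The sketch does not handle escaped pipes (`\|`) inside cells.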