java regex translation yaml-front-matter

Regex to match keys and array values inside a front matter


I'd like to match keys and some array values inside YAML front matter so that they can be converted to tags in a translation memory. Basically, any matched key or value will be filtered out and appear as a non-translatable tag.

The system supports Java regexp.

Here's the front matter. The array values no longer have a leading hyphen because of some preprocessing:

---
title: This is a title
label:
one
two
three
ultra
description: "this is a description text"
other_key: value
---

note: this is a note outside the front matter
tip: this is a tip ...
one: this is a one

The problem:

Important note: The docs state "we will reject complex regular expressions with quantifiers (except possessives) on groups which contain other quantifiers (except possessives)."

Depending on what this means, we might need to go with a very naive approach :/

My regular expressions so far:


Solution

  • Converting my comment to an answer so that the solution is easy to find for future visitors.

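    You may use this regex with lookarounds and \G to match all keys and label values that appear before the closing --- (it needs the MULTILINE flag so that ^ matches at the start of each line):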

    (?:(?:^label:|(?<!\A)\G)\R(\S+)|^\w+(?=:))(?=(?:.*\R)+---)
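For anyone who wants to try the pattern outside the translation tool, here is a minimal Java sketch. The class name `FrontMatterTags` and the helper `findTags` are made up for illustration and are not part of any library, and the sample is a shortened copy of the front matter from the question. It assumes plain `java.util.regex` with `Pattern.MULTILINE`; whether the target system accepts the pattern unchanged is a separate question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrontMatterTags {

    // The regex from the answer, unchanged. MULTILINE is required so that
    // ^ matches at the start of every line, not only at the start of input.
    static final Pattern TAGS = Pattern.compile(
        "(?:(?:^label:|(?<!\\A)\\G)\\R(\\S+)|^\\w+(?=:))(?=(?:.*\\R)+---)",
        Pattern.MULTILINE);

    // Hypothetical helper: returns group 1 (a value found via the label/\G
    // branch) when it took part in the match, otherwise the whole match (a key).
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAGS.matcher(text);
        while (m.find()) {
            tags.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Shortened copy of the sample from the question.
        String sample = String.join("\n",
            "---",
            "title: This is a title",
            "label:",
            "one",
            "two",
            "three",
            "ultra",
            "description: \"this is a description text\"",
            "other_key: value",
            "---",
            "",
            "note: this is a note outside the front matter",
            "one: this is a one");
        findTags(sample).forEach(System.out::println);
    }
}
```

The trailing lookahead is what keeps `note:` and `one:` below the closing `---` from matching, since no `---` follows them. Tracing the pattern by hand on this sample also suggests that the `\G` branch keeps going one line past the last array value, so `description:` ends up in group 1 (colon included) instead of being matched by the key branch. It is still matched, which may be all a tag filter needs, but it is worth checking against your own data.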
    


    Breakup: