regex obsidian

Regex to match strings when text starts with specific string


I am using Obsidian with the Obsidian_to_Anki-Plugin. I need a regex that matches every first-level heading of a page in a capturing group, but only when the page starts with #match. The plugin compiles the regex with the multiline flag. The pages have this structure:

#match

# Heading 1
Text of Heading 1
# Heading 2
Text of Heading 2
# Heading 3
Text of Heading 3

This shouldn't get matched:

# Heading 1
Text of Heading 1
# Heading 2
Text of Heading 2
# Heading 3
Text of Heading 3

I came up with this regex: #match\s\s(# .*). But it captures only Heading 1 in group 1, because no #match precedes Heading 2 or Heading 3.

Is there a way to solve this?

Thanks in advance!


Solution

  • Updated

    (?<=                        # Match something preceded by
      (?<![\s\S])               # the start of the file,
      #match\n                  # '#match' on the first line,
      [\s\S]*                   # and anything in between:
    )                           #
    ^(# .+)\n                   # a heading, followed by
    ([\s\S]+?)                  # its content, matched lazily (as short as possible)
    (?=\n# |<!--ID|(?![\s\S]))  # up to the next heading, '<!--ID' or the end of the file.
    

    Try it on regex101.com.
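
To check the behaviour outside regex101, here is a minimal JavaScript sketch that runs the pattern above against the two pages from the question. The helper name `extractSections` and the `g` flag are assumptions made for illustration; the question only states that the plugin uses the multiline flag, and how the plugin itself iterates over matches is not shown here.

```javascript
// The pattern from the answer above, collapsed onto one line. JavaScript has
// no free-spacing (x) mode, so the annotations are dropped. The flags are an
// assumption: "g" to iterate over all matches, "m" because the plugin is said
// to compile with the multiline flag.
const HEADING_RE =
  /(?<=(?<![\s\S])#match\n[\s\S]*)^(# .+)\n([\s\S]+?)(?=\n# |<!--ID|(?![\s\S]))/gm;

// Hypothetical helper: returns [heading, content] pairs for one page.
function extractSections(page) {
  return Array.from(page.matchAll(HEADING_RE), (m) => [m[1], m[2]]);
}

// Build the sample pages line by line so no stray indentation sneaks in.
const body = [
  "# Heading 1",
  "Text of Heading 1",
  "# Heading 2",
  "Text of Heading 2",
  "# Heading 3",
  "Text of Heading 3",
].join("\n");

const tagged = "#match\n\n" + body; // page that starts with #match
const untagged = body;              // same page without the tag

console.log(extractSections(tagged));   // three [heading, content] pairs
console.log(extractSections(untagged)); // []
```

The variable-length lookbehind needs an engine with ES2018 lookbehind support (current Node.js and browsers have it). If a page ends with a trailing newline, that newline becomes part of the last section's content, because the lazy group only stops at the next heading, '<!--ID' or the very end of the input.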

    Since ECMAScript supports neither \A nor \Z (though there is a proposal to add them), and ^ and $ match at every line break under the multiline flag, we have to make do with negative lookarounds: (?<![\s\S]) and (?![\s\S]). [\s\S] matches any single character, including a newline, so (?![\s\S]) can match only where no character follows: the end of the string. By the same reasoning, (?<![\s\S]) can match only where no character precedes: the start of the string.
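
The difference between the line anchors and the two lookarounds can be seen in a small JavaScript sketch. The helper `positions` and the constant names are made up for this illustration; it only lists the indices at which each zero-width pattern matches.

```javascript
// With the "m" flag, ^ and $ match at every line break, so they cannot
// stand in for \A and \Z. The two lookarounds are unaffected by the flag.
const START_OF_INPUT = /(?<![\s\S])/gm; // no character before this position
const END_OF_INPUT = /(?![\s\S])/gm;    // no character after this position

// Hypothetical helper: collect the index of every (zero-width) match.
function positions(re, text) {
  return Array.from(text.matchAll(re), (m) => m.index);
}

const text = "a\nb\nc"; // 5 characters, 3 lines

console.log(positions(/^/gm, text));          // [0, 2, 4]
console.log(positions(/$/gm, text));          // [1, 3, 5]
console.log(positions(START_OF_INPUT, text)); // [0]
console.log(positions(END_OF_INPUT, text));   // [5]
```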

    Original answer

    (?:\A#match|\G(?!\A))  # Match '#match' at the start of the file, or resume at the end of the last match,
    \s*                    # followed by zero or more whitespace characters;
    \K                     # discard everything matched so far from the result,
    ^(# .+)\n              # then match and capture a heading, before continuing to the next line,
    ([\s\S]+?)             # and capture the section's content,
    (?=\n# |\Z)            # which must precede either another heading or the end of the file.
    

    Try it on regex101.com.

    This makes use of the following metasequences: