I am using Obsidian with the Obsidian_to_Anki-Plugin. I need a regex that matches all first-level headings of a page with a capturing group, but only when the page starts with #match. The plugin compiles the regex with the multiline flag. The pages have this structure:
#match
# Heading 1
Text of Heading 1
# Heading 2
Text of Heading 2
# Heading 3
Text of Heading 3
This shouldn't get matched:
# Heading 1
Text of Heading 1
# Heading 2
Text of Heading 2
# Heading 3
Text of Heading 3
I came up with the regex #match\s\s(# .*), but this way only Heading 1 gets matched with capturing group 1, because there is no #match before Heading 2.
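A minimal sketch that reproduces the problem. The sample page and the names `page`, `attempt` and `captured` are made up for illustration, and the blank line after #match is an assumption (the \s\s in the pattern needs two whitespace characters to match).

```javascript
// Hypothetical page; the blank line after "#match" is an assumption.
const page = [
  "#match",
  "",
  "# Heading 1",
  "Text of Heading 1",
  "# Heading 2",
  "Text of Heading 2",
  "# Heading 3",
  "Text of Heading 3",
].join("\n");

// The attempted pattern, compiled with the global and multiline flags.
const attempt = /#match\s\s(# .*)/gm;

// Collect capture group 1 of every match.
const captured = [...page.matchAll(attempt)].map((m) => m[1]);

console.log(captured); // [ '# Heading 1' ]
```

Because the literal #match occurs only once, the pattern can succeed only once, so the later headings are never reached.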
Is there a way to solve this?
Thanks in advance!
(?<= # Match something preceded by
(?<![\s\S]) # at the start of the file
#match\n # '#match'
[\s\S]* # and anything in between:
) #
^(# .+)\n # A heading followed by
([\s\S]+?) # its corresponding content, matched lazily across as many lines as needed,
(?=\n# |<!--ID|(?![\s\S])) # until another heading, '<!--ID' or the end of file.
Try it on regex101.com.
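As a rough check outside regex101, the pattern can be run in any JavaScript engine that supports lookbehind (ES2018 or later). The sketch below strips the annotations and uses made-up names (`headingRe`, `tagged`, `untagged`, `pairs`); the sample pages mirror the ones in the question.

```javascript
// The annotated pattern above, written as a single literal with the g and m flags.
const headingRe =
  /(?<=(?<![\s\S])#match\n[\s\S]*)^(# .+)\n([\s\S]+?)(?=\n# |<!--ID|(?![\s\S]))/gm;

// Sample pages mirroring the structures in the question.
const body = [
  "# Heading 1",
  "Text of Heading 1",
  "# Heading 2",
  "Text of Heading 2",
  "# Heading 3",
  "Text of Heading 3",
].join("\n");
const tagged = "#match\n" + body; // starts with #match
const untagged = body;            // does not

// Collect [heading, content] pairs from capture groups 1 and 2.
const pairs = (text) => [...text.matchAll(headingRe)].map((m) => [m[1], m[2]]);

console.log(pairs(tagged));   // three [heading, content] pairs
console.log(pairs(untagged)); // []
```

The lookbehind is re-evaluated at every candidate position, which is what lets each heading match independently while still requiring #match at the top of the file.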
Since ECMAScript doesn't support \A and \Z (there's a proposal for adding them though), we'll have to make do with negative lookarounds: (?<![\s\S]) for the start and (?![\s\S]) for the end. [\s\S] matches any single character, so (?![\s\S]) can only match at the position where no succeeding character can be found: the end of the string. The same reasoning applies to (?<![\s\S]), which can only match where no character precedes: the start of the string.
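To see why ^ and $ are not enough once the multiline flag is on, the following sketch lists the positions at which each assertion matches. The two-line sample string and the variable names are made up for illustration.

```javascript
// 17 characters: "line one" (0-7), "\n" (8), "line two" (9-16).
const text = "line one\nline two";

// Helper: indices of all (zero-width) matches of a global regex.
const positions = (re) => [...text.matchAll(re)].map((m) => m.index);

// With the m flag, ^ and $ also match at every line boundary.
const caretPositions = positions(/^/gm);  // [0, 9]
const dollarPositions = positions(/$/gm); // [8, 17]

// The negative lookarounds ignore line boundaries entirely.
const startPositions = positions(/(?<![\s\S])/gm); // [0]
const endPositions = positions(/(?![\s\S])/gm);    // [17]

console.log({ caretPositions, dollarPositions, startPositions, endPositions });
```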
If the regex flavor supports \A, \G and \K, as PCRE does, a lookbehind-free alternative is:
(?:\A#match|\G(?!\A)) # Match '#match' at the start of the file, or the end of the last match
\s* # followed by zero or more whitespace characters;
\K # we forfeit everything we just matched
^(# .+)\n # then match and capture a heading, before continuing to the next line
([\s\S]+?) # and capture the section's content,
(?=\n# |\Z) # which must precede either another heading or the end of file.
Try it on regex101.com.
This makes use of the following metasequences:
\A: the very start of the whole string
\G: the end of the last match, or \A
\G(?!\A): the end of the last match only
\K: forfeit everything matched by the expression on its left
\Z: the end of the string, or the position before the last line terminator iff it is also the last character of the whole string
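JavaScript has neither \G nor \K, so this second pattern cannot be run there as written. As a rough sketch of the same chaining idea, the sticky flag (y) forces each match to begin exactly where the previous one ended, which is roughly what \G(?!\A) does, and reading only the capture groups plays the role of \K. The function name `sectionsAfterMarker` is hypothetical, and (?![\s\S]) stands in for \Z, so unlike \Z a trailing newline would end up in the last body.

```javascript
// Emulates (?:\A#match|\G(?!\A)) with startsWith() plus a sticky regex.
function sectionsAfterMarker(text) {
  const marker = "#match";
  // \A#match: the page must begin with the marker.
  if (!text.startsWith(marker)) return [];

  // y (sticky): a match must start exactly at lastIndex, like \G.
  const section = /\s*^(# .+)\n([\s\S]+?)(?=\n# |(?![\s\S]))/my;
  section.lastIndex = marker.length;

  const found = [];
  let m;
  // exec() advances lastIndex to the end of each match; on failure it
  // returns null, which ends the chain.
  while ((m = section.exec(text)) !== null) {
    found.push({ heading: m[1], body: m[2] });
  }
  return found;
}

const sample = "#match\n# Heading 1\nText of Heading 1\n# Heading 2\nText of Heading 2";
console.log(sectionsAfterMarker(sample));
console.log(sectionsAfterMarker("# Heading 1\nText of Heading 1")); // []
```

As with the \G version, the chain stops at the first position where a heading does not follow immediately, so text between #match and the first heading would yield no sections.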