python-3.x regex

Catching optional content just after a new line


Here's some code that retrieves information about package imports in a LaTeX file. It fails to capture the optional dates in square brackets that can follow the package name. How can I capture them?

import re

test_str = r"""
\RequirePackage[
  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded
]{geometry}%
 [2020-01-02]

\RequirePackage{tocbasic}

\RequirePackage[svgnames]%
               {xcolor}%
               [2023/11/15]

\RequirePackage[raggedright]%  OK?
                {titlesec}

\RequirePackage{xcolor}%
               [2022/06/12]

\RequirePackage{hyperref}% To load after titlesec!
               [2023-02-07]
    """

pattern = re.compile(
    r"\\RequirePackage(\[(.*?)\])?([^{]*?)?{(.*?)}",
    flags = re.S
)

matches = pattern.finditer(test_str)

for m in matches:
    print('---')

    for i in [0, 2, 4]:
        print(f"m.group({i}):")
        print(m.group(i))
        print()

Here is the actual output.

---
m.group(0):
\RequirePackage[
  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded
]{geometry}

m.group(2):

  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded


m.group(4):
geometry

---
m.group(0):
\RequirePackage{tocbasic}

m.group(2):
None

m.group(4):
tocbasic

---
m.group(0):
\RequirePackage[svgnames]%
               {xcolor}

m.group(2):
svgnames

m.group(4):
xcolor

---
m.group(0):
\RequirePackage[raggedright]%  OK?
                {titlesec}

m.group(2):
raggedright

m.group(4):
titlesec

---
m.group(0):
\RequirePackage{xcolor}

m.group(2):
None

m.group(4):
xcolor

---
m.group(0):
\RequirePackage{hyperref}

m.group(2):
None

m.group(4):
hyperref

Solution

  • You could update the pattern to use negated character classes (which also match newlines) and omit flags = re.S, so that .* stops at the end of the line:

    \\RequirePackage(\[([^][]*)\])?([^{]*){([^{}]*)}.*(?:\n\s*\[([^][]*)])?
    

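    The pattern matches:

      • \\RequirePackage matches \RequirePackage literally
      • (\[([^][]*)\])? optional group 1 matching [...], with the text between the brackets captured in group 2
      • ([^{]*) group 3, any characters other than { (newlines included)
      • {([^{}]*)} matches {...}, with the package name captured in group 4
      • .* matches the rest of the line
      • (?:\n\s*\[([^][]*)])? optionally matches a newline, whitespace and [...], with the date captured in group 5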

    See a regex101 demo and a Python demo.


    If you are only interested in groups 2 and 4 and the added group 5, you can drop the two capture groups you do not need and use three capture groups in total:

    \\RequirePackage(?:\[([^][]*)\])?[^{]*{([^{}]*)}.*(?:\n\s*\[([^][]*)])?
    

    See the group values in the regex101 demo and another Python demo.
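
As a rough illustration of the three-group pattern, here is a minimal sketch run against a shortened version of the question's test string. The names `sample` and `packages` are only illustrative and do not come from the question or the linked demos.

```python
import re

# Shortened sample in the same style as the question's test string.
sample = r"""
\RequirePackage[svgnames]%
               {xcolor}%
               [2023/11/15]

\RequirePackage{tocbasic}

\RequirePackage{hyperref}% To load after titlesec!
               [2023-02-07]
"""

# Three capture groups: options, package name, optional date.
# No re.S flag, so ".*" only consumes the rest of the current line.
pattern = re.compile(
    r"\\RequirePackage(?:\[([^][]*)\])?[^{]*{([^{}]*)}.*(?:\n\s*\[([^][]*)])?"
)

# Each entry is a tuple (options, name, date); missing parts are None.
packages = [m.groups() for m in pattern.finditer(sample)]

for options, name, date in packages:
    print(f"{name}: options={options!r}, date={date!r}")
```

Groups that did not take part in the match come back as `None`, so a package without options or without a date is easy to detect. Note that `\s*` also skips blank lines, so a bracketed group that starts a later paragraph would be picked up as a date as well; whether that matters depends on the input.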