Here's some code that retrieves information about package imports in a LaTeX
file. It fails to capture the optional dates in square brackets that follow some of the package names. How can I do this?
import re
test_str = r"""
\RequirePackage[
top = 2.5cm,
bottom = 2.5cm,
left = 2.5cm,
right = 2.5cm,
marginparwidth = 2cm,
marginparsep = 2mm,
heightrounded
]{geometry}%
[2020-01-02]
\RequirePackage{tocbasic}
\RequirePackage[svgnames]%
{xcolor}%
[2023/11/15]
\RequirePackage[raggedright]% OK?
{titlesec}
\RequirePackage{xcolor}%
[2022/06/12]
\RequirePackage{hyperref}% To load after titlesec!
[2023-02-07]
"""
pattern = re.compile(
r"\\RequirePackage(\[(.*?)\])?([^{]*?)?{(.*?)}",
flags = re.S
)
matches = pattern.finditer(test_str)
for m in matches:
    print('---')
    for i in [0, 2, 4]:
        print(f"m.group({i}):")
        print(m.group(i))
        print()
Here is the actual output.
---
m.group(0):
\RequirePackage[
top = 2.5cm,
bottom = 2.5cm,
left = 2.5cm,
right = 2.5cm,
marginparwidth = 2cm,
marginparsep = 2mm,
heightrounded
]{geometry}
m.group(2):
top = 2.5cm,
bottom = 2.5cm,
left = 2.5cm,
right = 2.5cm,
marginparwidth = 2cm,
marginparsep = 2mm,
heightrounded
m.group(4):
geometry
---
m.group(0):
\RequirePackage{tocbasic}
m.group(2):
None
m.group(4):
tocbasic
---
m.group(0):
\RequirePackage[svgnames]%
{xcolor}
m.group(2):
svgnames
m.group(4):
xcolor
---
m.group(0):
\RequirePackage[raggedright]% OK?
{titlesec}
m.group(2):
raggedright
m.group(4):
titlesec
---
m.group(0):
\RequirePackage{xcolor}
m.group(2):
None
m.group(4):
xcolor
---
m.group(0):
\RequirePackage{hyperref}
m.group(2):
None
m.group(4):
hyperref
You could update the pattern to use negated character classes, which makes flags = re.S unnecessary, and add an optional group at the end that captures the date on the following line:
\\RequirePackage(\[([^][]*)\])?([^{]*){([^{}]*)}.*(?:\n\s*\[([^][]*)])?
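The reason re.S can be dropped is that a negated character class such as [^][] matches newlines on its own, while . does not without the flag. Here is a small check of that, using a made-up multi-line option list similar to the one passed to geometry (the sample string is only for illustration):

```python
import re

# Hypothetical sample: a bracketed option list spanning several lines,
# similar to the options passed to geometry in the question.
text = "[\ntop = 2.5cm,\nheightrounded\n]"

# Without re.S, "." does not match a newline, so the lazy ".*?"
# cannot reach the closing "]" and the match fails.
dot_match = re.match(r"\[(.*?)\]", text)

# The negated class [^][] matches any character except "[" and "]",
# newlines included, so no flag is needed.
class_match = re.match(r"\[([^][]*)\]", text)

print(dot_match)  # None
print(repr(class_match.group(1)))
```

Leaving re.S off also matters later in the pattern: it keeps .* from running past the end of the current line.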
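The pattern matches:
\\RequirePackage
    Match the literal text \RequirePackage
(\[([^][]*)\])?
    Optionally capture [...] (group 1), with its contents in group 2
([^{]*)
    Capture zero or more characters other than { (group 3), such as a % comment and the line break before {xcolor}
{([^{}]*)}
    Capture what is between {...} (group 4), the package name
.*
    Match the rest of the line (without re.S, . stops at the newline)
(?:
    Non-capturing group
\n\s*\[([^][]*)]
    Match a newline, optional whitespace characters, and then capture what is between [...] (group 5), the date
)?
    Close the non-capturing group and make it optional
See a regex101 demo and a Python demo.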
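A self-contained sketch of such a Python demo, running the updated pattern over the question's test string and collecting the package name, the trimmed options, and the date for each match (the results list and the stripping of the options are illustrative choices, not part of the answer):

```python
import re

# Test input copied from the question.
test_str = r"""
\RequirePackage[
top = 2.5cm,
bottom = 2.5cm,
left = 2.5cm,
right = 2.5cm,
marginparwidth = 2cm,
marginparsep = 2mm,
heightrounded
]{geometry}%
[2020-01-02]
\RequirePackage{tocbasic}
\RequirePackage[svgnames]%
{xcolor}%
[2023/11/15]
\RequirePackage[raggedright]% OK?
{titlesec}
\RequirePackage{xcolor}%
[2022/06/12]
\RequirePackage{hyperref}% To load after titlesec!
[2023-02-07]
"""

# The updated pattern; no re.S flag is used.
pattern = re.compile(
    r"\\RequirePackage(\[([^][]*)\])?([^{]*){([^{}]*)}.*(?:\n\s*\[([^][]*)])?"
)

results = []
for m in pattern.finditer(test_str):
    options = m.group(2)  # None when there is no [...] option list
    results.append((
        m.group(4),                                        # package name
        options.strip() if options is not None else None,  # trimmed options
        m.group(5),                                        # date, or None
    ))

for package, options, date in results:
    print(f"{package}: options={options!r}, date={date!r}")
```

Because .* consumes the rest of the line, this only picks up a date placed on a following line, as in the question. A date written directly after the closing brace, such as {xcolor}[2023/11/15], is swallowed by .* and group 5 stays None.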
If you are only interested in groups 2 and 4 and the new group 5, you can turn the first capture group into a non-capturing group and drop the capture around [^{]*, leaving 3 capture groups in total:
\\RequirePackage(?:\[([^][]*)\])?[^{]*{([^{}]*)}.*(?:\n\s*\[([^][]*)])?
See the group values in the regex101 demo and in another Python demo.
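With exactly the three groups you want, re.findall becomes convenient because it returns one (options, package, date) tuple per match. A minimal sketch, using a few lines taken from the question's test string:

```python
import re

# A few lines taken from the question's test string.
test_str = r"""
\RequirePackage{tocbasic}
\RequirePackage[svgnames]%
{xcolor}%
[2023/11/15]
\RequirePackage{hyperref}% To load after titlesec!
[2023-02-07]
"""

# The three-group variant: options, package name, date.
pattern = re.compile(
    r"\\RequirePackage(?:\[([^][]*)\])?[^{]*{([^{}]*)}.*(?:\n\s*\[([^][]*)])?"
)

# With several groups, findall returns one tuple per match; a group that
# did not take part in the match is returned as "" rather than None.
rows = pattern.findall(test_str)

for options, package, date in rows:
    print(f"{package:10} options={options or '-'} date={date or '-'}")
```

One trade-off: findall reports both a missing option list and an empty one ([]) as "". If that distinction matters, finditer with m.group(), which returns None for a group that did not participate, is the safer choice.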