Tags: python, regex, parsing, alteryx

Parsing data using regex: split it into columns via groups


I want to use a regex to parse my data into 3 columns.

Film data:
Marvel Comics Presents (1988) #125
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)
Spider-Man Legends Vol. II: Todd Mcfarlane Book I
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)
Marvel Comics Presents #125

Expected output: [image not available]

I can see how to group it, but I can't work out the regex for it: [image not available]

I built this expression: (.*)\((\d{4})\)(.*)

I essentially want to use the ? quantifier to say "this group may or may not be there", something like: (.*)\((\d{4})\)?(.*)

However, it's not working.
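One way to see why this fails is to run both patterns over the sample rows with Python's re module. This is a small sketch; rows is simply the sample data above, and the attempt is read as (.*)\((\d{4})\)?(.*), i.e. with ? placed directly after \). In the original pattern \((\d{4})\) is mandatory, so rows without a year cannot match at all. A ? applies only to the single token right before it, so here it makes just the closing parenthesis optional while "(" and the four digits are still required. To make the whole year optional, the ? has to follow a group that wraps all of \((\d{4})\).

```python
import re

# Sample rows from the question
rows = [
    "Marvel Comics Presents (1988) #125",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)",
    "Marvel Comics Presents #125",
]

original = re.compile(r"(.*)\((\d{4})\)(.*)")
# The ? binds only to \), not to the whole "(1988)" part
attempt = re.compile(r"(.*)\((\d{4})\)?(.*)")

for row in rows:
    # False means the row could not be parsed at all
    print(original.match(row) is not None, attempt.match(row) is not None)

# The only thing the ? changed: a missing closing parenthesis is now tolerated
print(original.match("Marvel Comics Presents (1988 #125"))
print(attempt.match("Marvel Comics Presents (1988 #125") is not None)
```

Both patterns reject the three rows that have no (yyyy) part; they differ only on input like "(1988 #125" where the closing parenthesis is missing.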


Solution

  • You could use 3 capture groups, where the last 2 are optional:

    ^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$
    

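    The pattern matches:

      • ^ Start of string
      • (.*?) Capture group 1, match any character, as few as possible (lazy)
      • (?:\((\d{4})\))? Optionally match 4 digits between parentheses, capturing the digits in group 2
      • \s* Match optional whitespace characters
      • (#\d+)? Optional capture group 3, match # followed by 1 or more digits
      • $ End of string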

    See a regex101 demo.
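To turn the matches into the 3 columns in Python, a minimal sketch could look like the following. The helper name split_row is hypothetical, and mapping missing groups to an empty string is an assumption about the desired output. Because group 1 is lazy, it can keep the space that sits before "(1988)", so the title is stripped afterwards.

```python
import re

# The pattern from the answer: title, optional (year), optional #issue
PATTERN = re.compile(r"^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$")

# Sample rows from the question
rows = [
    "Marvel Comics Presents (1988) #125",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)",
    "Marvel Comics Presents #125",
]


def split_row(row):
    """Hypothetical helper: split one row into (title, year, issue)."""
    m = PATTERN.match(row)
    if m is None:
        # Only possible for multi-line input, since . does not match a newline
        return (row, "", "")
    # Groups that did not participate in the match become ""
    title, year, issue = m.groups(default="")
    # Group 1 may end with the space that preceded "(1988)"
    return (title.strip(), year, issue)


for row in rows:
    print(split_row(row))
```

If the data is in a pandas DataFrame instead, Series.str.extract with the same pattern returns one column per capture group.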