I'm trying to write a Prettify-style syntax highlighter for Qiskit Terra (which closely follows Python syntax). Apparently, Prettify uses JavaScript-flavor regexes. For instance, /^\"(?:[^\"\\]|\\[\s\S])*(?:\"|$)/, null, '"'
is the pattern entry corresponding to valid strings in Q#. Basically, I'm trying to put together the equivalent regex for Python.
Now, I know that Python supports triple-quoted strings, i.e. '''<string>'''
and """<string>"""
are valid strings (this format is especially used for docstrings). To deal with this case I wrote the corresponding capturing group as:
(^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))
Here is the regex101 link.
This works okay except in some cases like:
''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''
Clearly here it should have considered ''' 'This "is" my' && "first 'regex' sentence." '''
as one string and ''' 'This "is" the second.' '''
as another. But the regex I wrote groups the whole thing together as one string (check the regex101 link). That is, it doesn't end the string even when it encounters a closing '''
(matching the '''
at the beginning).
How should I modify the regex (^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))
to take this case into account? I'm aware of this: How to match “anything up until this sequence of characters” in a regular expression? But it doesn't quite answer my question, at least not directly.
I don't know what else you want to use this for, but the following regex does what you want for the given example, with the MULTILINE flag on.
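import re
My_string = """''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''"""
My_search = re.findall("(?:^\'{3})(.*)(?:\'{3})", My_string, re.MULTILINE)
print(My_search[0])
print(My_search[1])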
The output is:
'This "is" my' && "first 'regex' sentence."
'This "is" the second.'
You can also see it working here: https://regex101.com/r/k4adk2/11
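One caveat, as far as I can tell: since . does not match a newline by default and ^ ties the match to the start of a line, the regex above only handles triple-quoted strings that start at the beginning of a line and stay on it. A possible alternative is to keep the original escape-aware group and make the quantifier lazy (*?) so that it stops at the first closing '''. Here is a minimal sketch of that idea; the names text and TRIPLE_SINGLE are only for illustration, and although the pattern syntax should carry over to a JavaScript-flavor regex, I have not tested it inside Prettify itself.

```python
import re

# Sample input taken from the question (two triple-quoted strings).
text = (
    "''' 'This \"is\" my' && \"first 'regex' sentence.\" ''' &&\n"
    "''' 'This \"is\" the second.' '''"
)

# The lazy *? stops at the first closing '''.
# [^\\] matches any character except a backslash (newlines included),
# and \\[\s\S] consumes a backslash together with the character it escapes.
TRIPLE_SINGLE = re.compile(r"'{3}(?:[^\\]|\\[\s\S])*?'{3}")

# No capturing groups, so findall returns the full matches.
matches = TRIPLE_SINGLE.findall(text)
for m in matches:
    print(m)
```

With the sample above this finds two separate strings instead of one. The same pattern with \"{3} in place of '{3} should cover the """<string>""" form.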