javascript python regex google-code-prettify

JavaScript-flavor regex for identifying valid Python strings enclosed within triple quotes


I'm trying to write a Prettify-style syntax highlighter for Qiskit Terra (which closely follows Python syntax). Apparently, Prettify uses JavaScript-flavor regex. For instance, /^\"(?:[^\"\\]|\\[\s\S])*(?:\"|$)/, null, '"' is the regex corresponding to valid strings in Q#. Basically, I'm trying to put together the equivalent regex for Python.
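To see what the Q# pattern does before adapting it, here is a minimal sketch that runs the same expression through Python's `re` module (the syntax used is shared by both flavors). The names `DOUBLE_QUOTED` and `sample` are made up for illustration. Note that the character class excludes both the quote and the backslash, which is what stops the match at the first unescaped closing quote.

```python
import re

# The Q# string pattern quoted above, minus the JavaScript /.../ delimiters.
# [^"\\] consumes ordinary characters; \\[\s\S] consumes a backslash plus
# whatever follows it, so an escaped quote cannot end the string.
DOUBLE_QUOTED = re.compile(r'^"(?:[^"\\]|\\[\s\S])*(?:"|$)')

# Hypothetical input: a string literal with escaped quotes, then other code.
sample = r'"say \"hi\"" + rest'

match = DOUBLE_QUOTED.match(sample)
print(match.group(0))  # "say \"hi\""
```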

Now, I know that Python supports strings within triple quotes, i.e. '''<string>''' and """<string>""" are both valid strings (this format is used especially for docstrings). To deal with this case, I wrote the corresponding capturing group as:

(^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))

Here is the regex101 link.

This works okay except in some cases like:

''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''

Clearly here it should have considered ''' 'This "is" my' && "first 'regex' sentence." ''' as one string and ''' 'This "is" the second.' ''' as another. But no, the regex I wrote groups together the whole thing as one string (check the regex101 link). That is, it doesn't conclude the string even when it encounters a ''' (corresponding to the ''' at the beginning).
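The behaviour can be reproduced outside regex101. The sketch below feeds the pattern, unchanged, to Python's `re` module, which treats this particular expression the same way; the variable names are illustrative only.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))")

text = (
    "''' 'This \"is\" my' && \"first 'regex' sentence.\" ''' &&\n"
    "''' 'This \"is\" the second.' '''"
)

match = pattern.search(text)

# [^\\] matches quotes and newlines too, the * is greedy, and the only '''
# that is followed by an end anchor is the last one, so the single match
# covers both lines.
print(match.group(0) == text)  # True
```

Two things combine here: the body is allowed to consume `'` characters, and the closing `'{3}` is tied to `$`, so the first closing `'''` (which is followed by ` &&`) can never end the match.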

How should I modify the regex (^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$)) to take into account this case? I'm aware of this: How to match “anything up until this sequence of characters” in a regular expression? but it doesn't quite answer my question, at least not directly.


Solution

  • I don't know what else you want to use this for, but the following regex does what you want for the given example when the MULTILINE flag is on.

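    import re
    # Sample input from the question, one triple-quoted string per line.
    My_string = ("''' 'This \"is\" my' && \"first 'regex' sentence.\" ''' &&\n"
                 "''' 'This \"is\" the second.' '''")
    My_search = re.findall("(?:^\'{3})(.*)(?:\'{3})", My_string, re.MULTILINE)

    print(My_search[0])
    print(My_search[1])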
    

    The output is:

    'This "is" my' && "first 'regex' sentence." 
    'This "is" the second.' 
    

    You can also see it working here: https://regex101.com/r/k4adk2/11
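The pattern above depends on `.` not crossing newlines, so it only covers strings that open at the start of a line and close on that same line. As a possible alternative, here is a minimal sketch that keeps the escape handling from the question's pattern but makes the body lazy and drops the `$` after the closing quotes. It is an assumption about what the highlighter needs, not a tested Prettify rule; `TRIPLE_SINGLE` and the sample text are illustrative names. The lazy quantifier `*?` and `[\s\S]` are also valid in JavaScript regex, so the expression should carry over, though that has not been verified inside Prettify.

```python
import re

# Lazy body (*?) stops at the first closing '''. The \\[\s\S] branch
# consumes a backslash together with the character after it, so an escaped
# quote does not end the string early.
TRIPLE_SINGLE = re.compile(r"'{3}(?:[^\\]|\\[\s\S])*?'{3}")

text = (
    "''' 'This \"is\" my' && \"first 'regex' sentence.\" ''' &&\n"
    "''' 'This \"is\" the second.' '''"
)

# No capturing groups, so findall returns the full matches.
strings = TRIPLE_SINGLE.findall(text)
print(len(strings))  # 2
for s in strings:
    print(s)
```

Because `[^\\]` also matches newlines, the same pattern handles docstrings that span several lines. Swapping `'` for `"` gives the corresponding pattern for `"""` strings.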