Tags: python, regex, string

Split a string at two patterns alternately


I'm relatively new to regex and I'm struggling with a very specific application. Say I have a string like this:

"> The following will be split : This was split: But this wasn't: And neither is this > But this is again: Aswell as this"

I want to split this string at >'s and :'s alternately: split at the first >, then at the first : after it, but not at any further :'s until the next > appears. Ideally, the >'s should also be captured (kept in the output) but the :'s should not. (The symbols are placeholders for more complex patterns.) For reference, the desired output is:

['>', 'The following will be split', "This was split: But this wasn't: And neither is this", '>', 'But this is again', 'Aswell as this']

How can I do this with a single regular expression?


Solution

  • Use re.findall with three capturing groups:

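    import re

    s = "> The following will be split : This was split: But this wasn't: And neither is this > But this is again: Aswell as this"
    print(re.findall(r"(>)([^:]+):([^>]+)", s))
    # [('>', ' The following will be split ', " This was split: But this wasn't: And neither is this "), ('>', ' But this is again', ' Aswell as this')]

    Group 1 captures the >, group 2 ([^:]+) stops at the first : after it, and group 3 ([^>]+) runs up to the next >, so any further :'s stay inside the third part. To get the flat list shape you asked for (the surrounding whitespace is still kept), concatenate the tuples:

    print(list(sum(re.findall(r"(>)([^:]+):([^>]+)", s), ())))
    # ['>', ' The following will be split ', " This was split: But this wasn't: And neither is this ", '>', ' But this is again', ' Aswell as this']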
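Because [^:] and [^>] only exclude single characters, the pattern above does not carry over directly to the multi-character patterns the question mentions, and it keeps the spaces around each piece. Below is a minimal sketch of one way around both, assuming the two delimiters are passed in as regex fragments that contain no capturing groups: each negated class is swapped for a tempered token of the form (?:(?!X).)*?, and \s* absorbs the whitespace. split_alternating is a hypothetical helper name, not part of the re module.

```python
import re
from itertools import chain


def split_alternating(text, start=r">", sep=r":"):
    """Hypothetical helper: split `text` at every `start` match (kept) and at the
    first `sep` match after it (dropped); later `sep` matches are left alone.

    Assumption: `start` and `sep` are regex fragments WITHOUT capturing groups,
    because re.findall must see exactly three groups per match.
    """
    pattern = re.compile(
        rf"({start})\s*"                                     # group 1: start delimiter, kept
        rf"((?:(?!(?:{sep})|(?:{start})).)*?)\s*"            # group 2: text up to the first separator
        rf"(?:{sep})\s*"                                     # first separator, dropped
        rf"((?:(?!(?:{start})).)*?)\s*(?=(?:{start})|\Z)",   # group 3: rest, up to the next start or end
        re.DOTALL,
    )
    # findall yields one (start, head, tail) tuple per segment; flatten them into one list.
    return list(chain.from_iterable(pattern.findall(text)))


s = "> The following will be split : This was split: But this wasn't: And neither is this > But this is again: Aswell as this"
print(split_alternating(s))
# ['>', 'The following will be split', "This was split: But this wasn't: And neither is this", '>', 'But this is again', 'Aswell as this']

# Multi-character delimiters work the same way.
print(split_alternating("--> a :: b :: c --> d :: e", start=r"-->", sep=r"::"))
# ['-->', 'a', 'b :: c', '-->', 'd', 'e']
```

Text before the first start delimiter is ignored, and a segment that has no separator before the next start delimiter is silently skipped, so check the input if either case can occur.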