python string replace

How to replace *all* occurrences of a string in Python, and why `str.replace` misses consecutive overlapping matches?


I want to replace every stand-alone 0 in a string with 00 in Python. For example, turning:

'28 5A 31 34 0 0 0 F0'

into

'28 5A 31 34 00 00 00 F0'.

I tried str.replace(), but for some reason it misses some "overlapping" patterns:

$ python3
Python 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '28 5A 31 34 0 0 0 F0'.replace(" 0 ", " 00 ")
'28 5A 31 34 00 0 00 F0'
>>> '28 5A 31 34 0 0 0 F0'.replace(" 0 ", " 00 ").replace(" 0 ", " 00 ")
'28 5A 31 34 00 00 00 F0'

Notice that the "middle" 0 is not replaced by 00 after a single call; it takes a second call to catch it.


Edit 1:

Thanks for the answer(s). The regexp works nicely.

Still, I am confused. The official doc linked above says:

"Return a copy of the string with all occurrences of substring old replaced by new. If count is given, only the first count occurrences are replaced. If count is not specified or -1, then all occurrences are replaced.".

"Clearly" this is not the case? (or am I missing something?).


Solution

  • A better tactic would be to not look for spaces around the individual zeros, but to use regex substitution and look for word boundaries (\b):

    >>> import re
    >>> re.sub(r'\b0\b', '00', '28 5A 31 34 0 0 0 F0')
    '28 5A 31 34 00 00 00 F0'
    

    This has the added benefit that a 0 at the start or end of the string would be replaced with 00 as well.
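As a quick check of that edge behavior, here is a small comparison. The helper name `pad_lone_zeros` and the sample input are made up for illustration; they are not part of the question.

```python
import re

def pad_lone_zeros(text: str) -> str:
    """Replace every stand-alone 0 with 00, using word boundaries."""
    return re.sub(r'\b0\b', '00', text)

sample = '0 5A 0 0 F0 0'  # hypothetical input with zeros at both ends

# Space-delimited replace: misses both edge zeros and the second of the
# two adjacent zeros.
print(sample.replace(' 0 ', ' 00 '))   # 0 5A 00 0 F0 0

# Word-boundary regex: pads every stand-alone zero, leaves F0 alone.
print(pad_lone_zeros(sample))          # 00 5A 00 00 F0 00
```

One caveat: `\b` treats any non-word character as a boundary, so a 0 next to punctuation (as in `0.5`) would also be padded. For space-separated hex bytes like the ones in the question this should not matter.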

    If you want the exact same semantics as your original attempt (only zeros with a space on each side), you could use positive lookbehind and lookahead to not "consume" the space characters:

    >>> re.sub(r'(?<= )0(?= )', '00', '28 5A 31 34 0 0 0 F0')
    '28 5A 31 34 00 00 00 F0'
    

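    The reason your original attempt does not work is that when str.replace (or re.sub) finds a match, it resumes searching at the first character after the whole match. The trailing space of the first ' 0 ' has already been consumed, so it cannot also serve as the leading space of the next one. In other words, "all occurrences" in the documentation means all non-overlapping occurrences, scanned from left to right.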

    So:

    '28 5A 31 34 0 0 0 F0'.replace(' 0 ', ' 00 ')
    #           ^-^      #1 match, ' 0 ' → ' 00 '
    #              ^     start looking for second match from here
    #               ^-^  #2 match, ' 0 ' → ' 00 '
    '28 5A 31 34 00 0 00 F0'
    #           ^--^ ^--^
    #            #1   #2
    

    The CPython (3.13.3) str.replace implementation can be seen here: https://github.com/python/cpython/blob/6280bb547840b609feedb78887c6491af75548e8/Objects/unicodeobject.c#L10333, but it's a bit complex with all the Unicode handling.
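For reading purposes, the left-to-right scan can be modeled in a few lines of pure Python. This is a simplified sketch, not the CPython algorithm: the function name `replace_non_overlapping` is invented here, and it ignores the `count` argument and the empty-string case.

```python
def replace_non_overlapping(text: str, old: str, new: str) -> str:
    """Simplified model of str.replace for a non-empty `old`."""
    if not old:
        raise ValueError("this sketch does not handle an empty search string")
    pieces = []
    pos = 0
    while True:
        hit = text.find(old, pos)
        if hit == -1:
            break
        pieces.append(text[pos:hit])  # untouched text before the match
        pieces.append(new)
        pos = hit + len(old)          # resume AFTER the whole match
    pieces.append(text[pos:])         # whatever is left after the last match
    return ''.join(pieces)

s = '28 5A 31 34 0 0 0 F0'
print(replace_non_overlapping(s, ' 0 ', ' 00 '))  # 28 5A 31 34 00 0 00 F0
```

The line `pos = hit + len(old)` is the whole story: the space that ends the first match is skipped, so the middle 0 is never seen with a leading space.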


    Even if it did work the way you'd "wish", you still wouldn't get the output you want. You'd get extra spaces, because each overlapping ' 0 ' in the original string would put its own ' 00 ' into the output string:

    # Hypothetical:
    '28 5A 31 34 0 0 0 F0'.replace(' 0 ', ' 00 ')
    #           ^-^      #1 match, ' 0 ' → ' 00 '
    #             ^-^    #2 match, ' 0 ' → ' 00 '
    #               ^-^  #3 match, ' 0 ' → ' 00 '
    '28 5A 31 34 00  00  00 F0'
    #           ^--^^--^^--^
    #            #1  #2  #3
    

    If it still seems unintuitive why you'd get those extra spaces, consider ABA to stand for ' 0 ' and X__X to stand for ' 00 ', and look at this:

    # Analogous to: ' 0 0 0 '.replace(' 0 ', ' 00 ')
    'ABABABA'.replace('ABA', 'X__X')
    'X__XBX__X'     # What you get in reality now.
    'X__XX__XX__X'  # What you would get with the above logic (=extra consecutive X characters, i.e. spaces).
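The hypothetical behavior can also be simulated to confirm both outputs. This is only a sketch of one possible reading of "replace overlapping matches" (every occurrence contributes its own copy of the replacement); the function `replace_overlapping` is invented for this illustration and does not correspond to anything in the standard library.

```python
def replace_overlapping(text: str, old: str, new: str) -> str:
    """Hypothetical semantics: every occurrence of `old`, including
    overlapping ones, emits its own copy of `new`."""
    if not old:
        raise ValueError("this sketch does not handle an empty search string")
    pieces = []
    covered_to = 0                      # end of the text already accounted for
    start = text.find(old)
    while start != -1:
        if start > covered_to:          # text not covered by an earlier match
            pieces.append(text[covered_to:start])
        pieces.append(new)
        covered_to = max(covered_to, start + len(old))
        start = text.find(old, start + 1)  # step by ONE char to find overlaps
    pieces.append(text[covered_to:])
    return ''.join(pieces)

print(replace_overlapping('ABABABA', 'ABA', 'X__X'))
# X__XX__XX__X
print(replace_overlapping('28 5A 31 34 0 0 0 F0', ' 0 ', ' 00 '))
# 28 5A 31 34 00  00  00 F0   (note the doubled spaces)
```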
    

    And finally, if it worked like calling replace over and over until there is nothing left to replace, a trivial 'A'.replace('A', 'AA') would loop forever ('A' → 'AA' → 'AAAA' → …).
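A minimal sketch of that "repeat until stable" idea shows both sides: it happens to fix the string from the question, and it never settles for 'A' → 'AA'. The helper `replace_until_stable` and its `max_rounds` safety cap are assumptions added for this example.

```python
def replace_until_stable(text: str, old: str, new: str, max_rounds: int = 10) -> str:
    """Hypothetical helper: call str.replace repeatedly until nothing changes.

    max_rounds is a safety cap; without it, a replacement that re-creates
    the search string would never terminate.
    """
    for _ in range(max_rounds):
        replaced = text.replace(old, new)
        if replaced == text:
            return text
        text = replaced
    raise RuntimeError(f"no fixed point after {max_rounds} rounds")

# Converges: the second round catches the middle 0, the third changes nothing.
print(replace_until_stable('28 5A 31 34 0 0 0 F0', ' 0 ', ' 00 '))

# Never converges: every round doubles the string, so the cap is hit.
try:
    replace_until_stable('A', 'A', 'AA')
except RuntimeError as exc:
    print(exc)
```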


    So, it just "has" to work this way. This is exactly why regex allows using lookahead and lookbehind to control which matched parts actually consume characters from the original string and which don't.
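To make "consume" concrete, the spans reported by `re.finditer` can be compared for the two patterns. This is just an inspection snippet; the variable names are arbitrary.

```python
import re

s = '28 5A 31 34 0 0 0 F0'

# Consuming pattern: each match includes the surrounding spaces, so the
# next search starts after the trailing space and the middle 0 is skipped.
consuming = [m.span() for m in re.finditer(r' 0 ', s)]

# Lookaround pattern: the spaces are checked but are not part of the match,
# so each match is exactly one character wide and nothing overlaps.
lookaround = [m.span() for m in re.finditer(r'(?<= )0(?= )', s)]

print(consuming)    # [(11, 14), (15, 18)]
print(lookaround)   # [(12, 13), (14, 15), (16, 17)]
```

With the lookaround version the space at index 13 is tested twice, once as the lookahead of the first match and once as the lookbehind of the second, which is exactly what the plain ' 0 ' pattern cannot do.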